Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1 |
# 501f7f69 | 10-Aug-2022 | Namhyung Kim <[email protected]>
locking: Add __lockfunc to slow path functions
So that we can skip the functions in the perf lock contention and other places like /proc/PID/wchan.
Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Waiman Long <[email protected]> Link: https://lore.kernel.org/r/[email protected]
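For reference, a minimal sketch of what the annotation buys; the __lockfunc macro and in_lock_functions() reflect the existing kernel definitions in spirit, and the slow-path declarations are illustrative rather than the verbatim diff:

  /* __lockfunc places a function into the .spinlock.text section. */
  #define __lockfunc __section(".spinlock.text")

  /* Slow paths annotated by this patch, e.g. the qrwlock ones: */
  void __lockfunc queued_read_lock_slowpath(struct qrwlock *lock);
  void __lockfunc queued_write_lock_slowpath(struct qrwlock *lock);

  /*
   * perf lock contention and /proc/PID/wchan can then skip lock internals
   * by checking whether an address falls inside that section, in the way
   * in_lock_functions() does:
   */
  int in_lock_functions(unsigned long addr)
  {
          extern char __lock_text_start[], __lock_text_end[];

          return addr >= (unsigned long)__lock_text_start &&
                 addr <  (unsigned long)__lock_text_end;
  }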
Revision tags: v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7 |
# 434e09e7 | 10-May-2022 | Waiman Long <[email protected]>
locking/qrwlock: Change "queue rwlock" to "queued rwlock"
Queued rwlock was originally named "queue rwlock", which wasn't quite grammatically correct. However, there are still some "queue rwlock" references in the code. Change those to "queued rwlock" for consistency.
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
Revision tags: v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1 |
# ee042be1 | 22-Mar-2022 | Namhyung Kim <[email protected]>
locking: Apply contention tracepoints in the slow path
Adding the lock contention tracepoints in various lock function slow paths. Note that each arch can define spinlock differently, so I added it only to the generic qspinlock for now.
Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
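As a rough illustration only (a sketch of the pattern, not the exact diff; the tracepoints and flag names are those from include/trace/events/lock.h):

  #include <trace/events/lock.h>

  void queued_read_lock_slowpath(struct qrwlock *lock)
  {
          trace_contention_begin(lock, LCB_F_SPIN | LCB_F_READ);

          /* ... existing code spins/queues here until the writer goes away ... */

          trace_contention_end(lock, 0);
  }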
Revision tags: v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1 |
# 28ce0e70 | 26-Apr-2021 | Waiman Long <[email protected]>
locking/qrwlock: Cleanup queued_write_lock_slowpath()
Make the code more readable by replacing the atomic_cmpxchg_acquire() by an equivalent atomic_try_cmpxchg_acquire() and change atomic_add() to atomic_or().
For architectures that use qrwlock, I do not find one that has an atomic_add() defined but not an atomic_or(). I guess it should be fine by changing atomic_add() to atomic_or().
Note that the previous use of atomic_add() isn't wrong as only one writer that is the wait_lock owner can set the waiting flag and the flag will be cleared later on when acquiring the write lock.
Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Will Deacon <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
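Roughly, the two spots in queued_write_lock_slowpath() change as follows (a sketch based on the description above, not the verbatim diff):

  /* Before: value-returning cmpxchg plus an additive flag set */
  if (!atomic_read(&lock->cnts) &&
      (atomic_cmpxchg_acquire(&lock->cnts, 0, _QW_LOCKED) == 0))
          goto unlock;
  atomic_add(_QW_WAITING, &lock->cnts);

  /* After: try_cmpxchg reuses the value already read, and the waiting
   * flag is set with a bitwise OR instead of an add. */
  if (!(cnts = atomic_read(&lock->cnts)) &&
      atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED))
          goto unlock;
  atomic_or(_QW_WAITING, &lock->cnts);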
Revision tags: v5.12, v5.12-rc8 |
# 84a24bf8 | 15-Apr-2021 | Ali Saidi <[email protected]>
locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
While this code is executed with the wait_lock held, a reader can acquire the lock without holding wait_lock. The writer side loops checking the value with the atomic_cond_read_acquire(), but only truly acquires the lock when the compare-and-exchange is completed successfully which isn’t ordered. This exposes the window between the acquire and the cmpxchg to an A-B-A problem which allows reads following the lock acquisition to observe values speculatively before the write lock is truly acquired.
We've seen a problem in epoll where the reader does a xchg while holding the read lock, but the writer can see a value change out from under it.
Writer                                  | Reader
----------------------------------------------------------------------------------
ep_scan_ready_list()                    |
|- write_lock_irq()                     |
   |- queued_write_lock_slowpath()      |
      |- atomic_cond_read_acquire()     |
                                        | read_lock_irqsave(&ep->lock, flags);
  --> (observes value before unlock)    |   chain_epi_lockless()
  |                                     |     epi->next = xchg(&ep->ovflist, epi);
  |                                     |   read_unlock_irqrestore(&ep->lock, flags);
  |                                     |
  |     atomic_cmpxchg_relaxed()        |
  |-- READ_ONCE(ep->ovflist);           |
A core can order the read of the ovflist ahead of the atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire semantics addresses this issue at which point the atomic_cond_read can be switched to use relaxed semantics.
Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock") Signed-off-by: Ali Saidi <[email protected]> [peterz: use try_cmpxchg()] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Steve Capper <[email protected]> Acked-by: Will Deacon <[email protected]> Acked-by: Waiman Long <[email protected]> Tested-by: Steve Capper <[email protected]>
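The essence of the fix in queued_write_lock_slowpath() is to move the acquire to the cmpxchg that actually takes the lock, roughly as below (a sketch; the mainline version also uses try_cmpxchg per the bracketed note above):

  /* Before: acquire on the spin, relaxed on the lock-taking cmpxchg */
  do {
          atomic_cond_read_acquire(&lock->cnts, VAL == _QW_WAITING);
  } while (atomic_cmpxchg_relaxed(&lock->cnts, _QW_WAITING,
                                  _QW_LOCKED) != _QW_WAITING);

  /* After: relaxed spin, acquire on the cmpxchg that wins the lock */
  do {
          cnts = atomic_cond_read_relaxed(&lock->cnts, VAL == _QW_WAITING);
  } while (!atomic_try_cmpxchg_acquire(&lock->cnts, &cnts, _QW_LOCKED));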
Revision tags: v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11 |
# d8d0da4e | 10-Feb-2021 | Waiman Long <[email protected]>
locking/arch: Move qrwlock.h include after qspinlock.h
include/asm-generic/qrwlock.h was trying to get arch_spin_is_locked via asm-generic/qspinlock.h. However, this does not work because architectures might be using queued rwlocks but not queued spinlocks (csky), or because they might be defining their own queued_* macros before including asm/qspinlock.h.
To fix this, ensure that asm/spinlock.h always includes qrwlock.h after defining arch_spin_is_locked (either directly for csky, or via asm/qspinlock.h for other architectures). The only inclusion elsewhere is in kernel/locking/qrwlock.c. That one is really unnecessary because the file is only compiled in SMP configurations (config QUEUED_RWLOCKS depends on SMP) and in that case linux/spinlock.h already includes asm/qrwlock.h if needed, via asm/spinlock.h.
Reported-by: Guenter Roeck <[email protected]> Signed-off-by: Waiman Long <[email protected]> Fixes: 26128cb6c7e6 ("locking/rwlocks: Add contention detection for rwlocks") Tested-by: Guenter Roeck <[email protected]> Reviewed-by: Ben Gardon <[email protected]> [Add arch/sparc and kernel/locking parts per discussion with Waiman. - Paolo] Signed-off-by: Paolo Bonzini <[email protected]>
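In other words, each architecture's asm/spinlock.h ends up with an include order like this (a sketch; details vary per architecture, and csky defines arch_spin_is_locked() itself instead of pulling in qspinlock.h):

  /* arch/<arch>/include/asm/spinlock.h */
  #include <asm/qspinlock.h>      /* defines arch_spin_is_locked() etc.      */
  #include <asm/qrwlock.h>        /* may rely on the spinlock helpers above  */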
Revision tags: v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3 |
# c942fddf | 27-May-2019 | Thomas Gleixner <[email protected]>
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157
Based on 3 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [graeme] [gregory] [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema] [hk] [hemahk]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1105 file(s).
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Richard Fontana <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
Revision tags: v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8, v4.14-rc7, v4.14-rc6, v4.14-rc5 |
# d1331661 | 12-Oct-2017 | Will Deacon <[email protected]>
locking/qrwlock: Prevent slowpath writers getting held up by fastpath
When a prospective writer takes the qrwlock locking slowpath due to the lock being held, it attempts to cmpxchg the wmode field from 0 to _QW_WAITING so that concurrent lockers also take the slowpath and queue on the spinlock accordingly, allowing the lockers to drain.
Unfortunately, this isn't fair, because a fastpath writer that comes in after the lock is made available but before the _QW_WAITING flag is set can effectively jump the queue. If there is a steady stream of prospective writers, then the waiter will be held off indefinitely.
This patch restores fairness by separating _QW_WAITING and _QW_LOCKED into two distinct fields: _QW_LOCKED continues to occupy the bottom byte of the lockword so that it can be cleared unconditionally when unlocking, but _QW_WAITING now occupies what used to be the bottom bit of the reader count. This then forces the slow-path for concurrent lockers.
Tested-by: Waiman Long <[email protected]> Tested-by: Jeremy Linton <[email protected]> Tested-by: Adam Wallis <[email protected]> Tested-by: Jan Glauber <[email protected]> Signed-off-by: Will Deacon <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Boqun Feng <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
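The resulting lock-word layout looks roughly like this (values as in asm-generic/qrwlock.h after the change; shown as a sketch):

  #define _QW_WAITING     0x100           /* A writer is waiting (now bit 8)       */
  #define _QW_LOCKED      0x0ff           /* A writer holds the lock (bottom byte) */
  #define _QW_WMASK       0x1ff           /* Writer mask                           */
  #define _QR_SHIFT       9               /* Reader count shift                    */
  #define _QR_BIAS        (1U << _QR_SHIFT)

  /* Write unlock can still clear the whole bottom byte unconditionally
   * without clobbering the waiting bit. */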
# b519b56e | 12-Oct-2017 | Will Deacon <[email protected]>
locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock
The qrwlock slowpaths involve spinning when either a prospective reader is waiting for a concurrent writer to drain, or a prospective writer is waiting for concurrent readers to drain. In both of these situations, atomic_cond_read_acquire() can be used to avoid busy-waiting and make use of any backoff functionality provided by the architecture.
This patch replaces the open-code loops and rspin_until_writer_unlock() implementation with atomic_cond_read_acquire(). The write mode transition zero to _QW_WAITING is left alone, since (a) this doesn't need acquire semantics and (b) should be fast.
Tested-by: Waiman Long <[email protected]> Tested-by: Jeremy Linton <[email protected]> Tested-by: Adam Wallis <[email protected]> Tested-by: Jan Glauber <[email protected]> Signed-off-by: Will Deacon <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Boqun Feng <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
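For example, the reader slow path's wait-for-writer spin collapses to a single call (sketch):

  /*
   * Spin (or use an architecture wait primitive) until the writer byte is
   * clear, returning with acquire semantics; this replaces the open-coded
   * rspin_until_writer_unlock() loop.
   */
  atomic_cond_read_acquire(&lock->cnts, !(VAL & _QW_LOCKED));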
# e0d02285 | 12-Oct-2017 | Will Deacon <[email protected]>
locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'
There's no good reason to keep the internal structure of struct qrwlock hidden from qrwlock.h, particularly as it's actually needed for unlock and ends up being abstracted independently behind the __qrwlock_write_byte() function.
Stop pretending we can hide this stuff, and move the __qrwlock definition into qrwlock, removing the __qrwlock_write_byte() nastiness and using the same struct definition everywhere instead.
Signed-off-by: Will Deacon <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Boqun Feng <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
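The unified definition ends up roughly as below (a sketch of asm-generic/qrwlock_types.h and qrwlock.h after the change):

  typedef struct qrwlock {
          union {
                  atomic_t cnts;          /* reader count + writer state */
                  struct {
  #ifdef __LITTLE_ENDIAN
                          u8 wlocked;     /* Locked for write? */
                          u8 __lstate[3];
  #else
                          u8 __lstate[3];
                          u8 wlocked;
  #endif
                  };
          };
          arch_spinlock_t wait_lock;
  } arch_rwlock_t;

  static inline void queued_write_unlock(struct qrwlock *lock)
  {
          /* Unlock writes the byte directly; no __qrwlock_write_byte() helper. */
          smp_store_release(&lock->wlocked, 0);
  }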
Revision tags: v4.14-rc4, v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12, v4.12-rc7, v4.12-rc6, v4.12-rc5, v4.12-rc4, v4.12-rc3 |
# 9ab6055f | 24-May-2017 | Babu Moger <[email protected]>
kernel/locking: Fix compile error with qrwlock.c
Saw these compile errors on SPARC when queued rwlock feature is enabled.
  CC      kernel/locking/qrwlock.o
kernel/locking/qrwlock.c: In function ‘queued_read_lock_slowpath’:
kernel/locking/qrwlock.c:89: error: implicit declaration of function ‘arch_spin_lock’
kernel/locking/qrwlock.c:102: error: implicit declaration of function ‘arch_spin_unlock’
make[4]: *** [kernel/locking/qrwlock.o] Error 1
Include spinlock.h in qrwlock.c to fix it.
Signed-off-by: Babu Moger <[email protected]> Reviewed-by: Håkon Bugge <[email protected]> Reviewed-by: Jane Chu <[email protected]> Reviewed-by: Shannon Nelson <[email protected]> Reviewed-by: Vijay Kumar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Revision tags: v4.12-rc2, v4.12-rc1, v4.11, v4.11-rc8, v4.11-rc7, v4.11-rc6, v4.11-rc5, v4.11-rc4, v4.11-rc3, v4.11-rc2, v4.11-rc1, v4.10, v4.10-rc8, v4.10-rc7, v4.10-rc6, v4.10-rc5, v4.10-rc4, v4.10-rc3, v4.10-rc2, v4.10-rc1, v4.9, v4.9-rc8, v4.9-rc7, v4.9-rc6, v4.9-rc5, v4.9-rc4, v4.9-rc3 |
# f2f09a4c | 25-Oct-2016 | Christian Borntraeger <[email protected]>
locking/core: Remove cpu_relax_lowlatency() users
With the s390 special case of a yielding cpu_relax() implementation gone, we can now remove all users of cpu_relax_lowlatency() and replace them with cpu_relax().
Signed-off-by: Christian Borntraeger <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Noam Camus <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Russell King <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Revision tags: v4.9-rc2, v4.9-rc1, v4.8, v4.8-rc8, v4.8-rc7, v4.8-rc6, v4.8-rc5, v4.8-rc4, v4.8-rc3, v4.8-rc2, v4.8-rc1, v4.7, v4.7-rc7, v4.7-rc6, v4.7-rc5, v4.7-rc4, v4.7-rc3, v4.7-rc2, v4.7-rc1, v4.6, v4.6-rc7, v4.6-rc6, v4.6-rc5, v4.6-rc4 |
# f9852b74 | 17-Apr-2016 | Peter Zijlstra <[email protected]>
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
The only reason for the current code is to make GCC emit only the "LOCK XADD" instruction on x86 (and not do a pointless extra ADD on the result), do so nicer.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Waiman Long <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
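The change in the reader slow path amounts to roughly this (sketch):

  /* Before: add-and-return, then subtract the bias back out of the result */
  cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts) - _QR_BIAS;

  /* After: fetch_add returns the old value directly (a single LOCK XADD on x86) */
  cnts = atomic_fetch_add_acquire(_QR_BIAS, &lock->cnts);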
Revision tags: v4.6-rc3, v4.6-rc2, v4.6-rc1, v4.5, v4.5-rc7, v4.5-rc6, v4.5-rc5, v4.5-rc4, v4.5-rc3, v4.5-rc2, v4.5-rc1, v4.4, v4.4-rc8, v4.4-rc7, v4.4-rc6, v4.4-rc5, v4.4-rc4, v4.4-rc3, v4.4-rc2, v4.4-rc1, v4.3, v4.3-rc7, v4.3-rc6, v4.3-rc5, v4.3-rc4, v4.3-rc3, v4.3-rc2 |
# 6e1e5196 | 14-Sep-2015 | Davidlohr Bueso <[email protected]>
locking/qrwlock: Rename ->lock to ->wait_lock
... trivial, but reads a little nicer when we name our actual primitive 'lock'.
Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Revision tags: v4.3-rc1, v4.2, v4.2-rc8, v4.2-rc7, v4.2-rc6 |
# 77e430e3 | 06-Aug-2015 | Will Deacon <[email protected]>
locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
The qrwlock implementation is slightly heavy in its use of memory barriers, mainly through the use of _cmpxchg() and _return() atomics, which imply full barrier semantics.
This patch modifies the qrwlock code to use the more relaxed atomic routines so that we can reduce the unnecessary barrier overhead on weakly-ordered architectures.
Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
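For instance, the unlock side can drop to release-only ordering while the lock side keeps acquire, roughly as below (a sketch of asm-generic/qrwlock.h around the time of this change; the slowpath signature shown still carries the cnts argument used back then):

  static inline void queued_read_unlock(struct qrwlock *lock)
  {
          /* Release pairs with the acquire in the matching lock operation. */
          (void)atomic_sub_return_release(_QR_BIAS, &lock->cnts);
  }

  static inline void queued_read_lock(struct qrwlock *lock)
  {
          u32 cnts;

          cnts = atomic_add_return_acquire(_QR_BIAS, &lock->cnts);
          if (likely(!(cnts & _QW_WMASK)))
                  return;

          /* The slowpath will decrement the reader count, if necessary. */
          queued_read_lock_slowpath(lock, cnts);
  }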
Revision tags: v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2 |
# ffffeaf3 | 09-Jul-2015 | Waiman Long <[email protected]>
locking/qrwlock: Reduce reader/writer to reader lock transfer latency
Currently, a reader will check first to make sure that the writer mode byte is cleared before incrementing the reader count. That waiting is not really necessary. It increases the latency in the reader/writer to reader transition and reduces readers performance.
This patch eliminates that waiting. It also has the side effect of reducing the chance of writer lock stealing and improving the fairness of the lock. Using a locking microbenchmark, a 10-threads 5M locking loop of mostly readers (RW ratio = 10,000:1) has the following performance numbers in a Haswell-EX box:
  Kernel        Locking Rate (Kops/s)
  ------        ---------------------
  4.1.1             15,063,081
  4.1.1+patch       17,241,552  (+14.4%)
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Revision tags: v4.2-rc1, v4.1 |
# 0e06e5be | 19-Jun-2015 | Waiman Long <[email protected]>
locking/qrwlock: Better optimization for interrupt context readers
The qrwlock is fair in process context, but becomes unfair in interrupt context to support use cases like the tasklist_lock.
The current code isn't that well-documented on what happens when in the interrupt context. The rspin_until_writer_unlock() will only spin if the writer has gotten the lock. If the writer is still in the waiting state, the increment in the reader count will cause the writer to remain in the waiting state and the new interrupt context reader will get the lock and return immediately. The current code, however, does an additional read of the lock value which is not necessary as the information has already been there in the fast path. This may sometime cause an additional cacheline transfer when the lock is highly contended.
This patch passes the lock value information gotten in the fast path to the slow path to eliminate the additional read. It also documents the action for the interrupt context readers more clearly.
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Will Deacon <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
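The resulting shape of the reader slow path is roughly as below (a sketch of the code as of this change; rspin_until_writer_unlock() was later replaced by atomic_cond_read_acquire()):

  void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
  {
          /* Readers come here when they cannot get the lock without waiting. */
          if (unlikely(in_interrupt())) {
                  /*
                   * Readers in interrupt context get the lock immediately if
                   * the writer is merely waiting (not holding the lock yet),
                   * spinning on the cnts value passed in from the fast path
                   * rather than re-reading the lock word.
                   */
                  rspin_until_writer_unlock(lock, cnts);
                  return;
          }
          atomic_sub(_QR_BIAS, &lock->cnts);

          /* ... process-context readers queue behind the internal spinlock ... */
  }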
# f7d71f20 | 19-Jun-2015 | Waiman Long <[email protected]>
locking/qrwlock: Rename functions to queued_*()
To sync up with the naming convention used in qspinlock, all the qrwlock functions were renamed to start with "queued" instead of "queue".
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Revision tags: v4.1-rc8 |
# 405963b6 | 09-Jun-2015 | Waiman Long <[email protected]>
locking/qrwlock: Don't contend with readers when setting _QW_WAITING
The current cmpxchg() loop in setting the _QW_WAITING flag for writers in queue_write_lock_slowpath() will contend with incoming readers causing possibly extra cmpxchg() operations that are wasteful. This patch changes the code to do a byte cmpxchg() to eliminate contention with new readers.
A multithreaded microbenchmark running a 5M read_lock/write_lock loop on an 8-socket 80-core Westmere-EX machine running a 4.0-based kernel with the qspinlock patch has the following execution times (in ms) with and without the patch:
With R:W ratio = 5:1
  Threads    w/o patch    with patch    % change
  -------    ---------    ----------    --------
     2           990           895        -9.6%
     3          2136          1912       -10.5%
     4          3166          2830       -10.6%
     5          3953          3629        -8.2%
     6          4628          4405        -4.8%
     7          5344          5197        -2.8%
     8          6065          6004        -1.0%
     9          6826          6811        -0.2%
    10          7599          7599         0.0%
    15          9757          9766        +0.1%
    20         13767         13817        +0.4%
With small number of contending threads, this patch can improve locking performance by up to 10%. With more contending threads, however, the gain diminishes.
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
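The waiting-flag loop changes from a full-word cmpxchg into a byte-sized one on the writer-mode byte, roughly as sketched below (struct __qrwlock exposing the wmode byte was introduced for this; not the verbatim diff):

  /* Before: full-word cmpxchg, retried whenever a reader changes cnts */
  for (;;) {
          cnts = atomic_read(&lock->cnts);
          if (!(cnts & _QW_WMASK) &&
              (atomic_cmpxchg(&lock->cnts, cnts, cnts | _QW_WAITING) == cnts))
                  break;
          cpu_relax_lowlatency();
  }

  /* After: byte-wide cmpxchg on the writer-mode byte only, so concurrent
   * reader-count updates no longer cause it to fail. */
  struct __qrwlock *l = (struct __qrwlock *)lock;

  for (;;) {
          if (!READ_ONCE(l->wmode) &&
              (cmpxchg(&l->wmode, 0, _QW_WAITING) == 0))
                  break;
          cpu_relax_lowlatency();
  }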
Revision tags: v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4 |
# c7114b4e | 11-May-2015 | Waiman Long <[email protected]>
locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS
To be consistent with the queued spinlocks which use CONFIG_QUEUED_SPINLOCKS config parameter, the one for the queued rwlocks is now renamed to CONFIG_QUEUED_RWLOCKS.
Signed-off-by: Waiman Long <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Revision tags: v4.1-rc3, v4.1-rc2, v4.1-rc1, v4.0, v4.0-rc7, v4.0-rc6, v4.0-rc5, v4.0-rc4, v4.0-rc3, v4.0-rc2, v4.0-rc1, v3.19, v3.19-rc7, v3.19-rc6, v3.19-rc5, v3.19-rc4, v3.19-rc3, v3.19-rc2, v3.19-rc1, v3.18, v3.18-rc7, v3.18-rc6, v3.18-rc5, v3.18-rc4, v3.18-rc3, v3.18-rc2, v3.18-rc1, v3.17, v3.17-rc7, v3.17-rc6, v3.17-rc5, v3.17-rc4, v3.17-rc3, v3.17-rc2, v3.17-rc1, v3.16, v3.16-rc7, v3.16-rc6, v3.16-rc5, v3.16-rc4 |
# 3a6bfbc9 | 29-Jun-2014 | Davidlohr Bueso <[email protected]>
arch, locking: Ciao arch_mutex_cpu_relax()
The arch_mutex_cpu_relax() function, introduced by 34b133f, is hacky and ugly. It was added a few years ago to address the fact that common cpu_relax() calls include yielding on s390, and thus impact the optimistic spinning functionality of mutexes. Nowadays we use this function well beyond mutexes: rwsem, qrwlock, mcs and lockref. Since the macro that defines the call is in the mutex header, any users must include mutex.h and the naming is misleading as well.
This patch (i) renames the call to cpu_relax_lowlatency ("relax, but only if you can do it with very low latency") and (ii) defines it in each arch's asm/processor.h local header, just like for regular cpu_relax functions. On all archs, except s390, cpu_relax_lowlatency is simply cpu_relax, and thus we can take it out of mutex.h. While this can seem redundant, I believe it is a good choice as it allows us to move out arch specific logic from generic locking primitives and enables future(?) archs to transparently define it, similarly to System Z.
Signed-off-by: Davidlohr Bueso <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Aurelien Jacquiot <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Bharat Bhushan <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Liqin <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Chris Zankel <[email protected]> Cc: David Howells <[email protected]> Cc: David S. Miller <[email protected]> Cc: Deepthi Dharwar <[email protected]> Cc: Dominik Dingel <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Hirokazu Takata <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: James Hogan <[email protected]> Cc: Jason Wang <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Joe Perches <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Joseph Myers <[email protected]> Cc: Kees Cook <[email protected]> Cc: Koichi Yasutake <[email protected]> Cc: Lennox Wu <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mark Salter <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Neuling <[email protected]> Cc: Michal Simek <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paul Burton <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Paul Gortmaker <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Qais Yousef <[email protected]> Cc: Qiaowei Ren <[email protected]> Cc: Rafael Wysocki <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Russell King <[email protected]> Cc: Steven Miao <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Stratos Karafotis <[email protected]> Cc: Tim Chen <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vasily Kulikov <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: Wolfram Sang <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Revision tags: v3.16-rc3, v3.16-rc2, v3.16-rc1, v3.15, v3.15-rc8, v3.15-rc7, v3.15-rc6, v3.15-rc5, v3.15-rc4, v3.15-rc3, v3.15-rc2, v3.15-rc1, v3.14, v3.14-rc8, v3.14-rc7, v3.14-rc6, v3.14-rc5, v3.14-rc4, v3.14-rc3, v3.14-rc2 |
# 70af2f8a | 03-Feb-2014 | Waiman Long <[email protected]>
locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks
This rwlock uses the arch_spin_lock_t as a waitqueue, and assuming the arch_spin_lock_t is a fair lock (ticket,mcs etc..) the resulting rwlock is a fair lock.
It fits in the same 8 bytes as the regular rwlock_t by folding the reader and writer count into a single integer, using the remaining 4 bytes for the arch_spinlock_t.
Architectures that can single-copy address bytes can optimize queue_write_unlock() with a 0 write to the LSB (the write count).
Performance as measured by Davidlohr Bueso (rwlock_t -> qrwlock_t):
+--------------+-------------+---------------+
| Workload     | #users      | delta         |
+--------------+-------------+---------------+
| alltests     | > 1400      | -4.83%        |
| custom       | 0-100,> 100 | +1.43%,-1.57% |
| high_systime | > 1000      | -2.61         |
| shared       | all         | +0.32         |
+--------------+-------------+---------------+
http://www.stgolabs.net/qrwlock-stuff/aim7-results-vs-rwsem_optsin/
Signed-off-by: Waiman Long <[email protected]> [peterz: near complete rewrite] Signed-off-by: Peter Zijlstra <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: "Paul E.McKenney" <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
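The layout described above is roughly (a sketch of the original asm-generic/qrwlock_types.h; the embedded spinlock field was later renamed to wait_lock):

  typedef struct qrwlock {
          atomic_t        cnts;   /* writer byte in the LSB, reader count above */
          arch_spinlock_t lock;   /* fair spinlock used as the wait queue       */
  } arch_rwlock_t;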