|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |
|
| #
ab83647f |
| 03-Nov-2024 |
Alexandre Ghiti <[email protected]> |
riscv: Add qspinlock support
In order to produce a generic kernel, a user can select CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket spinlock implementation if Zabha or Ziccrse a
riscv: Add qspinlock support
In order to produce a generic kernel, a user can select CONFIG_COMBO_SPINLOCKS which will fallback at runtime to the ticket spinlock implementation if Zabha or Ziccrse are not present.
Note that we can't use alternatives here because the discovery of extensions is done too late and we need to start with the qspinlock implementation because the ticket spinlock implementation would pollute the spinlock value, so let's use static keys.
This is largely based on Guo's work and Leonardo reviews at [1].
Link: https://lore.kernel.org/linux-riscv/[email protected]/ [1] Signed-off-by: Guo Ren <[email protected]> Signed-off-by: Alexandre Ghiti <[email protected]> Reviewed-by: Andrea Parri <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
show more ...
|
|
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1 |
|
| #
125b0bb9 |
| 10-Nov-2023 |
Linus Torvalds <[email protected]> |
asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation
We really don't want to do atomic_read() or anything like that, since we already have the value, not the lock. The whole poin
asm-generic: qspinlock: fix queued_spin_value_unlocked() implementation
We really don't want to do atomic_read() or anything like that, since we already have the value, not the lock. The whole point of this is that we've loaded the lock from memory, and we want to check whether the value we loaded was a locked one or not.
The main use of this is the lockref code, which loads both the lock and the reference count in one atomic operation, and then works on that combined value. With the atomic_read(), the compiler would pointlessly spill the value to the stack, in order to then be able to read it back "atomically".
This is the qspinlock version of commit c6f4a9002252 ("asm-generic: ticket-lock: Optimize arch_spin_value_unlocked()") which fixed this same bug for ticket locks.
Cc: Guo Ren <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Waiman Long <[email protected]> Link: https://lore.kernel.org/all/CAHk-=whNRv0v6kQiV5QO6DJhjH4KEL36vWQ6Re8Csrnh4zbRkQ@mail.gmail.com/ Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17 |
|
| #
a8ad07e5 |
| 16-Mar-2022 |
Peter Zijlstra <[email protected]> |
asm-generic: qspinlock: Indicate the use of mixed-size atomics
The qspinlock implementation depends on having well behaved mixed-size atomics. This is true on the more widely-used platforms, but th
asm-generic: qspinlock: Indicate the use of mixed-size atomics
The qspinlock implementation depends on having well behaved mixed-size atomics. This is true on the more widely-used platforms, but these requirements are somewhat subtle and may not be satisfied by all the platforms that qspinlock is used on.
Document these requirements, so ports that use qspinlock can more easily determine if they meet these requirements.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Waiman Long <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Signed-off-by: Palmer Dabbelt <[email protected]>
show more ...
|
|
Revision tags: v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1 |
|
| #
f44ca087 |
| 19-Oct-2020 |
Arnd Bergmann <[email protected]> |
qspinlock: use signed temporaries for cmpxchg
When building with W=2, the build log is flooded with
include/asm-generic/qrwlock.h:65:56: warning: pointer targets in passing argument 2 of 'atomic_tr
qspinlock: use signed temporaries for cmpxchg
When building with W=2, the build log is flooded with
include/asm-generic/qrwlock.h:65:56: warning: pointer targets in passing argument 2 of 'atomic_try_cmpxchg_acquire' differ in signedness [-Wpointer-sign] include/asm-generic/qrwlock.h:92:53: warning: pointer targets in passing argument 2 of 'atomic_try_cmpxchg_acquire' differ in signedness [-Wpointer-sign] include/asm-generic/qspinlock.h:68:55: warning: pointer targets in passing argument 2 of 'atomic_try_cmpxchg_acquire' differ in signedness [-Wpointer-sign] include/asm-generic/qspinlock.h:82:52: warning: pointer targets in passing argument 2 of 'atomic_try_cmpxchg_acquire' differ in signedness [-Wpointer-sign]
The atomics are built on top of signed integers, but the caller doesn't actually care. Just use signed types as well.
Fixes: 27df89689e25 ("locking/spinlocks: Remove an instruction from spin and write locks") Signed-off-by: Arnd Bergmann <[email protected]>
show more ...
|
|
Revision tags: v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8 |
|
| #
459e3953 |
| 29-Jul-2020 |
Herbert Xu <[email protected]> |
locking/qspinlock: Do not include atomic.h from qspinlock_types.h
This patch breaks a header loop involving qspinlock_types.h. The issue is that qspinlock_types.h includes atomic.h, which then event
locking/qspinlock: Do not include atomic.h from qspinlock_types.h
This patch breaks a header loop involving qspinlock_types.h. The issue is that qspinlock_types.h includes atomic.h, which then eventually includes kernel.h which could lead back to the original file via spinlock_types.h.
As ATOMIC_INIT is now defined by linux/types.h, there is no longer any need to include atomic.h from qspinlock_types.h. This also allows the CONFIG_PARAVIRT hack to be removed since it was trying to prevent exactly this loop.
Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Waiman Long <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.8-rc7 |
|
| #
20c0e826 |
| 24-Jul-2020 |
Nicholas Piggin <[email protected]> |
powerpc/pseries: Implement paravirt qspinlocks for SPLPAR
This implements the generic paravirt qspinlocks using H_PROD and H_CONFER to kick and wait.
This uses an un-directed yield to any CPU rathe
powerpc/pseries: Implement paravirt qspinlocks for SPLPAR
This implements the generic paravirt qspinlocks using H_PROD and H_CONFER to kick and wait.
This uses an un-directed yield to any CPU rather than the directed yield to a pre-empted lock holder that paravirtualised simple spinlocks use, that requires no kick hcall. This is something that could be investigated and improved in future.
Performance results can be found in the commit which added queued spinlocks.
Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Waiman Long <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
| #
aa65ff6b |
| 24-Jul-2020 |
Nicholas Piggin <[email protected]> |
powerpc/64s: Implement queued spinlocks and rwlocks
These have shown significantly improved performance and fairness when spinlock contention is moderate to high on very large systems.
With this se
powerpc/64s: Implement queued spinlocks and rwlocks
These have shown significantly improved performance and fairness when spinlock contention is moderate to high on very large systems.
With this series including subsequent patches, on a 16 socket 1536 thread POWER9, a stress test such as same-file open/close from all CPUs gets big speedups, 11620op/s aggregate with simple spinlocks vs 384158op/s (33x faster), where the difference in throughput between the fastest and slowest thread goes from 7x to 1.4x.
Thanks to the fast path being identical in terms of atomics and barriers (after a subsequent optimisation patch), single threaded performance is not changed (no measurable difference).
On smaller systems, performance and fairness seems to be generally improved. Using dbench on tmpfs as a test (that starts to run into kernel spinlock contention), a 2-socket OpenPOWER POWER9 system was tested with bare metal and KVM guest configurations. Results can be found here:
https://github.com/linuxppc/issues/issues/305#issuecomment-663487453
Observations are:
- Queued spinlocks are equal when contention is insignificant, as expected and as measured with microbenchmarks.
- When there is contention, on bare metal queued spinlocks have better throughput and max latency at all points.
- When virtualised, queued spinlocks are slightly worse approaching peak throughput, but significantly better throughput and max latency at all points beyond peak, until queued spinlock maximum latency rises when clients are 2x vCPUs.
The regressions haven't been analysed very well yet, there are a lot of things that can be tuned, particularly the paravirtualised locking, but the numbers already look like a good net win even on relatively small systems.
Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Waiman Long <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3 |
|
| #
c942fddf |
| 27-May-2019 |
Thomas Gleixner <[email protected]> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157
Based on 3 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of th
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157
Based on 3 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details
this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version [author] [graeme] [gregory] [gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i] [kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema] [hk] [hemahk]@[ti] [com] this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1105 file(s).
Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Richard Fontana <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
show more ...
|
|
Revision tags: v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3, v5.1-rc2, v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1 |
|
| #
27df8968 |
| 20-Aug-2018 |
Matthew Wilcox <[email protected]> |
locking/spinlocks: Remove an instruction from spin and write locks
Both spin locks and write locks currently do:
f0 0f b1 17 lock cmpxchg %edx,(%rdi) 85 c0 test %e
locking/spinlocks: Remove an instruction from spin and write locks
Both spin locks and write locks currently do:
f0 0f b1 17 lock cmpxchg %edx,(%rdi) 85 c0 test %eax,%eax 75 05 jne [slowpath]
This 'test' insn is superfluous; the cmpxchg insn sets the Z flag appropriately. Peter pointed out that using atomic_try_cmpxchg_acquire() will let the compiler know this is true. Comparing before/after disassemblies show the only effect is to remove this insn.
Take this opportunity to make the spin & write lock code resemble each other more closely and have similar likely() hints.
Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Matthew Wilcox <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6 |
|
| #
1362ae43 |
| 14-May-2018 |
Andrea Parri <[email protected]> |
locking/spinlocks: Clean up comment and #ifndef for {,queued_}spin_is_locked()
Removes "#ifndef queued_spin_is_locked" from the generic code: this is unused and it's reasonable to conclude that it w
locking/spinlocks: Clean up comment and #ifndef for {,queued_}spin_is_locked()
Removes "#ifndef queued_spin_is_locked" from the generic code: this is unused and it's reasonable to conclude that it will continue to be unused.
Also removes the comment about spin_is_locked() from mutex_is_locked(): the comment remains valid but not particularly useful.
Suggested-by: Will Deacon <[email protected]> Signed-off-by: Andrea Parri <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.17-rc5, v4.17-rc4, v4.17-rc3 |
|
| #
626e5fbc |
| 26-Apr-2018 |
Will Deacon <[email protected]> |
locking/qspinlock: Use smp_store_release() in queued_spin_unlock()
A qspinlock can be unlocked simply by writing zero to the locked byte. This can be implemented in the generic code, so do that and
locking/qspinlock: Use smp_store_release() in queued_spin_unlock()
A qspinlock can be unlocked simply by writing zero to the locked byte. This can be implemented in the generic code, so do that and remove the arch-specific override for x86 in the !PV case.
Signed-off-by: Will Deacon <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Waiman Long <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.17-rc2, v4.17-rc1, v4.16, v4.16-rc7, v4.16-rc6, v4.16-rc5, v4.16-rc4, v4.16-rc3, v4.16-rc2, v4.16-rc1, v4.15, v4.15-rc9, v4.15-rc8, v4.15-rc7, v4.15-rc6, v4.15-rc5, v4.15-rc4, v4.15-rc3, v4.15-rc2, v4.15-rc1, v4.14, v4.14-rc8, v4.14-rc7, v4.14-rc6, v4.14-rc5, v4.14-rc4 |
|
| #
a4c1887d |
| 03-Oct-2017 |
Will Deacon <[email protected]> |
locking/arch: Remove dummy arch_{read,spin,write}_lock_flags() implementations
The arch_{read,spin,write}_lock_flags() macros are simply mapped to the non-flags versions by the majority of architect
locking/arch: Remove dummy arch_{read,spin,write}_lock_flags() implementations
The arch_{read,spin,write}_lock_flags() macros are simply mapped to the non-flags versions by the majority of architectures, so do this in core code and remove the dummy implementations. Also remove the implementation in spinlock_up.h, since all callers of do_raw_spin_lock_flags() call local_irq_save(flags) anyway.
Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.14-rc3, v4.14-rc2, v4.14-rc1, v4.13, v4.13-rc7, v4.13-rc6, v4.13-rc5, v4.13-rc4, v4.13-rc3, v4.13-rc2, v4.13-rc1, v4.12 |
|
| #
d3a024ab |
| 29-Jun-2017 |
Paul E. McKenney <[email protected]> |
locking: Remove spin_unlock_wait() generic definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock
locking: Remove spin_unlock_wait() generic definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore removes spin_unlock_wait() and related definitions from core code.
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Will Deacon <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Alan Stern <[email protected]> Cc: Andrea Parri <[email protected]> Cc: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v4.12-rc7, v4.12-rc6, v4.12-rc5, v4.12-rc4, v4.12-rc3, v4.12-rc2, v4.12-rc1, v4.11, v4.11-rc8, v4.11-rc7, v4.11-rc6, v4.11-rc5, v4.11-rc4, v4.11-rc3, v4.11-rc2, v4.11-rc1, v4.10, v4.10-rc8, v4.10-rc7, v4.10-rc6, v4.10-rc5, v4.10-rc4, v4.10-rc3, v4.10-rc2, v4.10-rc1, v4.9, v4.9-rc8, v4.9-rc7, v4.9-rc6, v4.9-rc5, v4.9-rc4, v4.9-rc3, v4.9-rc2, v4.9-rc1, v4.8, v4.8-rc8, v4.8-rc7, v4.8-rc6, v4.8-rc5, v4.8-rc4, v4.8-rc3, v4.8-rc2, v4.8-rc1, v4.7, v4.7-rc7, v4.7-rc6, v4.7-rc5, v4.7-rc4, v4.7-rc3, v4.7-rc2 |
|
| #
ca50e426 |
| 03-Jun-2016 |
Pan Xinhui <[email protected]> |
locking/qspinlock: Use atomic_sub_return_release() in queued_spin_unlock()
The existing version uses a heavy barrier while only release semantics is required. So use atomic_sub_return_release() inst
locking/qspinlock: Use atomic_sub_return_release() in queued_spin_unlock()
The existing version uses a heavy barrier while only release semantics is required. So use atomic_sub_return_release() instead.
Suggested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Pan Xinhui <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
| #
2c610022 |
| 08-Jun-2016 |
Peter Zijlstra <[email protected]> |
locking/qspinlock: Fix spin_unlock_wait() some more
While this prior commit:
54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
... fixes spin_is_locked() and spin_u
locking/qspinlock: Fix spin_unlock_wait() some more
While this prior commit:
54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
... fixes spin_is_locked() and spin_unlock_wait() for the usage in ipc/sem and netfilter, it does not in fact work right for the usage in task_work and futex.
So while the 2 locks crossed problem:
spin_lock(A) spin_lock(B) if (!spin_is_locked(B)) spin_unlock_wait(A) foo() foo();
... works with the smp_mb() injected by both spin_is_locked() and spin_unlock_wait(), this is not sufficient for:
flag = 1; smp_mb(); spin_lock() spin_unlock_wait() if (!flag) // add to lockless list // iterate lockless list
... because in this scenario, the store from spin_lock() can be delayed past the load of flag, uncrossing the variables and loosing the guarantee.
This patch reworks spin_is_locked() and spin_unlock_wait() to work in both cases by exploiting the observation that while the lock byte store can be delayed, the contender must have registered itself visibly in other state contained in the word.
It also allows for architectures to override both functions, as PPC and ARM64 have an additional issue for which we currently have no generic solution.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Giovanni Gherdovich <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pan Xinhui <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] # v4.2 and later Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.7-rc1 |
|
| #
54cf809b |
| 20-May-2016 |
Peter Zijlstra <[email protected]> |
locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()
Similar to commits:
51d7d5205d33 ("powerpc: Add smp_mb() to arch_spin_is_locked()") d86b8da04dfa ("arm64: spinlock: serialise spin
locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()
Similar to commits:
51d7d5205d33 ("powerpc: Add smp_mb() to arch_spin_is_locked()") d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
qspinlock suffers from the fact that the _Q_LOCKED_VAL store is unordered inside the ACQUIRE of the lock.
And while this is not a problem for the regular mutual exclusive critical section usage of spinlocks, it breaks creative locking like:
spin_lock(A) spin_lock(B) spin_unlock_wait(B) if (!spin_is_locked(A)) do_something() do_something()
In that both CPUs can end up running do_something at the same time, because our _Q_LOCKED_VAL store can drop past the spin_unlock_wait() spin_is_locked() loads (even on x86!!).
To avoid making the normal case slower, add smp_mb()s to the less used spin_unlock_wait() / spin_is_locked() side of things to avoid this problem.
Reported-and-tested-by: Davidlohr Bueso <[email protected]> Reported-by: Giovanni Gherdovich <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] # v4.2 and later Signed-off-by: Linus Torvalds <[email protected]>
show more ...
|
|
Revision tags: v4.6, v4.6-rc7, v4.6-rc6, v4.6-rc5, v4.6-rc4, v4.6-rc3, v4.6-rc2, v4.6-rc1, v4.5, v4.5-rc7, v4.5-rc6, v4.5-rc5 |
|
| #
b82e5302 |
| 19-Feb-2016 |
Dan Streetman <[email protected]> |
locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
Move the __ARCH_SPIN_LOCK_UNLOCKED definition from qspinlock.h into qspinlock_types.h.
The definition of __ARCH_SPIN_LOCK_UNLO
locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h
Move the __ARCH_SPIN_LOCK_UNLOCKED definition from qspinlock.h into qspinlock_types.h.
The definition of __ARCH_SPIN_LOCK_UNLOCKED comes from the build arch's include files; but on x86 when CONFIG_QUEUED_SPINLOCKS=y, it just it's defined in asm-generic/qspinlock.h. In most cases, this doesn't matter because linux/spinlock.h includes asm/spinlock.h, which for x86 includes asm-generic/qspinlock.h. However, any code that only includes linux/mutex.h will break, because it only includes asm/spinlock_types.h.
For example, this breaks systemtap, which only includes mutex.h.
Signed-off-by: Dan Streetman <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Waiman Long <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.5-rc4, v4.5-rc3, v4.5-rc2, v4.5-rc1, v4.4, v4.4-rc8, v4.4-rc7, v4.4-rc6, v4.4-rc5, v4.4-rc4, v4.4-rc3, v4.4-rc2, v4.4-rc1 |
|
| #
64d816cb |
| 10-Nov-2015 |
Waiman Long <[email protected]> |
locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
This patch replaces the cmpxchg() and xchg() calls in the native qspinlock code with the more relaxed _acquire or _release v
locking/qspinlock: Use _acquire/_release() versions of cmpxchg() & xchg()
This patch replaces the cmpxchg() and xchg() calls in the native qspinlock code with the more relaxed _acquire or _release versions of those calls to enable other architectures to adopt queued spinlocks with less memory barrier performance overhead.
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.3, v4.3-rc7, v4.3-rc6, v4.3-rc5, v4.3-rc4, v4.3-rc3, v4.3-rc2, v4.3-rc1 |
|
| #
43b3f028 |
| 04-Sep-2015 |
Peter Zijlstra <[email protected]> |
locking/qspinlock/x86: Fix performance regression under unaccelerated VMs
Dave ran into horrible performance on a VM without PARAVIRT_SPINLOCKS set and Linus noted that the test-and-set implementati
locking/qspinlock/x86: Fix performance regression under unaccelerated VMs
Dave ran into horrible performance on a VM without PARAVIRT_SPINLOCKS set and Linus noted that the test-and-set implementation was retarded.
One should spin on the variable with a load, not a RMW.
While there, remove 'queued' from the name, as the lock isn't queued at all, but a simple test-and-set.
Suggested-by: Linus Torvalds <[email protected]> Reported-by: Dave Chinner <[email protected]> Tested-by: Dave Chinner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Waiman Long <[email protected]> Cc: [email protected] # v4.2+ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
|
Revision tags: v4.2, v4.2-rc8, v4.2-rc7, v4.2-rc6, v4.2-rc5, v4.2-rc4, v4.2-rc3, v4.2-rc2, v4.2-rc1, v4.1, v4.1-rc8, v4.1-rc7, v4.1-rc6, v4.1-rc5, v4.1-rc4, v4.1-rc3, v4.1-rc2, v4.1-rc1 |
|
| #
2aa79af6 |
| 24-Apr-2015 |
Peter Zijlstra (Intel) <[email protected]> |
locking/qspinlock: Revert to test-and-set on hypervisors
When we detect a hypervisor (!paravirt, see qspinlock paravirt support patches), revert to a simple test-and-set lock to avoid the horrors of
locking/qspinlock: Revert to test-and-set on hypervisors
When we detect a hypervisor (!paravirt, see qspinlock paravirt support patches), revert to a simple test-and-set lock to avoid the horrors of queue preemption.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Daniel J Blueman <[email protected]> Cc: David Vrabel <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Raghavendra K T <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|
| #
a33fda35 |
| 24-Apr-2015 |
Waiman Long <[email protected]> |
locking/qspinlock: Introduce a simple generic 4-byte queued spinlock
This patch introduces a new generic queued spinlock implementation that can serve as an alternative to the default ticket spinloc
locking/qspinlock: Introduce a simple generic 4-byte queued spinlock
This patch introduces a new generic queued spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queued spinlock should be almost as fair as the ticket spinlock. It has about the same speed in single-thread and it can be much faster in high contention situations especially when the spinlock is embedded within the data structure to be protected.
Only in light to moderate contention where the average queue depth is around 1-3 will this queued spinlock be potentially a bit slower due to the higher slowpath overhead.
This queued spinlock is especially suit to NUMA machines with a large number of cores as the chance of spinlock contention is much higher in those machines. The cost of contention is also higher because of slower inter-node memory traffic.
Due to the fact that spinlocks are acquired with preemption disabled, the process will not be migrated to another CPU while it is trying to get a spinlock. Ignoring interrupt handling, a CPU can only be contending in one spinlock at any one time. Counting soft IRQ, hard IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting activities. By allocating a set of per-cpu queue nodes and used them to form a waiting queue, we can encode the queue node address into a much smaller 24-bit size (including CPU number and queue node index) leaving one byte for the lock.
Please note that the queue node is only needed when waiting for the lock. Once the lock is acquired, the queue node can be released to be used later.
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Daniel J Blueman <[email protected]> Cc: David Vrabel <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Raghavendra K T <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
show more ...
|