|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
b76b44fb |
| 07-Mar-2025 |
Waiman Long <[email protected]> |
locking/lock_events: Add locking events for rtmutex slow paths
Add locking events for rtlock_slowlock() and rt_mutex_slowlock() for profiling the slow path behavior of rt_spin_lock() and rt_mutex_lock().
Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Boqun Feng <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
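For context, lock events are counters declared in kernel/locking/lock_events_list.h and bumped with lockevent_inc(); a minimal sketch of the pattern, using a hypothetical event name rather than the ones added by this patch:

    /* kernel/locking/lock_events_list.h (sketch, hypothetical event name) */
    LOCK_EVENT(rtmutex_slowlock)    /* # of entries into the rt_mutex slow path */

    /* kernel/locking/rtmutex.c (sketch) */
    static void __sched rt_mutex_slowlock_example(struct rt_mutex_base *lock)
    {
            lockevent_inc(rtmutex_slowlock);        /* count one slow-path entry */
            /* ... the actual slow-path work ... */
    }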
|
|
Revision tags: v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4 |
|
| #
abfdccd6 |
| 17-Dec-2024 |
John Stultz <[email protected]> |
sched/wake_q: Add helper to call wake_up_q after unlock with preemption disabled
A common pattern seen when wake_qs are used to defer a wakeup until after a lock is released is something like:

    preempt_disable();
    raw_spin_unlock(lock);
    wake_up_q(wake_q);
    preempt_enable();
So create some raw_spin_unlock*_wake() helper functions to clean this up.
Applies on top of the fix I submitted here: https://lore.kernel.org/lkml/[email protected]/
NOTE: I recognise the unlock()/unlock_irq()/unlock_irqrestore() variants create some duplication of their own. We could use a macro to generate the similar functions, but I often dislike how such generation macros make finding the actual implementation harder, so I left the three functions as is. If folks would prefer otherwise, let me know and I'll switch it.
Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
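As an illustration, one plausible shape for such a helper, assuming the semantics described above (a sketch, not the code merged by the patch):

    /* Sketch: unlock, then run the deferred wakeups with preemption disabled. */
    static inline void raw_spin_unlock_wake(raw_spinlock_t *lock,
                                            struct wake_q_head *wake_q)
    {
            preempt_disable();
            raw_spin_unlock(lock);
            if (wake_q) {
                    wake_up_q(wake_q);
                    wake_q_init(wake_q);    /* allow the wake_q to be reused */
            }
            preempt_enable();
    }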
|
|
Revision tags: v6.13-rc3 |
|
| #
4a077914 |
| 12-Dec-2024 |
John Stultz <[email protected]> |
locking/rtmutex: Make sure we wake anything on the wake_q when we release the lock->wait_lock
Bert reported seeing occasional boot hangs when running with PREEMPT_RT and bisected it down to commit 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock").
It looks like I missed a few spots where we drop the wait_lock and potentially call into schedule without waking up the tasks on the wake_q structure. Since the tasks being woken are ww_mutex tasks, they need to be able to run to release the mutex and unblock the task that is currently planning to wake them. Thus we can deadlock.
So make sure we wake the wake_q tasks when we unlock the wait_lock.
Closes: https://lore.kernel.org/lkml/[email protected] Fixes: 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock") Reported-by: Bert Karwatzki <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
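A hedged sketch of the pattern the fix enforces, with simplified locking (not the literal diff): flush the wake_q whenever lock->wait_lock is dropped before a potential schedule():

    raw_spin_unlock_irq(&lock->wait_lock);

    wake_up_q(&wake_q);     /* let the blocked ww_mutex waiters run ... */
    wake_q_init(&wake_q);   /* ... and make the wake_q reusable */

    schedule();

    raw_spin_lock_irq(&lock->wait_lock);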
|
|
Revision tags: v6.13-rc2, v6.13-rc1, v6.12 |
|
| #
82f9cc09 |
| 14-Nov-2024 |
John Stultz <[email protected]> |
locking: rtmutex: Fix wake_q logic in task_blocks_on_rt_mutex
Anders had bisected a crash using PREEMPT_RT with linux-next and isolated it down to commit 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock"), where it seemed the wake_q structure was somehow getting corrupted, causing a null pointer dereference.
I was able to easily reproduce this with PREEMPT_RT and managed to isolate it down: through various call stacks we were actually calling wake_up_q() twice on the same wake_q.
I found that in the problematic commit, I had added the wake_up_q() call in task_blocks_on_rt_mutex() around __ww_mutex_add_waiter(), following a similar pattern in __mutex_lock_common().
However, it's just wrong. We haven't dropped the lock->wait_lock, so it's contrary to the point of the original patch. And it didn't match the __mutex_lock_common() logic of re-initializing the wake_q after calling it midway in the stack.
Looking at it now, the wake_up_q() call is incorrect and should just be removed. So drop the erroneous logic I had added.
Fixes: 894d1b3db41c ("locking/mutex: Remove wakeups from under mutex::wait_lock") Closes: https://lore.kernel.org/lkml/[email protected]/ Reported-by: Anders Roxell <[email protected]> Reported-by: Arnd Bergmann <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Juri Lelli <[email protected]> Tested-by: Anders Roxell <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4 |
|
| #
77abd3b7 |
| 12-Aug-2024 |
Sebastian Andrzej Siewior <[email protected]> |
locking/rt: Annotate unlock followed by lock for sparse.
rt_mutex_slowlock_block() and rtlock_slowlock_locked() both unlock lock::wait_lock and then lock it later. This is unusual and sparse complains about it.
Add __releases() + __acquires() annotation to mark that it is expected.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
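For reference, the kind of annotation being added looks roughly like this generic sketch:

    /* Sketch: tell sparse the lock is dropped and re-acquired inside. */
    static void demo_unlock_then_relock(raw_spinlock_t *lock)
            __releases(lock) __acquires(lock)
    {
            raw_spin_unlock(lock);
            /* ... work without the lock, e.g. schedule() ... */
            raw_spin_lock(lock);
    }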
|
| #
894d1b3d |
| 09-Oct-2024 |
Peter Zijlstra <[email protected]> |
locking/mutex: Remove wakeups from under mutex::wait_lock
In preparation to nest mutex::wait_lock under rq::lock we need to remove wakeups from under it.
Do this by utilizing wake_qs to defer the wakeup until after the lock is dropped.
[Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and 08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait mutexes")] [jstultz: rebased to mainline, added extra wake_up_q & init to avoid hangs, similar to Connor's rework of this patch]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Juri Lelli <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Metin Kaya <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Tested-by: K Prateek Nayak <[email protected]> Tested-by: Metin Kaya <[email protected]> Link: https://lore.kernel.org/r/[email protected]
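A hedged before/after sketch of the deferral described above (illustrative names, simplified locking):

    /* Before (sketch): wakeup issued while still holding wait_lock. */
    raw_spin_lock(&lock->wait_lock);
    /* ... pick the waiter to wake ... */
    wake_up_process(waiter_task);
    raw_spin_unlock(&lock->wait_lock);

    /* After (sketch): queue the wakeup, perform it once wait_lock is dropped. */
    DEFINE_WAKE_Q(wake_q);

    raw_spin_lock(&lock->wait_lock);
    /* ... pick the waiter to wake ... */
    wake_q_add(&wake_q, waiter_task);
    raw_spin_unlock(&lock->wait_lock);
    wake_up_q(&wake_q);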
|
| #
d33d2603 |
| 15-Aug-2024 |
Roland Xu <[email protected]> |
rtmutex: Drop rt_mutex::wait_lock before scheduling
rt_mutex_handle_deadlock() is called with rt_mutex::wait_lock held. In the good case it returns with the lock held and in the deadlock case it emits a warning and goes into an endless scheduling loop with the lock held, which triggers the 'scheduling in atomic' warning.
Unlock rt_mutex::wait_lock in the deadlock case before issuing the warning and dropping into the schedule-forever loop.
[ tglx: Moved unlock before the WARN(), removed the pointless comment, massaged changelog, added Fixes tag ]
Fixes: 3d5c9340d194 ("rtmutex: Handle deadlock detection smarter") Signed-off-by: Roland Xu <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/ME0P300MB063599BEF0743B8FA339C2CECC802@ME0P300MB0635.AUSP300.PROD.OUTLOOK.COM
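A hedged sketch of the fixed flow in the deadlock branch, reconstructed from the changelog (not the literal diff):

    /* Sketch: never loop in schedule() while still holding wait_lock. */
    raw_spin_unlock_irq(&lock->wait_lock);
    WARN(1, "rtmutex deadlock detected\n");
    while (1) {
            set_current_state(TASK_INTERRUPTIBLE);
            schedule();
    }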
|
|
Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4 |
|
| #
ae04f69d |
| 10-Jun-2024 |
Qais Yousef <[email protected]> |
sched/rt: Rename realtime_{prio, task}() to rt_or_dl_{prio, task}()
Some find the name realtime overloaded. Use rt_or_dl() as an alternative, hopefully better, name.
Suggested-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
130fd056 |
| 10-Jun-2024 |
Qais Yousef <[email protected]> |
sched/rt: Clean up usage of rt_task()
rt_task() checks if a task has RT priority. But depending on your dictionary, this could mean it belongs to the RT class, or that it is a 'realtime' task, which includes the RT and DL classes.
Since this has already caused some confusion in discussion [1], a cleanup seemed due.
I define the usage of rt_task() to be tasks that belong to the RT class. Make sure that it returns true only for the RT class and audit the users, replacing the ones that required the old behavior with the new realtime_task(), which returns true for the RT and DL classes. Introduce a similar realtime_prio() to create the same distinction against rt_prio(), and update the users that required the old behavior to use the new function.
Move MAX_DL_PRIO to prio.h so it can be used in the new definitions.
Document the functions to make the difference between them more obvious. PI-boosted tasks are a factor that must be taken into account when choosing which function to use.
Rename task_is_realtime() to realtime_task_policy() as the old name is confusing against the new realtime_task().
No functional changes were intended.
[1] https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Phil Auld <[email protected]> Reviewed-by: "Steven Rostedt (Google)" <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Link: https://lore.kernel.org/r/[email protected]
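A hedged sketch of the intended distinction between the helpers, based on this changelog (the authoritative definitions live in include/linux/sched/prio.h and include/linux/sched/rt.h and may differ in detail):

    /* Sketch: DL priorities sit below MAX_DL_PRIO (0), RT ones below MAX_RT_PRIO. */
    static inline bool dl_prio(int prio)       { return prio < MAX_DL_PRIO; }
    static inline bool rt_prio(int prio)       { return prio >= MAX_DL_PRIO && prio < MAX_RT_PRIO; }
    static inline bool realtime_prio(int prio) { return prio < MAX_RT_PRIO; }   /* RT or DL */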
|
|
Revision tags: v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2 |
|
| #
ce3576eb |
| 24-Jan-2024 |
Uros Bizjak <[email protected]> |
locking/rtmutex: Use try_cmpxchg_relaxed() in mark_rt_mutex_waiters()
Use try_cmpxchg() instead of cmpxchg(*ptr, old, new) == old.
The x86 CMPXCHG instruction returns success in the ZF flag, so this change saves a compare after CMPXCHG (and related move instruction in front of CMPXCHG).
Also, try_cmpxchg() implicitly assigns old *ptr value to "old" when CMPXCHG fails. There is no need to re-read the value in the loop.
Note that the value from *ptr should be read using READ_ONCE() to prevent the compiler from merging, refetching or reordering the read.
No functional change intended.
Signed-off-by: Uros Bizjak <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Waiman Long <[email protected]> Cc: Will Deacon <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Link: https://lore.kernel.org/r/[email protected]
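A hedged before/after sketch of the transformation (here p points at lock->owner cast to unsigned long *; loop simplified):

    /* Before: compare manually and re-read *p on every failed attempt. */
    do {
            owner = *p;
    } while (cmpxchg_relaxed(p, owner, owner | RT_MUTEX_HAS_WAITERS) != owner);

    /* After: try_cmpxchg_relaxed() updates 'owner' itself on failure. */
    owner = READ_ONCE(*p);
    do {
    } while (!try_cmpxchg_relaxed(p, &owner, owner | RT_MUTEX_HAS_WAITERS));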
|
|
Revision tags: v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1 |
|
| #
45f67f30 |
| 08-Sep-2023 |
Thomas Gleixner <[email protected]> |
locking/rtmutex: Add a lockdep assert to catch potential nested blocking
There used to be a BUG_ON(current->pi_blocked_on) in the lock acquisition functions, but that vanished in one of the rtmutex overhauls.
Bring it back in the form of a lockdep assert to catch code paths which take rtmutex based locks with current::pi_blocked_on != NULL.
Reported-by: Crystal Wood <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: "Peter Zijlstra (Intel)" <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
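The assertion described amounts to something like the following at the top of the rtmutex acquisition paths (sketch):

    /* Sketch: a task already blocked on an rtmutex must not block on another. */
    lockdep_assert(!current->pi_blocked_on);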
|
| #
d14f9e93 |
| 08-Sep-2023 |
Sebastian Andrzej Siewior <[email protected]> |
locking/rtmutex: Use rt_mutex specific scheduler helpers
Have rt_mutex use the rt_mutex specific scheduler helpers to avoid recursion vs rtlock on the PI state.
[[ peterz: adapted to new names ]]
Reported-by: Crystal Wood <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
| #
af9f0063 |
| 08-Sep-2023 |
Sebastian Andrzej Siewior <[email protected]> |
locking/rtmutex: Avoid unconditional slowpath for DEBUG_RT_MUTEXES
With DEBUG_RT_MUTEXES enabled the fast-path rt_mutex_cmpxchg_acquire() always fails and all lock operations take the slow path.
Provide a new inline helper rt_mutex_try_acquire() which maps to rt_mutex_cmpxchg_acquire() in the non-debug case. For the debug case it invokes rt_mutex_slowtrylock() which can acquire a non-contended rtmutex under full debug coverage.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
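A hedged sketch of the helper described (the real version lives in kernel/locking/rtmutex.c and may differ in detail):

    #ifndef CONFIG_DEBUG_RT_MUTEXES
    static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
    {
            return rt_mutex_cmpxchg_acquire(lock, NULL, current);
    }
    #else
    /* With debug enabled the cmpxchg fast path always fails, so fall back to the
     * slow trylock, which still acquires a non-contended rtmutex. */
    static __always_inline bool rt_mutex_try_acquire(struct rt_mutex_base *lock)
    {
            return rt_mutex_slowtrylock(lock);
    }
    #endif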
|
|
Revision tags: v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1 |
|
| #
f7853c34 |
| 07-Jul-2023 |
Peter Zijlstra <[email protected]> |
locking/rtmutex: Fix task->pi_waiters integrity
Henry reported that rt_mutex_adjust_prio_check() has an ordering problem and puts the lie to the comment in [7]. Sharing the sort key between lock->waiters and owner->pi_waiters *does* create problems, since unlike what the comment claims, holding [L] is insufficient.
Notably, consider:
          A
         /  \
       M1    M2
       |      |
       B      C
That is, task A owns both M1 and M2, B and C block on them. In this case a concurrent chain walk (B & C) will modify their resp. sort keys in [7] while holding M1->wait_lock and M2->wait_lock. So holding [L] is meaningless, they're different Ls.
This then gives rise to a race condition between [7] and [11], where the requeue of pi_waiters will observe an inconsistent tree order.
    B                                   C

    (holds M1->wait_lock,               (holds M2->wait_lock,
     holds B->pi_lock)                   holds A->pi_lock)

    [7] waiter_update_prio();
    ...
    [8] raw_spin_unlock(B->pi_lock);
    ...
                                        [10] raw_spin_lock(A->pi_lock);
                                        [11] rt_mutex_enqueue_pi();
                                             // observes inconsistent
                                             // A->pi_waiters tree order
Fixing this means either extending the range of the owner lock from [10-13] to [6-13], with the immediate problem that this means [6-8] hold both blocked and owner locks, or duplicating the sort key.
Since the locking in chain walk is horrible enough without having to consider pi_lock nesting rules, duplicate the sort key instead.
By giving each tree their own sort key, the above race becomes harmless, if C sees B at the old location, then B will correct things (if they need correcting) when it walks up the chain and reaches A.
Fixes: fb00aca47440 ("rtmutex: Turn the plist into an rb-tree") Reported-by: Henry Wu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Tested-by: Henry Wu <[email protected]> Link: https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net
|
|
Revision tags: v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7 |
|
| #
db370a8b |
| 02-Feb-2023 |
Wander Lairson Costa <[email protected]> |
rtmutex: Ensure that the top waiter is always woken up
Let L1 and L2 be two spinlocks.
Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top waiter of L2.
Let T2 be the task holding L2.
Let T3 be a task trying to acquire L1.
The following events will lead to a state in which the wait queue of L2 isn't empty, but no task actually holds the lock.
    T1                      T2                      T3
    ==                      ==                      ==

                                                    spin_lock(L1)
                                                    | raw_spin_lock(L1->wait_lock)
                                                    | rtlock_slowlock_locked(L1)
                                                    | | task_blocks_on_rt_mutex(L1, T3)
                                                    | | | orig_waiter->lock = L1
                                                    | | | orig_waiter->task = T3
                                                    | | | raw_spin_unlock(L1->wait_lock)
                                                    | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)

                            spin_unlock(L2)
                            | rt_mutex_slowunlock(L2)
                            | | raw_spin_lock(L2->wait_lock)
                            | | wakeup(T1)
                            | | raw_spin_unlock(L2->wait_lock)

                                                    | | | | waiter = T1->pi_blocked_on
                                                    | | | | waiter == rt_mutex_top_waiter(L2)
                                                    | | | | waiter->task == T1
                                                    | | | | raw_spin_lock(L2->wait_lock)
                                                    | | | | dequeue(L2, waiter)
                                                    | | | | update_prio(waiter, T1)
                                                    | | | | enqueue(L2, waiter)
                                                    | | | | waiter != rt_mutex_top_waiter(L2)
                                                    | | | | L2->owner == NULL
                                                    | | | | wakeup(T1)
                                                    | | | | raw_spin_unlock(L2->wait_lock)

    T1 wakes up
    T1 != top_waiter(L2)
    schedule_rtlock()
If the deadline of T1 is updated before the call to update_prio(), and the new deadline is greater than the deadline of the second top waiter, then after the requeue, T1 is no longer the top waiter, and the wrong task is woken up which will then go back to sleep because it is not the top waiter.
This can be reproduced in PREEMPT_RT with stress-ng:
    while true; do
        stress-ng --sched deadline --sched-period 1000000000 \
            --sched-runtime 800000000 --sched-deadline 1000000000 \
            --mmapfork 23 -t 20
    done
A similar issue was pointed out by Thomas versus the cases where the top waiter drops out early due to a signal or timeout, which is a general issue for all regular rtmutex use cases, e.g. futex.
The problematic code is in rt_mutex_adjust_prio_chain():
    // Save the top waiter before dequeue/enqueue
    prerequeue_top_waiter = rt_mutex_top_waiter(lock);

    rt_mutex_dequeue(lock, waiter);
    waiter_update_prio(waiter, task);
    rt_mutex_enqueue(lock, waiter);

    // Lock has no owner?
    if (!rt_mutex_owner(lock)) {
            // Top waiter changed
    ---->   if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
    ---->           wake_up_state(waiter->task, waiter->wake_state);
This only takes the case into account where @waiter is the new top waiter due to the requeue operation.
But it fails to handle the case where @waiter is no longer the top waiter due to the requeue operation.
Ensure that the new top waiter is woken up in all cases so it can take over the ownerless lock.
[ tglx: Amend changelog, add Fixes tag ]
Fixes: c014ef69b3ac ("locking/rtmutex: Add wake_state to rt_mutex_waiter") Signed-off-by: Wander Lairson Costa <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected]
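A hedged sketch of the corrected wakeup logic, based on the excerpt above (not necessarily the literal diff):

    /* Sketch: wake the *current* top waiter of the ownerless lock, whether or
     * not @waiter ended up on top after the requeue. */
    if (!rt_mutex_owner(lock)) {
            top_waiter = rt_mutex_top_waiter(lock);
            if (prerequeue_top_waiter != top_waiter)
                    wake_up_state(top_waiter->task, top_waiter->wake_state);
    }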
|
|
Revision tags: v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8 |
|
| #
1c0908d8 |
| 02-Dec-2022 |
Mel Gorman <[email protected]> |
rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench on XFS on arm64.
    kernel BUG at fs/inode.c:625!
    Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
    CPU: 11 PID: 6611 Comm: dbench Tainted: G    E    6.0.0-rt14-rt+ #1
    pc : clear_inode+0xa0/0xc0
    lr : clear_inode+0x38/0xc0
    Call trace:
     clear_inode+0xa0/0xc0
     evict+0x160/0x180
     iput+0x154/0x240
     do_unlinkat+0x184/0x300
     __arm64_sys_unlinkat+0x48/0xc0
     el0_svc_common.constprop.4+0xe4/0x2c0
     do_el0_svc+0xac/0x100
     el0_svc+0x78/0x200
     el0t_64_sync_handler+0x9c/0xc0
     el0t_64_sync+0x19c/0x1a0
It also affects 6.1-rc7-rt5 and a preempt-rt fork of 5.14, so this is likely a bug that has existed forever and only became visible when ARM support was added to preempt-rt. The same problem does not occur on x86-64, and he also reported that converting sb->s_inode_wblist_lock to raw_spinlock_t makes the problem disappear, indicating that the RT spinlock variant is the problem.
Which in turn means that RT mutexes on ARM64 and any other weakly ordered architecture are affected by this independent of RT.
Will Deacon observed:
"I'd be more inclined to be suspicious of the slowpath tbh, as we need to make sure that we have acquire semantics on all paths where the lock can be taken. Looking at the rtmutex code, this really isn't obvious to me -- for example, try_to_take_rt_mutex() appears to be able to return via the 'takeit' label without acquire semantics and it looks like we might be relying on the caller's subsequent _unlock_ of the wait_lock for ordering, but that will give us release semantics which aren't correct."
Sebastian Andrzej Siewior prototyped a fix that does work based on that comment but it was a little bit overkill and added some fences that should not be necessary.
The lock owner is updated with an IRQ-safe raw spinlock held, but the spin_unlock does not provide acquire semantics which are needed when acquiring a mutex.
Add the necessary acquire semantics for lock owner updates in the slow path acquisition and the waiter bit logic.
With the fix, 10 iterations of the dbench workload completed successfully, while the vanilla kernel fails on the first iteration.
[ [email protected]: Initial prototype fix ]
Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics") Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core") Reported-by: Jan Kara <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
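A hedged sketch of where the missing ordering sits, per the changelog (illustrative helper name; the actual patch touches the owner update and the waiter-bit logic):

    /* Sketch: publishing the new owner in the slow path needs ACQUIRE ordering so
     * the critical section cannot leak ahead of taking ownership. */
    static void rt_mutex_set_owner_example(struct rt_mutex_base *lock,
                                           struct task_struct *owner)
    {
            unsigned long val = (unsigned long)owner;

            if (rt_mutex_has_waiters(lock))
                    val |= RT_MUTEX_HAS_WAITERS;    /* keep the waiters bit */
            xchg_acquire(&lock->owner, (struct task_struct *)val);
    }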
|
|
Revision tags: v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1 |
|
| #
ee042be1 |
| 22-Mar-2022 |
Namhyung Kim <[email protected]> |
locking: Apply contention tracepoints in the slow path
Add the lock contention tracepoints in various lock function slow paths. Note that each arch can define spinlock differently; I added it only to the generic qspinlock for now.
Signed-off-by: Namhyung Kim <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
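For reference, the tracepoint pair is used roughly like this around a contended slow path (sketch; the LCB_F_* flags describe the lock type):

    /* Sketch: bracket the blocking section with the contention tracepoints. */
    static void __sched example_slowlock(struct rt_mutex_base *lock)
    {
            trace_contention_begin(lock, LCB_F_RT);
            /* ... enqueue the waiter and block until the lock is acquired ... */
            trace_contention_end(lock, 0);
    }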
|
|
Revision tags: v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6 |
|
| #
8f556a32 |
| 17-Dec-2021 |
Zqiang <[email protected]> |
locking/rtmutex: Fix incorrect condition in rtmutex_spin_on_owner()
Optimistic spinning needs to be terminated when the spinning waiter is no longer the top waiter on the lock, but the condition is negated. It terminates if the waiter is the top waiter, which defeats the whole purpose.
Fixes: c3123c431447 ("locking/rtmutex: Dont dereference waiter lockless") Signed-off-by: Zqiang <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v5.16-rc5, v5.16-rc4 |
|
| #
c0bed69d |
| 03-Dec-2021 |
Kefeng Wang <[email protected]> |
locking: Make owner_on_cpu() into <linux/sched.h>
Move owner_on_cpu() from kernel/locking/rwsem.c into include/linux/sched.h under CONFIG_SMP, then use it in mutex/rwsem/rtmutex to simplify the code.
Signed-off-by: Kefeng Wang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
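The helper being moved looks essentially like this (sketch of the known shape):

    #ifdef CONFIG_SMP
    static inline bool owner_on_cpu(struct task_struct *owner)
    {
            /* Skip spinning if the owner is not running, or its CPU (as a vCPU)
             * is currently preempted. */
            return READ_ONCE(owner->on_cpu) && !vcpu_is_preempted(task_cpu(owner));
    }
    #endif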
|
| #
02ea9fc9 |
| 29-Nov-2021 |
Peter Zijlstra <[email protected]> |
locking/rtmutex: Squash self-deadlock check for ww_rt_mutex.
Similar to the issues in commits:
  6467822b8cc9 ("locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes")
  a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters")
ww_rt_mutex_lock() should not return EDEADLK without first going through the __ww_mutex logic to set the required state. In fact, the chain-walk can deal with the spurious cycles (per the above commits) this check warns about and is trying to avoid.
Therefore ignore this test for ww_rt_mutex and simply let things fall in place.
Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4 |
|
| #
9321f815 |
| 28-Sep-2021 |
Thomas Gleixner <[email protected]> |
rtmutex: Wake up the waiters lockless while dropping the read lock.
The rw_semaphore and rwlock_t implementation both wake the waiter while holding the rt_mutex_base::wait_lock acquired. This can be optimized by waking the waiter lockless outside of the locked section to avoid a needless contention on the rt_mutex_base::wait_lock lock.
Extend rt_mutex_wake_q_add() to also accept task and state and use it in __rwbase_read_unlock().
Suggested-by: Davidlohr Bueso <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
|
| #
8fe46535 |
| 28-Sep-2021 |
Sebastian Andrzej Siewior <[email protected]> |
rtmutex: Check explicit for TASK_RTLOCK_WAIT.
rt_mutex_wake_q_add() needs to distinguish between sleeping locks (TASK_RTLOCK_WAIT) and normal locks, which use TASK_NORMAL, in order to use the proper wake mechanism.
Instead of checking for != TASK_NORMAL, make it more robust and check explicitly for TASK_RTLOCK_WAIT, which is the reason why a different wake mechanism is used.
No functional change.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
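A hedged sketch of the check described (simplified; details may differ from the merged code):

    /* Sketch: rtlock (spinlock_rt) waiters sleep in TASK_RTLOCK_WAIT and need the
     * dedicated RT wake path; everything else goes through the normal wake_q. */
    if (IS_ENABLED(CONFIG_PREEMPT_RT) && w->wake_state == TASK_RTLOCK_WAIT)
            wqh->rtlock_task = get_task_struct(w->task);
    else
            wake_q_add(&wqh->head, w->task);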
|
|
Revision tags: v5.15-rc3, v5.15-rc2, v5.15-rc1 |
|
| #
e5480572 |
| 01-Sep-2021 |
Peter Zijlstra <[email protected]> |
locking/rtmutex: Fix ww_mutex deadlock check
Dan reported that rt_mutex_adjust_prio_chain() can be called with .orig_waiter == NULL however commit a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters") unconditionally dereferences it.
Since both call-sites that have .orig_waiter == NULL don't care for the return value, simply disable the deadlock squash by adding the NULL check.
Notably, both callers use the deadlock condition as a termination condition for the iteration; once detected, it is sure that (de)boosting is done. Arguably step [3] would be a more natural termination point, but it's dubious whether adding a third deadlock detection state would improve the code.
Fixes: a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Sebastian Andrzej Siewior <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v5.14 |
|
| #
a055fcc1 |
| 26-Aug-2021 |
Peter Zijlstra <[email protected]> |
locking/rtmutex: Return success on deadlock for ww_mutex waiters
ww_mutexes can legitimately cause a deadlock situation in the lock graph which is resolved afterwards by the wait/wound mechanics. The rtmutex chain walk can detect such a deadlock and returns EDEADLK which in turn skips the wait/wound mechanism and returns EDEADLK to the caller. That's wrong because both lock chains might get EDEADLK or the wrong waiter would back out.
Detect that situation and return 'success' in case that the waiter which initiated the chain walk is a ww_mutex with context. This allows the wait/wound mechanics to resolve the situation according to the rules.
[ tglx: Split it apart and added changelog ]
Reported-by: Sebastian Siewior <[email protected]> Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
6467822b |
| 26-Aug-2021 |
Peter Zijlstra <[email protected]> |
locking/rtmutex: Prevent spurious EDEADLK return caused by ww_mutexes
rtmutex based ww_mutexes can legitimately create a cycle in the lock graph which can be observed by a blocker which didn't cause the problem:
  P1: A, ww_A, ww_B
  P2: ww_B, ww_A
  P3: A
P3 might therefore be trapped in the ww_mutex induced cycle and run into the lock depth limitation of rt_mutex_adjust_prio_chain() which returns -EDEADLK to the caller.
Disable the deadlock detection walk when the chain walk observes a ww_mutex to prevent this looping.
[ tglx: Split it apart and added changelog ]
Reported-by: Sebastian Siewior <[email protected]> Fixes: add461325ec5 ("locking/rtmutex: Extend the rtmutex core to support ww_mutex") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|