Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14

# 61c39d8c | 21-Mar-2025 | Ryo Takakura <[email protected]>

lockdep: Fix wait context check on softirq for PREEMPT_RT
Since:
0c1d7a2c2d32 ("lockdep: Remove softirq accounting on PREEMPT_RT.")
the wait context test for mutex usage within "in softirq context" fails as it references @softirq_context:
   | wait context tests |
  --------------------------------------------------------------------------
                                      | rcu  | raw  | spin |mutex |
  --------------------------------------------------------------------------
  in hardirq context:                    ok  |  ok  |  ok  |  ok  |
  in hardirq context (not threaded):     ok  |  ok  |  ok  |  ok  |
  in softirq context:                    ok  |  ok  |  ok  |FAILED|
As a fix, add a lockdep map for the BH disabled section. This fixes the issue by letting us catch cases when local_bh_disable() gets called with preemption disabled, where the local_lock doesn't get acquired. In the case of the "in softirq context" selftest, local_bh_disable() was being called with preemption disabled, as the test runs early in boot.
[ boqun: Move the lockdep annotations into __local_bh_*() to avoid false positives because of unpaired local_bh_disable() reported by Borislav Petkov and Peter Zijlstra, and make bh_lock_map only exist for PREEMPT_RT. ]
[ mingo: Restored authorship and improved the bh_lock_map definition. ]
Signed-off-by: Ryo Takakura <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
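
For illustration, a minimal sketch of the mechanism described above, assuming the map is named bh_lock_map and is annotated from the PREEMPT_RT __local_bh_disable_ip()/__local_bh_enable_ip() paths (field initializers are an approximation):

  #ifdef CONFIG_PREEMPT_RT
  static struct lock_class_key bh_lock_key;
  static struct lockdep_map bh_lock_map = {
      .name            = "local_bh",
      .key             = &bh_lock_key,
      .wait_type_inner = LD_WAIT_CONFIG,  /* behaves like a spinlock_t on RT */
      .lock_type       = LD_LOCK_PERCPU,
  };

  void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
  {
      /* Recorded even when the local_lock is skipped because preemption
       * is already disabled (e.g. early boot), so lockdep can still flag
       * a mutex acquired inside the BH-disabled section. */
      lock_map_acquire_read(&bh_lock_map);
      /* ... existing local_lock()/counter handling ... */
  }

  void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
  {
      /* ... existing counter/local_unlock() handling ... */
      lock_map_release(&bh_lock_map);
  }
  #endif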

Revision tags: v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1

# 6675ce20 | 19-Nov-2024 | K Prateek Nayak <[email protected]>

softirq: Allow raising SCHED_SOFTIRQ from SMP-call-function on RT kernel
do_softirq_post_smp_call_flush() on PREEMPT_RT kernels carries a WARN_ON_ONCE() for any SOFTIRQ being raised from an SMP-call-function. Since do_softirq_post_smp_call_flush() is called with preempt disabled, raising a SOFTIRQ during flush_smp_call_function_queue() can lead to longer preempt disabled sections.
Since commit b2a02fc43a1f ("smp: Optimize send_call_function_single_ipi()") IPIs to an idle CPU in TIF_POLLING_NRFLAG mode can be optimized out by instead setting TIF_NEED_RESCHED bit in idle task's thread_info and relying on the flush_smp_call_function_queue() in the idle-exit path to run the SMP-call-function.
To trigger an idle load balancing, the scheduler queues nohz_csd_function() responsible for triggering an idle load balancing on a target nohz idle CPU and sends an IPI. Only now, this IPI is optimized out and the SMP-call-function is executed from flush_smp_call_function_queue() in do_idle() which can raise a SCHED_SOFTIRQ to trigger the balancing.
So far, this went undetected since the need_resched() check in nohz_csd_function() would make it bail out of idle load balancing early, as the idle thread does not clear TIF_POLLING_NRFLAG before calling flush_smp_call_function_queue(). The need_resched() check was added with the intent to catch a new task wakeup; however, it has recently been discovered to be unnecessary and will be removed in the subsequent commit, after which nohz_csd_function() can raise a SCHED_SOFTIRQ from flush_smp_call_function_queue() to trigger an idle load balance on an idle target in TIF_POLLING_NRFLAG mode.
nohz_csd_function() bails out early if the "idle_cpu()" check for the target CPU fails, and does not lock the target CPU's rq until the very end, once it has found tasks to run on the CPU; it will not inhibit the wakeup of, or the running of, a newly woken up higher priority task. Account for this and prevent a WARN_ON_ONCE() when SCHED_SOFTIRQ is raised from flush_smp_call_function_queue().
Signed-off-by: K Prateek Nayak <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
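
A sketch of the resulting check, assuming the helper keeps its upstream name do_softirq_post_smp_call_flush() (the exact masking is inferred from the description above):

  void do_softirq_post_smp_call_flush(unsigned int was_pending)
  {
      unsigned int is_pending = local_softirq_pending();

      if (unlikely(was_pending != is_pending)) {
          /* SCHED_SOFTIRQ may legitimately be raised here by
           * nohz_csd_function() to trigger idle load balancing. */
          WARN_ON_ONCE(was_pending !=
                       (is_pending & ~(1 << SCHED_SOFTIRQ)));
          invoke_softirq();
      }
  }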

Revision tags: v6.12, v6.12-rc7

# 49a17639 | 06-Nov-2024 | Sebastian Andrzej Siewior <[email protected]>

softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
The timer and hrtimer soft interrupts are raised in hard interrupt context. With threaded interrupts force enabled or on PREEMPT_RT this leads to waking the ksoftirqd for the processing of the soft interrupt.
ksoftirqd runs as a SCHED_OTHER task, which means it will compete with other tasks for CPU resources. This can introduce long delays for timer processing on heavily loaded systems and is not desired.
Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated timers thread and let it run at the lowest SCHED_FIFO priority. Wake-ups for RT tasks happen from hardirq context so only timer_list timers and hrtimers for "regular" tasks are processed here. The higher priority ensures that wakeups are performed before scheduling SCHED_OTHER tasks.
Using a dedicated variable to store the pending softirq bits ensures that the timers are not accidentally picked up by ksoftirqd and other threaded interrupts.
It shouldn't be picked up by ksoftirqd since ksoftirqd runs at lower priority. However, if ksoftirqd is already running while a timer fires, then ksoftirqd will be PI-boosted to the ktimers thread's priority due to the BH-lock.
The timer thread can pick up pending softirqs from ksoftirqd, but only if the softirq load is high. It is not desired that the picked up softirqs are processed at SCHED_FIFO priority under high softirq load, but this can already happen via a PI-boost by a force-threaded interrupt.
[ [email protected]: rcutorture.c fixes, storm fix by introduction of local_timers_pending() for tick_nohz_next_event() ]
[ [email protected]: Ensure ktimersd gets woken up even if a softirq is currently served. ]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]> [rcutorture]
Reviewed-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
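
Roughly, the mechanism looks like the sketch below; the per-CPU thread and bitmask names (ktimerd, pending_timer_softirq) only approximate the patch:

  static DEFINE_PER_CPU(struct task_struct *, ktimerd);
  static DEFINE_PER_CPU(unsigned long, pending_timer_softirq);

  /* Called from hardirq context instead of raise_softirq_irqoff()
   * for TIMER_SOFTIRQ/HRTIMER_SOFTIRQ on PREEMPT_RT. */
  void raise_ktimers_thread(unsigned int nr)
  {
      lockdep_assert_irqs_disabled();
      __this_cpu_or(pending_timer_softirq, 1 << nr);
      wake_up_process(__this_cpu_read(ktimerd));
  }

The thread itself runs at the lowest SCHED_FIFO priority and flushes only the bits stashed in the dedicated pending variable, so ksoftirqd never sees them.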

Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1

# 49994911 | 25-Sep-2024 | NeilBrown <[email protected]>

softirq: use bit waits instead of var waits.
The waiting in softirq.c is always waiting for a bit to be cleared. This makes the bit wait functions seem more suitable. By switching over we can get rid of all explicit barriers. We also use wait_on_bit_lock() to avoid an explicit loop.
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
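
For illustration, a sketch of the before/after shape of the converted sites (the tasklet state bits are the bits waited on in softirq.c):

  /* Before: open-coded barrier plus var-wait wakeup. */
  clear_bit(TASKLET_STATE_RUN, &t->state);
  smp_mb__after_atomic();
  wake_up_var(&t->state);

  /* After: the bit-wait API pairs clear, barrier and wakeup ... */
  clear_and_wake_up_bit(TASKLET_STATE_RUN, &t->state);

  /* ... the waiter waits on the bit directly ... */
  wait_on_bit(&t->state, TASKLET_STATE_RUN, TASK_UNINTERRUPTIBLE);

  /* ... and wait_on_bit_lock() replaces an explicit
   * test_and_set_bit() retry loop in tasklet_kill(). */
  wait_on_bit_lock(&t->state, TASKLET_STATE_SCHED, TASK_UNINTERRUPTIBLE);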

Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4

# e68ac2b4 | 15-Aug-2024 | Caleb Sander Mateos <[email protected]>

softirq: Remove unused 'action' parameter from action callback
When soft interrupt actions are called, they are passed a pointer to the struct softirq action which contains the action's function pointer.
This pointer isn't useful, as the action callback already knows what function it is. And since each callback handles a specific soft interrupt, the callback also knows which soft interrupt number is running.
No soft interrupt action callback actually uses this parameter, so remove it from the function pointer signature. This clarifies that soft interrupt actions are global routines and makes it slightly cheaper to call them.
Signed-off-by: Caleb Sander Mateos <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
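
The signature change, sketched:

  /* Before: every action received a pointer it never used. */
  void open_softirq(int nr, void (*action)(struct softirq_action *));
  static void tasklet_action(struct softirq_action *a);

  /* After: actions are plain global routines. */
  void open_softirq(int nr, void (*action)(void));
  static void tasklet_action(void);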

Revision tags: v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6

# 1dd1eff1 | 27-Apr-2024 | Zqiang <[email protected]>

softirq: Fix suspicious RCU usage in __do_softirq()
Currently, the condition "__this_cpu_read(ksoftirqd) == current" is used to invoke rcu_softirq_qs() in ksoftirqd tasks context for non-RT kernels.
This works correctly as long as the context is actually task context but this condition is wrong when:
  - the current task is ksoftirqd
  - the task is interrupted in a RCU read side critical section
  - __do_softirq() is invoked on return from interrupt
Syzkaller triggered the following scenario:
  -> finish_task_switch()
    -> put_task_struct_rcu_user()
      -> call_rcu(&task->rcu, delayed_put_task_struct)
        -> __kasan_record_aux_stack()
          -> pfn_valid()
            -> rcu_read_lock_sched()
  <interrupt>
  __irq_exit_rcu()
  -> __do_softirq()
    -> if (!IS_ENABLED(CONFIG_PREEMPT_RT) &&
           __this_cpu_read(ksoftirqd) == current)
      -> rcu_softirq_qs()
        -> RCU_LOCKDEP_WARN(lock_is_held(&rcu_sched_lock_map))
The RCU quiescent state is reported inside the RCU read-side critical section, so the lockdep warning is triggered.
Fix this by splitting out the inner workings of __do_softirq() into a helper function which takes an argument to distinguish between ksoftirqd task context and interrupted context, invoke it from the relevant call sites with the proper context information, and use that for the conditional invocation of rcu_softirq_qs().
Reported-by: [email protected]
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/lkml/8f281a10-b85a-4586-9586-5bbc12dc784f@paulmck-laptop/T/#mea8aba4abfcb97bbf499d169ce7f30c4cff1b0e3
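
The shape of the fix, sketched (the helper is named handle_softirqs() upstream; callers state whether they really are ksoftirqd task context):

  static void handle_softirqs(bool ksirqd)
  {
      /* ... former __do_softirq() body ... */
      if (!IS_ENABLED(CONFIG_PREEMPT_RT) && ksirqd)
          rcu_softirq_qs();
      /* ... */
  }

  asmlinkage __visible void __softirq_entry __do_softirq(void)
  {
      handle_softirqs(false);      /* may run on return from interrupt */
  }

  static void run_ksoftirqd(unsigned int cpu)
  {
      ksoftirqd_run_begin();
      if (local_softirq_pending())
          handle_softirqs(true);   /* genuine ksoftirqd task context */
      ksoftirqd_run_end();
  }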

Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7

# 1acd92d9 | 27-Feb-2024 | Tejun Heo <[email protected]>

workqueue: Drain BH work items on hot-unplugged CPUs
Boqun pointed out that workqueues aren't handling BH work items on offlined CPUs. Unlike tasklet which transfers out the pending tasks from CPUHP_SOFTIRQ_DEAD, BH workqueue would just leave them pending which is problematic. Note that this behavior is specific to BH workqueues as the non-BH per-CPU workers just become unbound when the CPU goes offline.
This patch fixes the issue by draining the pending BH work items from an offlined CPU from CPUHP_SOFTIRQ_DEAD. Because work items carry more context, it's not as easy to transfer the pending work items from one pool to another. Instead, run BH work items which execute the offlined pools on an online CPU.
Note that this assumes that no further BH work items will be queued on the offlined CPUs. This assumption is shared with tasklet and should be fine for conversions. However, this issue also exists for per-CPU workqueues which will just keep executing work items queued after CPU offline on unbound workers and workqueue should reject per-CPU and BH work items queued on offline CPUs. This will be addressed separately later.
Signed-off-by: Tejun Heo <[email protected]>
Reported-and-reviewed-by: Boqun Feng <[email protected]>
Link: http://lkml.kernel.org/r/Zdvw0HdSXcU3JZ4g@boqun-archlinux
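
Sketch of the hook: workqueue_softirq_dead() is the drain helper the patch adds, called from softirq's existing CPUHP_SOFTIRQ_DEAD callback:

  /* kernel/softirq.c, CPUHP_SOFTIRQ_DEAD callback (sketch) */
  static int takeover_tasklets(unsigned int cpu)
  {
      workqueue_softirq_dead(cpu);   /* drain the dead CPU's BH pools */
      /* ... existing tasklet takeover below ... */
      return 0;
  }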

Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4

# 4cb1ef64 | 04-Feb-2024 | Tejun Heo <[email protected]>

workqueue: Implement BH workqueues to eventually replace tasklets
The only generic interface to execute asynchronously in the BH context is tasklet; however, it's marked deprecated and has some design flaws, such as the execution code accessing the tasklet item after the execution is complete (which can lead to subtle use-after-free in certain usage scenarios) and less-developed flush and cancel mechanisms.
This patch implements BH workqueues which share the same semantics and features of regular workqueues but execute their work items in the softirq context. As there is always only one BH execution context per CPU, none of the concurrency management mechanisms applies and a BH workqueue can be thought of as a convenience wrapper around softirq.
Except for the inability to sleep while executing and lack of max_active adjustments, BH workqueues and work items should behave the same as regular workqueues and work items.
Currently, the execution is hooked to tasklet[_hi]. However, the goal is to convert all tasklet users over to BH workqueues. Once the conversion is complete, tasklet can be removed and BH workqueues can directly take over the tasklet softirqs.
system_bh[_highpri]_wq are added. As queue-wide flushing doesn't exist in tasklet, all existing tasklet users should be able to use the system BH workqueues without creating their own workqueues.
v3: - Add missing interrupt.h include.
v2: - Instead of using tasklets, hook directly into its softirq action functions - tasklet[_hi]_action(). This is slightly cheaper and closer to the eventual code structure we want to arrive at. Suggested by Lai.
- Lai also pointed out several places which need NULL worker->task handling or can use clarification. Updated.
Signed-off-by: Tejun Heo <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/CAHk-=wjDW53w4-YcSmgKC5RruiRLHmJ1sXeYdp_ZgVoBw=5byA@mail.gmail.com
Tested-by: Allen Pais <[email protected]>
Reviewed-by: Lai Jiangshan <[email protected]>
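
A hedged usage example (my_bh_fn/my_bh_work are hypothetical names; system_bh_wq is the queue the series adds):

  #include <linux/workqueue.h>

  static void my_bh_fn(struct work_struct *work)
  {
      /* Executes in softirq (BH) context: must not sleep. */
  }
  static DECLARE_WORK(my_bh_work, my_bh_fn);

  /* Where a driver used to call tasklet_schedule(): */
  queue_work(system_bh_wq, &my_bh_work);

Unlike a tasklet, the work item also gets workqueue-style flush and cancel semantics.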

Revision tags: v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1

# 548796e2 | 29-Jun-2023 | Cruz Zhao <[email protected]>

sched/core: introduce sched_core_idle_cpu()
With core scheduling introduced, a new idle state is defined as force idle: running the idle task while nr_running is greater than zero.
If a cpu is in force idle state, idle_cpu() will return zero. This result makes sense in some scenarios, e.g., load balance, showacpu when dumping, and judging whether the RCU boost kthread is starving.
But this will cause errors in other scenarios, e.g., tick_irq_exit(): When force idle, rq->curr == rq->idle but rq->nr_running > 0, so idle_cpu() returns 0. In tick_irq_exit(), if idle_cpu() is 0, tick_nohz_irq_exit() will not be called, and ts->idle_active will not become 1, which became 0 in tick_nohz_irq_enter(). ts->idle_sleeptime won't be updated in update_ts_time_stats() if ts->idle_active is 0, when it should be 1. As a result, ts->idle_sleeptime ends up smaller than the actual value, and so does the idle time reported in /proc/stat.
To solve this problem, we introduce sched_core_idle_cpu(), which returns 1 when force idle. We audit all users of idle_cpu(), and change idle_cpu() into sched_core_idle_cpu() in function tick_irq_exit().
v2-->v3: Only replace idle_cpu() with sched_core_idle_cpu() in function tick_irq_exit(). And modify the corresponding commit log.
Signed-off-by: Cruz Zhao <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Peter Zijlstra <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Joel Fernandes <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
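
A sketch matching the described semantics:

  int sched_core_idle_cpu(int cpu)
  {
      struct rq *rq = cpu_rq(cpu);

      /* Force idle: the idle task runs but nr_running > 0. */
      if (sched_core_enabled(rq) && rq->curr == rq->idle)
          return 1;

      return idle_cpu(cpu);
  }

  /* tick_irq_exit() then tests sched_core_idle_cpu(cpu)
   * instead of idle_cpu(cpu). */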

Revision tags: v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2

# d15121be | 08-May-2023 | Paolo Abeni <[email protected]>

Revert "softirq: Let ksoftirqd do its job"
This reverts the following commits:
4cd13c21b207 ("softirq: Let ksoftirqd do its job") 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous") 1342
Revert "softirq: Let ksoftirqd do its job"
This reverts the following commits:
  4cd13c21b207 ("softirq: Let ksoftirqd do its job")
  3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")
  1342d8080f61 ("softirq: Don't skip softirq execution when softirq thread is parking")
in a single change to avoid known bad intermediate states introduced by a patch series reverting them individually.
Due to the mentioned commit, when the ksoftirqd threads take charge of softirq processing, the system can experience high latencies.
In the past a few workarounds have been implemented for specific side-effects of the initial ksoftirqd enforcement commit:
  commit 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
  commit 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
  commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
  commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")
But the latency problem still exists in real-life workloads, see the link below.
The reverted commit intended to solve a live-lock scenario that can now be addressed with the NAPI threaded mode, introduced with commit 29863d41bb6e ("net: implement threaded-able napi poll loop support"), which is nowadays in a pretty stable status.
While a complete solution to put softirq processing under nice resource control would be preferable, that has proven to be a very hard task. In the short term, remove the main pain point, and also simplify a bit the current softirq implementation.
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Jason Xing <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/netdev/[email protected]
Link: https://lore.kernel.org/r/57e66b364f1b6f09c9bc0316742c3b14f4ce83bd.1683526542.git.pabeni@redhat.com

Revision tags: v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6

# f4bf3ca2 | 07-Apr-2023 | Lingutla Chandrasekhar <[email protected]>

softirq: Add trace points for tasklet entry/exit
Tasklets are supposed to finish their work quickly and should not block the current running process, but it is not guaranteed that they do so.
Currently softirq_entry/exit can be used to analyse the total tasklet execution time, but that's not helpful to track individual tasklet execution times. That makes it hard to identify tasklet functions that take more time than expected.
Add tasklet_entry/exit trace point support to track individual tasklet execution.
Trivial usage example:

  # echo 1 > /sys/kernel/debug/tracing/events/irq/tasklet_entry/enable
  # echo 1 > /sys/kernel/debug/tracing/events/irq/tasklet_exit/enable
  # cat /sys/kernel/debug/tracing/trace
  # tracer: nop
  #
  # entries-in-buffer/entries-written: 4/4   #P:4
  #
  #                          _-----=> irqs-off/BH-disabled
  #                         / _----=> need-resched
  #                        | / _---=> hardirq/softirq
  #                        || / _--=> preempt-depth
  #                        ||| / _-=> migrate-disable
  #                        |||| /     delay
  #     TASK-PID   CPU#    |||||  TIMESTAMP  FUNCTION
  #        | |       |     |||||     |          |
      <idle>-0     [003]   ..s1.   314.011428: tasklet_entry: tasklet=0xffffa01ef8db2740 function=tcp_tasklet_func
      <idle>-0     [003]   ..s1.   314.011432: tasklet_exit: tasklet=0xffffa01ef8db2740 function=tcp_tasklet_func
      <idle>-0     [003]   ..s1.   314.017369: tasklet_entry: tasklet=0xffffa01ef8db2740 function=tcp_tasklet_func
      <idle>-0     [003]   ..s1.   314.017371: tasklet_exit: tasklet=0xffffa01ef8db2740 function=tcp_tasklet_func
Signed-off-by: Lingutla Chandrasekhar <[email protected]>
Signed-off-by: J. Avila <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[elavila: Port to android-mainline]
[jstultz: Rebased to upstream, cut unused trace points, added comments for the tracepoints, reworded commit]

Revision tags: v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2

# 6f0e6c15 | 08-Jun-2022 | Frederic Weisbecker <[email protected]>

context_tracking: Take IRQ eqs entrypoints over RCU
The RCU dynticks counter is going to be merged into the context tracking subsystem. Prepare by moving the IRQ extended quiescent states entrypoints to context tracking. For now those are dumb redirections to existing RCU calls.
[ paulmck: Apply Stephen Rothwell feedback from -next. ] [ paulmck: Apply Nathan Chancellor feedback. ]
Acked-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Uladzislau Rezki <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Nicolas Saenz Julienne <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Xiongfeng Wang <[email protected]>
Cc: Yu Liao <[email protected]>
Cc: Phil Auld <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Alex Belits <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Nicolas Saenz Julienne <[email protected]>
Tested-by: Nicolas Saenz Julienne <[email protected]>
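
For now the new entrypoints are thin wrappers, e.g. (a sketch; the series names them ct_irq_enter()/ct_irq_exit()):

  noinstr void ct_irq_enter(void)
  {
      rcu_irq_enter();
  }

  noinstr void ct_irq_exit(void)
  {
      rcu_irq_exit();
  }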

Revision tags: v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3

# 1a90bfd2 | 13-Apr-2022 | Sebastian Andrzej Siewior <[email protected]>

smp: Make softirq handling RT safe in flush_smp_call_function_queue()
flush_smp_call_function_queue() invokes do_softirq() which is not available on PREEMPT_RT. flush_smp_call_function_queue() is invoked from the idle task and the migration task with preemption or interrupts disabled.
So RT kernels cannot process soft interrupts in that context, as that has to acquire 'sleeping spinlocks', which is not possible with preemption or interrupts disabled and is forbidden from the idle task anyway.
The currently known SMP function call which raises a soft interrupt is in the block layer, but this functionality is not enabled on RT kernels due to latency and performance reasons.
RT could wake up ksoftirqd unconditionally, but this wants to be avoided if there were soft interrupts pending already when this is invoked in the context of the migration task. The migration task might have preempted a threaded interrupt handler which raised a soft interrupt, but did not reach the local_bh_enable() to process it. The "running" ksoftirqd might prevent the handling in the interrupt thread context which is causing latency issues.
Add a new function which handles this case explicitly for RT and falls back to do_softirq() on !RT kernels. In the RT case this warns when one of the flushed SMP function calls raised a soft interrupt so this can be investigated.
[ tglx: Moved the RT part out of SMP code ]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
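
Sketch of the split, assuming the helper keeps its upstream name do_softirq_post_smp_call_flush():

  #ifdef CONFIG_PREEMPT_RT
  /* kernel/softirq.c: process softirqs raised by the just-flushed SMP
   * function calls, and warn so such callers can be investigated. */
  void do_softirq_post_smp_call_flush(unsigned int was_pending)
  {
      if (WARN_ON_ONCE(was_pending != local_softirq_pending()))
          invoke_softirq();
  }
  #else
  static inline void do_softirq_post_smp_call_flush(unsigned int unused)
  {
      do_softirq();   /* !RT: behave as before */
  }
  #endif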

Revision tags: v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2

# fe13889c | 28-Jan-2022 | Changbin Du <[email protected]>

genirq, softirq: Use in_hardirq() instead of in_irq()
Replace the obsolete and ambiguous macro in_irq() with the new macro in_hardirq().
Signed-off-by: Changbin Du <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
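
The replacement is mechanical, e.g.:

  /* before: ambiguous name */
  WARN_ON_ONCE(in_irq());
  /* after: explicitly hard interrupt context */
  WARN_ON_ONCE(in_hardirq());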

Revision tags: v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15

# 53e87e3c | 26-Oct-2021 | Frederic Weisbecker <[email protected]>

timers/nohz: Last resort update jiffies on nohz_full IRQ entry
When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU is guaranteed to stay online and to never stop its tick.
Meanwhile on some rare case, the dedicated timekeeper may be running with interrupts disabled for a while, such as in stop_machine.
If jiffies stop being updated, a nohz_full CPU may end up endlessly programming the next tick in the past, taking the last jiffies update monotonic timestamp as a stale base, resulting in a tick storm.
Here is a scenario where it matters:
0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU.
1) A stop machine callback is queued to execute somewhere.
2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1 can still take IRQs.
3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward.
4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking last_jiffies_update as a base. But last_jiffies_update hasn't been updated for 2 jiffies since the timekeeper has interrupts disabled.
5) clockevents_program_event(), which relies on ktime_get(), observes that the expiration is in the past and therefore programs the min delta event on the clock.
6) The tick fires immediately, goto 3)
7) Tick storm; the nohz_full CPU is drowned and takes ages to reach MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation.
Solve this with unconditionally updating jiffies if the value is stale on nohz_full IRQ entry. IRQs and other disturbances are expected to be rare enough on nohz_full for the unconditional call to ktime_get() to actually matter.
Reported-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
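
A sketch of the idea (the condition details only approximate the patch):

  static inline void tick_nohz_irq_enter(void)
  {
      struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
      ktime_t now = ktime_get();

      /* ... */
      /* Last resort: never base tick reprogramming on stale jiffies
       * when running nohz_full and the timekeeper is stalled. */
      if (ts->tick_stopped || tick_nohz_full_cpu(smp_processor_id()))
          tick_nohz_update_jiffies(now);
  }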

Revision tags: v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5

# 91cc470e | 02-Jun-2021 | Tanner Love <[email protected]>

genirq: Change force_irqthreads to a static key
With CONFIG_IRQ_FORCED_THREADING=y, testing the boolean force_irqthreads could incur a cache line miss in invoke_softirq() and other places.
Replace the test with a static key to avoid the potential cache miss.
[ tglx: Dropped the IDE part, removed the export and updated blk-mq ]
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Tanner Love <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
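
Sketch of the pattern (key and macro names follow the patch; the early_param handler body is an approximation):

  DEFINE_STATIC_KEY_FALSE(force_irqthreads_key);
  #define force_irqthreads() \
      (static_branch_unlikely(&force_irqthreads_key))

  static int __init setup_forced_irqthreads(char *arg)
  {
      static_branch_enable(&force_irqthreads_key);
      return 0;
  }
  early_param("threadirqs", setup_forced_irqthreads);

A disabled static branch compiles to a no-op jump, so the common case no longer touches the variable's cache line.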

# b03fbd4f | 11-Jun-2021 | Peter Zijlstra <[email protected]>

sched: Introduce task_is_running()
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper: task_is_running(p).
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Davidlohr Bueso <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]>
Acked-by: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
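
The helper, sketched (at the time of this commit the field was still p->state):

  #define task_is_running(task) \
      (READ_ONCE((task)->state) == TASK_RUNNING)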

# 37aadc68 | 11-Jun-2021 | Peter Zijlstra <[email protected]>

sched: Unbreak wakeups
Remove broken task->state references and let wake_up_process() DTRT.
The anti-pattern in these patches breaks the ordering of ->state vs COND as described in the comment near set_current_state() and can lead to missed wakeups:
  (OoO load, observes RUNNING) <-.
  for (;;) {                     |
      t->state = UNINTERRUPTIBLE;|
      smp_mb();          ,------>|  (observes !COND)
                         |       |
      if (COND) ---------'       |  COND = 1;
          break;                 `- if (t->state != RUNNING)
                                        wake_up_process(t); // not done
      schedule(); // forever waiting
  }
  t->state = TASK_RUNNING;
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Davidlohr Bueso <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Will Deacon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
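
For contrast, the ordering-safe pattern from the set_current_state() documentation (a sketch, with COND standing in for the actual condition):

  /* waiter */
  for (;;) {
      set_current_state(TASK_UNINTERRUPTIBLE); /* implies smp_mb() */
      if (COND)
          break;
      schedule();
  }
  __set_current_state(TASK_RUNNING);

  /* waker: no open-coded ->state test */
  COND = 1;
  wake_up_process(t);  /* checks the state under the proper locks */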

Revision tags: v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3

# 47c218dc | 09-Mar-2021 | Thomas Gleixner <[email protected]>

tick/sched: Prevent false positive softirq pending warnings on RT
On RT a task which has soft interrupts disabled can block on a lock and schedule out to idle while soft interrupts are pending. This triggers the warning in the NOHZ idle code which complains about going idle with pending soft interrupts. But as the task is blocked, soft interrupt processing is temporarily blocked as well, which means that such a warning is a false positive.
To prevent that, check the per-CPU state which indicates that a scheduled-out task has soft interrupts disabled.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
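
Sketch of the check (names follow the RT softirq series, where softirq_ctrl.cnt tracks the per-CPU BH-disable count):

  #ifdef CONFIG_PREEMPT_RT
  /* True if a (possibly scheduled-out) task holds this CPU's
   * BH disable count, so pending softirqs cannot run right now. */
  static inline bool local_bh_blocked(void)
  {
      return __this_cpu_read(softirq_ctrl.cnt) != 0;
  }
  #else
  static inline bool local_bh_blocked(void) { return false; }
  #endif

  /* The NOHZ idle path then warns only when
   * local_softirq_pending() && !local_bh_blocked(). */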

# 8b1c04ac | 09-Mar-2021 | Thomas Gleixner <[email protected]>

softirq: Make softirq control and processing RT aware
Provide a local lock based serialization for soft interrupts on RT which allows the local_bh_disable() sections and the servicing of soft interrupts to be preemptible.
Provide the necessary inline helpers which allow to reuse the bulk of the softirq processing code.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
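
The core of the serialization, sketched (struct and field names follow the patch; the count bookkeeping is elided):

  struct softirq_ctrl {
      local_lock_t  lock;
      int           cnt;
  };

  static DEFINE_PER_CPU(struct softirq_ctrl, softirq_ctrl) = {
      .lock = INIT_LOCAL_LOCK(softirq_ctrl.lock),
  };

  void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
  {
      /* First BH-disable level of this task: serialize against the
       * per-CPU softirq execution via a sleepable local lock. */
      if (!current->softirq_disable_cnt)
          local_lock(&softirq_ctrl.lock);
      /* ... track counts in current and softirq_ctrl.cnt ... */
  }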

# f02fc963 | 09-Mar-2021 | Thomas Gleixner <[email protected]>

softirq: Move various protections into inline helpers
To allow reuse of the bulk of softirq processing code for RT and to avoid #ifdeffery all over the place, split protections for various code sections out into inline helpers so the RT variant can just replace them in one go.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
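
For example, the !RT variants of the split-out helpers look roughly like (helper names follow the patch):

  static inline void softirq_handle_begin(void)
  {
      __local_bh_disable_ip(_RET_IP_, SOFTIRQ_OFFSET);
  }

  static inline void softirq_handle_end(void)
  {
      __local_bh_enable(SOFTIRQ_OFFSET);
      WARN_ON_ONCE(in_interrupt());
  }

  static inline void ksoftirqd_run_begin(void)
  {
      local_irq_disable();
  }

  static inline void ksoftirqd_run_end(void)
  {
      local_irq_enable();
  }

The RT variants can then replace these in one go without #ifdefs in the core loop.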

# eb2dafbb | 09-Mar-2021 | Thomas Gleixner <[email protected]>

tasklets: Prevent tasklet_unlock_spin_wait() deadlock on RT
tasklet_unlock_spin_wait() spin waits for the TASKLET_STATE_SCHED bit in the tasklet state to be cleared. This works on !RT nicely because the corresponding execution can only happen on a different CPU.
On RT softirq processing is preemptible, therefore a task preempting the softirq processing thread can spin forever.
Prevent this by invoking local_bh_disable()/enable() inside the loop. In case that the softirq processing thread was preempted by the current task, current will block on the local lock which yields the CPU to the preempted softirq processing thread. If the tasklet is processed on a different CPU then the local_bh_disable()/enable() pair is just a waste of processor cycles.
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Sebastian Andrzej Siewior <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
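
Sketch of the resulting wait loop:

  #if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
  void tasklet_unlock_spin_wait(struct tasklet_struct *t)
  {
      while (test_bit(TASKLET_STATE_RUN, &t->state)) {
          /* If we preempted the softirq thread, blocking on the BH
           * local lock hands the CPU back to it; otherwise this
           * pair just burns a few cycles. */
          local_bh_disable();
          local_bh_enable();
      }
  }
  #endif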

# 697d8c63 | 09-Mar-2021 | Peter Zijlstra <[email protected]>

tasklets: Replace spin wait in tasklet_kill()
tasklet_kill() spin waits for TASKLET_STATE_SCHED to be cleared, invoking yield() from inside the loop. yield() is an ill-defined mechanism and the result might still be wasting CPU cycles in a tight loop, which is especially painful in a guest when the CPU running the tasklet is scheduled out.
tasklet_kill() is used in teardown paths and not performance critical at all. Replace the spin wait with wait_var_event().
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
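
The wait, sketched (the side that clears TASKLET_STATE_SCHED issues the matching wake_up_var()):

  /* tasklet_kill(): sleep instead of yield()-spinning until the
   * SCHED bit clears. */
  while (test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
      wait_var_event(&t->state,
                     !test_bit(TASKLET_STATE_SCHED, &t->state));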

# da044747 | 09-Mar-2021 | Peter Zijlstra <[email protected]>

tasklets: Replace spin wait in tasklet_unlock_wait()
tasklet_unlock_wait() spin waits for TASKLET_STATE_RUN to be cleared. This is wasting CPU cycles in a tight loop which is especially painful in a guest when the CPU running the tasklet is scheduled out.
tasklet_unlock_wait() is invoked from tasklet_kill() which is used in teardown paths and not performance critical at all. Replace the spin wait with wait_var_event().
There are no users of tasklet_unlock_wait() which are invoked from atomic contexts. The usage in tasklet_disable() has been replaced temporarily with the spin waiting variant until the atomic users are fixed up and will be converted to the sleep wait variant later.
Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

# 6b2c339d | 17-Mar-2021 | Dirk Behme <[email protected]>

softirq: s/BUG/WARN_ONCE/ on tasklet SCHED state not set
Replace BUG() with WARN_ONCE() on wrong tasklet state, in order to:
  - increase the verbosity / aid in debugging
  - avoid fatal/unrecoverable state
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Dirk Behme <[email protected]>
Signed-off-by: Eugeniu Rosca <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
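
Sketch of the replacement (helper name and message approximate the patch):

  static bool tasklet_clear_sched(struct tasklet_struct *t)
  {
      if (test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
          return true;

      /* Formerly BUG(): report once and keep the system alive. */
      WARN_ONCE(1, "tasklet SCHED state not set: %s %pS\n",
                t->use_callback ? "callback" : "func",
                t->use_callback ? (void *)t->callback : (void *)t->func);

      return false;
  }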