|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |
|
| #
a3e81621 |
| 28-Oct-2024 |
Paul E. McKenney <[email protected]> |
rcu: Split rcu_report_exp_cpu_mult() mask parameter and use for tracing
This commit renames the rcu_report_exp_cpu_mult() function's "mask" parameter to "mask_in" and introduces a "mask" local variable to better support upcoming event-tracing additions.
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
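As a rough illustration of the described split (a minimal sketch, not the upstream diff; the signature details and the derivation of the local variable are assumptions):

    static void rcu_report_exp_cpu_mult(struct rcu_node *rnp, unsigned long flags,
                                        unsigned long mask_in, bool wake)
    {
            /* "mask_in" is what the caller asked for; "mask" is what actually
             * gets reported, and is therefore the value worth tracing. */
            unsigned long mask = mask_in & READ_ONCE(rnp->expmask); /* assumed derivation */

            /* ... report the expedited quiescent states covered by "mask" ... */
    }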
|
|
Revision tags: v6.12-rc5 |
|
| #
1bb03ad3 |
| 26-Oct-2024 |
Paul E. McKenney <[email protected]> |
rcu: Add lockdep_assert_irqs_disabled() to rcu_exp_need_qs()
Callers to rcu_exp_need_qs() are supposed to disable interrupts, so this commit enlists lockdep's aid in checking this.
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
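A minimal sketch of the added assertion (the surrounding function body is elided and the exact signature is an assumption):

    static void rcu_exp_need_qs(void)
    {
            lockdep_assert_irqs_disabled();  /* callers must run with IRQs off */

            /* ... note that this CPU still owes an expedited quiescent state ... */
    }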
|
| #
ecc5e6b0 |
| 26-Oct-2024 |
Paul E. McKenney <[email protected]> |
rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp
The value of rdp->cpu_no_qs.b.exp may be changed only by the corresponding CPU, and that CPU is not even allowed to race with itself, for example, via interrupt handlers. This commit therefore adds KCSAN exclusive-writer assertions to check this constraint.
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
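A minimal sketch of the KCSAN annotation described above (the store shown next to it is only illustrative):

    /* KCSAN will flag any concurrent writer other than this CPU's own code. */
    ASSERT_EXCLUSIVE_WRITER(rdp->cpu_no_qs.b.exp);
    rdp->cpu_no_qs.b.exp = false;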
|
| #
7a323371 |
| 26-Oct-2024 |
Paul E. McKenney <[email protected]> |
rcu: Make preemptible rcu_exp_handler() check idempotency
Although the non-preemptible implementation of rcu_exp_handler() contains checks to enforce idempotency, the preemptible version does not. The reason for this omission is that in preemptible kernels, there is no reporting of quiescent states from CPU hotplug notifiers, and thus no need for idempotency.
In theory, anyway.
In practice, accidents happen. This commit therefore adds checks under WARN_ON_ONCE() to catch any such accidents.
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
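A hedged sketch of what such idempotency checks could look like in the preemptible rcu_exp_handler() (the exact conditions are assumptions, not the upstream diff):

    /* Complain (once) if this CPU is no longer expected by the current
     * expedited grace period, or if it has already noted that it owes a
     * quiescent state, then bail out rather than reporting twice. */
    if (WARN_ON_ONCE(!(READ_ONCE(rnp->expmask) & rdp->grpmask)))
            return;
    if (WARN_ON_ONCE(READ_ONCE(rdp->cpu_no_qs.b.exp)))
            return;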
|
| #
6ae4c30f |
| 25-Oct-2024 |
Paul E. McKenney <[email protected]> |
rcu: Replace open-coded rcu_exp_need_qs() from rcu_exp_handler() with call
Currently, the preemptible implementation of rcu_exp_handler() almost open-codes rcu_exp_need_qs(). A call to that function would be shorter and would improve expediting in cases where rcu_exp_handler() interrupted a preemption-disabled or bh-disabled region of code. This commit therefore moves rcu_exp_need_qs() out of the non-preemptible leg of the enclosing #ifdef and replaces the open coding in preemptible rcu_exp_handler() with a call to rcu_exp_need_qs().
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
|
| #
e2bd1682 |
| 25-Oct-2024 |
Paul E. McKenney <[email protected]> |
rcu: Move rcu_report_exp_rdp() setting of ->cpu_no_qs.b.exp under lock
This commit reduces the state space of rcu_report_exp_rdp() by moving the setting of ->cpu_no_qs.b.exp under the rcu_node structure's ->lock. The lock isn't really all that important here, given that this per-CPU field is supposed to be written only by its CPU, but the disabling of interrupts excludes things like rcu_exp_handler(), which also can write to this same field. Avoiding this sort of interleaved access reduces the state space.
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
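A minimal sketch of the resulting pattern, using the kernel's rcu_node lock helpers (details elided and illustrative):

    raw_spin_lock_irqsave_rcu_node(rnp, flags);
    /* Interrupts are off and the leaf rcu_node lock is held, so
     * rcu_exp_handler() cannot interleave with this store to the
     * per-CPU field. */
    WRITE_ONCE(rdp->cpu_no_qs.b.exp, false);
    /* ... propagate the expedited quiescent state up the tree ... */
    raw_spin_unlock_irqrestore_rcu_node(rnp, flags);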
|
| #
d16e32f7 |
| 25-Oct-2024 |
Paul E. McKenney <[email protected]> |
rcu: Make rcu_report_exp_cpu_mult() caller acquire lock
There is a hard-to-trigger bug in the expedited grace-period computation whose fix requires that the __sync_rcu_exp_select_node_cpus() function check that the grace-period sequence number has not changed before invoking rcu_report_exp_cpu_mult(). However, this check must be done while holding the leaf rcu_node structure's ->lock.
This commit therefore prepares for that fix by moving this lock's acquisition from rcu_report_exp_cpu_mult() to its callers (all two of them).
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
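A hedged sketch of the resulting caller-acquires-lock convention (whether the callee releases the lock is an assumption, and the recheck mentioned belongs to the later fix):

    raw_spin_lock_irqsave_rcu_node(rnp, flags);
    /* With the lock now taken by the caller, the caller can first verify
     * that the expedited GP it sampled is still current before reporting
     * quiescent states for the CPUs in "mask". */
    rcu_report_exp_cpu_mult(rnp, flags, mask, false);  /* assumed to release rnp->lock */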
|
|
Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
8c03273a |
| 20-Aug-2024 |
John Ogness <[email protected]> |
rcu: Mark emergency sections in rcu stalls
Mark emergency sections wherever multiple lines of rcu stall information are generated. In an emergency section, every printk() call will attempt to directly flush to the consoles using the EMERGENCY priority.
Signed-off-by: John Ogness <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Petr Mladek <[email protected]>
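A hedged sketch of what marking such a section looks like (the function names follow the nbcon printk rework; treat them as assumptions here, and the message shown is illustrative):

    nbcon_cpu_emergency_enter();
    pr_err("INFO: %s detected expedited stalls on CPUs/tasks: {", rcu_state.name);
    /* ... many more printk() calls describing the stalled CPUs ... */
    nbcon_cpu_emergency_exit();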
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
76ce2b3d |
| 29-Apr-2024 |
Valentin Schneider <[email protected]> |
rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, and the snapshot helpers are now prefixed by "rcu_watching". Reflect that change in the storage variables for these snapshots.
Signed-off-by: Valentin Schneider <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
|
|
Revision tags: v6.9-rc6, v6.9-rc5 |
|
| #
3116a32e |
| 16-Apr-2024 |
Valentin Schneider <[email protected]> |
rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, so the dynticks prefix can go.
While at it, note that this helper is only meant to be called after a failed earlier call to rcu_watching_snap_in_eqs(); document this in the comments and add a WARN_ON_ONCE() for good measure.
Signed-off-by: Valentin Schneider <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
|
| #
9629936d |
| 16-Apr-2024 |
Valentin Schneider <[email protected]> |
rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING; reflect that change in the related helpers.
While at it, update a comment that still refers to rcu_dynticks_snap(), which was removed by commit:
7be2e6323b9b ("rcu: Remove full memory barrier on RCU stall printout")
Signed-off-by: Valentin Schneider <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
|
| #
51b73999 |
| 28-Jun-2024 |
Ryo Takakura <[email protected]> |
rcu: Let dump_cpu_task() be used without preemption disabled
The commit 2d7f00b2f0130 ("rcu: Suppress smp_processor_id() complaint in synchronize_rcu_expedited_wait()") disabled preemption around dump_cpu_task() to suppress the warning on its usage within a preemptible context.
Calling dump_cpu_task() does not require a non-preemptible context except for suppressing the smp_processor_id() warning. Since smp_processor_id() is evaluated along with in_hardirq() to check whether the code is running in interrupt context, this patch removes the need to disable preemption by reordering the condition so that smp_processor_id() only gets evaluated when in interrupt context.
Signed-off-by: Ryo Takakura <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
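A minimal sketch of the reordering (fallback logic elided): because && short-circuits, smp_processor_id() is now only evaluated from hardirq context, where preemption is inherently disabled, so callers no longer need to disable preemption themselves.

    bool dump_cpu_task(int cpu)
    {
            /* Before: cpu == smp_processor_id() && in_hardirq()
             * After:  in_hardirq() is checked first, so smp_processor_id()
             *         runs only in interrupt context, where it is safe. */
            if (in_hardirq() && cpu == smp_processor_id()) {
                    dump_stack();
                    return true;
            }

            /* ... otherwise fall back to a remote backtrace of "cpu" ... */
            return false;
    }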
|
| #
7c72dedb |
| 02-Jul-2024 |
Paul E. McKenney <[email protected]> |
rcu: Summarize expedited RCU CPU stall warnings during CSD-lock stalls
During CSD-lock stalls, the additional information output by expedited RCU CPU stall warnings is usually redundant, flooding the console for no good reason. However, this has been the way things work for a few years. This commit therefore uses the rcutree.csd_lock_suppress_rcu_stall kernel boot parameter, which causes expedited RCU CPU stall warnings to be abbreviated to a single line when there is at least one CPU that has been stuck waiting for a CSD lock for more than five seconds.
Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
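A hedged sketch of how such a boot parameter and suppression check might be wired up (the helper that detects a stuck CSD lock is assumed here, not taken from the patch):

    static bool csd_lock_suppress_rcu_stall;
    module_param(csd_lock_suppress_rcu_stall, bool, 0444);

    /* In the expedited stall-warning path: */
    if (READ_ONCE(csd_lock_suppress_rcu_stall) && csd_lock_is_stuck()) {  /* csd_lock_is_stuck() is assumed */
            pr_err("INFO: %s detected expedited stalls, suppressing full report due to a stuck CSD lock.\n",
                   rcu_state.name);
            return;  /* skip the detailed per-CPU output */
    }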
|
| #
27d5749d |
| 02-Jul-2024 |
Paul E. McKenney <[email protected]> |
rcu: Extract synchronize_rcu_expedited_stall() from synchronize_rcu_expedited_wait()
This commit extracts the RCU CPU stall-warning report code from synchronize_rcu_expedited_wait() and places it in a new function named synchronize_rcu_expedited_stall(). This is strictly a code-movement commit. A later commit will use this reorganization to avoid printing expedited RCU CPU stall warnings while there are ongoing CSD-lock stall reports.
Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
|
| #
125716c3 |
| 16-Apr-2024 |
Valentin Schneider <[email protected]> |
context_tracking, rcu: Rename ct_dynticks_cpu_acquire() into ct_rcu_watching_cpu_acquire()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING; reflect that change in the related helpers.
Signed-off-by: Valentin Schneider <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
|
| #
677ab23b |
| 15-May-2024 |
Frederic Weisbecker <[email protected]> |
rcu/exp: Remove redundant full memory barrier at the end of GP
A full memory barrier is necessary at the end of the expedited grace period to order:
1) The grace period completion (pictured by the GP sequence number) with all preceding accesses. This pairs with rcu_seq_end() performed by the concurrent kworker.
2) The grace period completion and subsequent post-GP update side accesses. Pairs again against rcu_seq_end().
This full barrier is already provided by the final sync_exp_work_done() test, making the subsequent explicit one redundant. Remove it and improve comments.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Boqun Feng <[email protected]> Reviewed-by: Neeraj Upadhyay <[email protected]>
|
| #
33c0860b |
| 27-Jun-2024 |
Frederic Weisbecker <[email protected]> |
rcu/exp: Remove superfluous full memory barrier upon first EQS snapshot
When the grace period kthread checks the extended quiescent state counter of a CPU, full ordering is necessary to ensure that either:
* If the GP kthread observes the remote target in an extended quiescent state, then that target must observe all accesses prior to the current grace period, including the current grace period sequence number, once it exits that extended quiescent state.
or:
* If the GP kthread observes the remote target NOT in an extended quiescent state, then the target further entering in an extended quiescent state must observe all accesses prior to the current grace period, including the current grace period sequence number, once it enters that extended quiescent state.
This ordering is enforced through a full memory barrier placed right before taking the first EQS snapshot. However this is superfluous because the snapshot is taken while holding the target's rnp lock which provides the necessary ordering through its chain of smp_mb__after_unlock_lock().
Remove the needless explicit barrier before the snapshot and put a comment about the implicit barrier newly relied upon here.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Boqun Feng <[email protected]> Reviewed-by: Neeraj Upadhyay <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
|
|
Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8 |
|
| #
988f569a |
| 08-Mar-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
rcu: Reduce synchronize_rcu() latency
A call to synchronize_rcu() can be optimized from a latency point of view. Workloads which depend on this can benefit from it.
The delay of wakeme_after_rcu() callback, which unblocks a waiter, depends on several factors:
- how fast a process of offloading is started. Combination of:
    - !CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU;
    - !CONFIG_RCU_LAZY/CONFIG_RCU_LAZY;
    - other.
- when started, invoking path is interrupted due to:
    - time limit;
    - need_resched();
    - if limit is reached.
- where in a nocb list it is located;
- how fast previous callbacks completed;
Example:
1. On our embedded devices I can easily trigger the scenario when it is the last in the list out of ~3600 callbacks:
<snip>
  <...>-29 [001] d..1. 21950.145313: rcu_batch_start: rcu_preempt CBs=3613 bl=28
...
  <...>-29 [001] ..... 21950.152578: rcu_invoke_callback: rcu_preempt rhp=00000000b2d6dee8 func=__free_vm_area_struct.cfi_jt
  <...>-29 [001] ..... 21950.152579: rcu_invoke_callback: rcu_preempt rhp=00000000a446f607 func=__free_vm_area_struct.cfi_jt
  <...>-29 [001] ..... 21950.152580: rcu_invoke_callback: rcu_preempt rhp=00000000a5cab03b func=__free_vm_area_struct.cfi_jt
  <...>-29 [001] ..... 21950.152581: rcu_invoke_callback: rcu_preempt rhp=0000000013b7e5ee func=__free_vm_area_struct.cfi_jt
  <...>-29 [001] ..... 21950.152582: rcu_invoke_callback: rcu_preempt rhp=000000000a8ca6f9 func=__free_vm_area_struct.cfi_jt
  <...>-29 [001] ..... 21950.152583: rcu_invoke_callback: rcu_preempt rhp=000000008f162ca8 func=wakeme_after_rcu.cfi_jt
  <...>-29 [001] d..1. 21950.152625: rcu_batch_end: rcu_preempt CBs-invoked=3612 idle=....
<snip>
2. We use cpuset/cgroup to classify tasks and assign them into different cgroups. For example, a "background" group binds tasks only to little CPUs, whereas a "foreground" group makes use of all CPUs. Tasks can be migrated between groups by request if an acceleration is needed.
See below an example of how the "surfaceflinger" task gets migrated. Initially it is located in the "system-background" cgroup, which allows it to run only on little cores. In order to speed it up, it can be temporarily moved into the "foreground" cgroup, which allows it to use big/all CPUs:
cgroup_attach_task():
  -> cgroup_migrate_execute()
  -> cpuset_can_attach()
    -> percpu_down_write()
      -> rcu_sync_enter()
        -> synchronize_rcu()
  -> now move tasks to the new cgroup.
  -> cgroup_migrate_finish()
<snip>
          rcuop/1-29 [000] ..... 7030.528570: rcu_invoke_callback: rcu_preempt rhp=00000000461605e0 func=wakeme_after_rcu.cfi_jt
 PERFD-SERVER-1855 [000] d..1. 7030.530293: cgroup_attach_task: dst_root=3 dst_id=22 dst_level=1 dst_path=/foreground pid=1900 comm=surfaceflinger
 TimerDispatch-2768 [002] d..5. 7030.537542: sched_migrate_task: comm=surfaceflinger pid=1900 prio=98 orig_cpu=0 dest_cpu=4
<snip>
"Boosting a task" depends on synchronize_rcu() latency:
- first trace shows a completion of synchronize_rcu();
- second shows attaching a task to a new group;
- last shows a final step when migration occurs.
3. To address this drawback, maintain a separate track that consists of synchronize_rcu() callers only. After completion of a grace period, users are deferred to a dedicated worker to process requests.
4. This patch reduces the latency of synchronize_rcu() by approximately 30-40% on synthetic tests. The real test case, camera launch time, shows (time is in milliseconds):
1-run 542 vs 489 improvement 9%
2-run 540 vs 466 improvement 13%
3-run 518 vs 468 improvement 9%
4-run 531 vs 457 improvement 13%
5-run 548 vs 475 improvement 13%
6-run 509 vs 484 improvement 4%
Synthetic test (no "noise" from other callbacks):
Hardware: x86_64, 64 CPUs, 64GB of memory, Linux-6.6
- 10K tasks (simultaneous);
- each task does (1000 loops) synchronize_rcu(); kfree(p);
default: CONFIG_RCU_NOCB_CPU: takes 54 seconds to complete all users;
patch: CONFIG_RCU_NOCB_CPU: takes 35 seconds to complete all users.
Running 60K gives approximately the same results on my setup. Please note this is without any interaction with other types of callbacks; otherwise the default case would be impacted considerably.
5. By default it is disabled. To enable this, use one of the following:
echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
or pass the boot parameter "rcutree.rcu_normal_wake_from_gp=1"
Reviewed-by: Paul E. McKenney <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Co-developed-by: Neeraj Upadhyay (AMD) <[email protected]> Signed-off-by: Neeraj Upadhyay (AMD) <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
|
|
Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1 |
|
| #
23da2ad6 |
| 12-Jan-2024 |
Frederic Weisbecker <[email protected]> |
rcu/exp: Remove rcu_par_gp_wq
TREE04 running on short iterations can produce writer stalls of the following kind:
??? Writer stall state RTWS_EXP_SYNC(4) g3968 f0x0 ->state 0x2 cpu 0
task:rcu_torture_wri state:D stack:14568 pid:83 ppid:2 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x2de/0x850
 ? trace_event_raw_event_rcu_exp_funnel_lock+0x6d/0xb0
 schedule+0x4f/0x90
 synchronize_rcu_expedited+0x430/0x670
 ? __pfx_autoremove_wake_function+0x10/0x10
 ? __pfx_synchronize_rcu_expedited+0x10/0x10
 do_rtws_sync.constprop.0+0xde/0x230
 rcu_torture_writer+0x4b4/0xcd0
 ? __pfx_rcu_torture_writer+0x10/0x10
 kthread+0xc7/0xf0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2f/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
Waiting for an expedited grace period and polling for an expedited grace period both are operations that internally rely on the same workqueue performing necessary asynchronous work.
However, a dependency chain is involved between those two operations, as depicted below:
====== CPU 0 ======= ====== CPU 1 =======
synchronize_rcu_expedited()
   exp_funnel_lock()
      mutex_lock(&rcu_state.exp_mutex);
                                          start_poll_synchronize_rcu_expedited
                                             queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
   synchronize_rcu_expedited_queue_work()
      queue_work(rcu_gp_wq, &rew->rew_work);
   wait_event() // A, wait for &rew->rew_work completion
   mutex_unlock() // B
                                          //======> switch to kworker

                                          sync_rcu_do_polled_gp() {
                                             synchronize_rcu_expedited()
                                                exp_funnel_lock()
                                                   mutex_lock(&rcu_state.exp_mutex); // C, wait B
                                             ....
                                          } // D
Since workqueues are usually implemented on top of several kworkers handling the queue concurrently, the above situation wouldn't deadlock most of the time because A then doesn't depend on D. But in case of memory stress, a single kworker may end up handling alone all the works in a serialized way. In that case the above layout becomes a problem because A then waits for D, closing a circular dependency:
A -> D -> C -> B -> A
This however only happens when CONFIG_RCU_EXP_KTHREAD=n. Indeed synchronize_rcu_expedited() is otherwise implemented on top of a kthread worker while polling still relies on rcu_gp_wq workqueue, breaking the above circular dependency chain.
Fix this by making the expedited grace period always rely on the kthread worker. The workqueue-based implementation is essentially a duplicate anyway, now that the per-node initialization is performed by per-node kthread workers.
Meanwhile the CONFIG_RCU_EXP_KTHREAD switch is still kept around to manage the scheduler policy of these kthread workers.
Reported-by: Anna-Maria Behnsen <[email protected]> Reported-by: Thomas Gleixner <[email protected]> Suggested-by: Joel Fernandes <[email protected]> Suggested-by: Paul E. McKenney <[email protected]> Suggested-by: Neeraj upadhyay <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
|
| #
8e5e6215 |
| 12-Jan-2024 |
Frederic Weisbecker <[email protected]> |
rcu/exp: Make parallel exp gp kworker per rcu node
When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period per node initialization is performed in parallel via workqueues (one work per node).
However in CONFIG_RCU_EXP_KTHREAD=y, this per node initialization is performed by a single kworker serializing each node initialization (one work for all nodes).
The second part is certainly less scalable and efficient beyond a single leaf node.
To improve this, expand this single kworker into per-node kworkers. This new layout is eventually intended to remove the workqueues based implementation since it will essentially now become duplicate code.
Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
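A hedged sketch of per-node kworker creation (the helper name, the rcu_node field, and the kworker name format are assumptions; error handling is abbreviated):

    static void rcu_exp_par_gp_init_kworker(struct rcu_node *rnp)  /* hypothetical helper */
    {
            struct kthread_worker *kw;

            kw = kthread_create_worker(0, "rcu_exp_par_gp_kworker/%d", rnp->grplo);
            if (IS_ERR(kw)) {
                    pr_err("Failed to create par gp kworker on %d/%d\n", rnp->grplo, rnp->grphi);
                    return;
            }
            WRITE_ONCE(rnp->exp_kworker, kw);  /* rnp->exp_kworker assumed */
            /* CONFIG_RCU_EXP_KTHREAD may additionally set SCHED_FIFO on kw->task. */
    }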
|
| #
e7539ffc |
| 12-Jan-2024 |
Frederic Weisbecker <[email protected]> |
rcu/exp: Handle RCU expedited grace period kworker allocation failure
Just like is done for the kworker performing nodes initialization, gracefully handle the possible allocation failure of the RCU expedited grace period main kworker.
While at it, rename the related checking functions to better reflect the expedited specifics.
Reviewed-by: Kalesh Singh <[email protected]> Fixes: 9621fbee44df ("rcu: Move expedited grace period (GP) work to RT kthread_worker") Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
|
|
Revision tags: v6.7, v6.7-rc8, v6.7-rc7 |
|
| #
a7e4074d |
| 18-Dec-2023 |
Frederic Weisbecker <[email protected]> |
rcu/exp: Remove full barrier upon main thread wakeup
When an expedited grace period is ending, care must be taken so that all the quiescent states propagated up to the root are correctly ordered against the wake up of the main expedited grace period workqueue.
This ordering is already carried through the root rnp locking augmented by an smp_mb__after_unlock_lock() barrier.
Therefore the explicit smp_mb() placed before the wake up is not needed and can be removed.
Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
|
| #
e787644c |
| 18-Dec-2023 |
Frederic Weisbecker <[email protected]> |
rcu: Defer RCU kthreads wakeup when CPU is dying
When the CPU goes idle for the last time during the CPU down hotplug process, RCU reports a final quiescent state for the current CPU. If this quiescent state propagates up to the top, some tasks may then be woken up to complete the grace period: the main grace period kthread and/or the expedited main workqueue (or kworker).
If those kthreads have a SCHED_FIFO policy, the wake up can indirectly arm the RT bandwidth timer on the local offline CPU. Since this happens after hrtimers have been migrated at the CPUHP_AP_HRTIMERS_DYING stage, the timer gets ignored. Therefore, if the RCU kthreads are waiting for RT bandwidth to be available, they may never actually be scheduled.
This triggers TREE03 rcutorture hangs:
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 	4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
rcu: 	(t=21035 jiffies g=938281 q=40787 ncpus=6)
rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:14896 pid:14 tgid:14 ppid:2 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x2eb/0xa80
 schedule+0x1f/0x90
 schedule_timeout+0x163/0x270
 ? __pfx_process_timeout+0x10/0x10
 rcu_gp_fqs_loop+0x37c/0x5b0
 ? __pfx_rcu_gp_kthread+0x10/0x10
 rcu_gp_kthread+0x17c/0x200
 kthread+0xde/0x110
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2b/0x40
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
The situation can't be solved with just unpinning the timer. The hrtimer infrastructure and the nohz heuristics involved in finding the best remote target for an unpinned timer would then also need to handle enqueues from an offline CPU in the most horrendous way.
So fix this on the RCU side instead and defer the wake up to an online CPU if it's too late for the local one.
Reported-by: Paul E. McKenney <[email protected]> Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Neeraj Upadhyay (AMD) <[email protected]>
|
|
Revision tags: v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7 |
|
| #
5b404fda |
| 15-Aug-2023 |
Paul E. McKenney <[email protected]> |
rcu: Add RCU CPU stall notifier
It is sometimes helpful to have a way for the subsystem causing the stall to dump its state when an RCU CPU stall occurs. This commit therefore bases rcu_stall_chain_notifier_register() and rcu_stall_chain_notifier_unregister() on atomic notifiers in order to provide this functionality.
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Steven Rostedt <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
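A hedged usage sketch for a subsystem that wants to dump its state on RCU CPU stalls (the callback body and names are illustrative; the action values follow the rcu_notifier interface introduced here):

    #include <linux/rcu_notifier.h>

    static int my_rcu_stall_notify(struct notifier_block *nb, unsigned long action, void *unused)
    {
            /* Called from the stall-warning path via an atomic notifier chain. */
            if (action == RCU_STALL_NOTIFY_NORM || action == RCU_STALL_NOTIFY_EXP)
                    pr_info("mysubsys: dumping state because of an RCU CPU stall\n");
            return NOTIFY_OK;
    }

    static struct notifier_block my_rcu_stall_nb = {
            .notifier_call = my_rcu_stall_notify,
    };

    /* registration and removal, e.g. at module init/exit: */
    rcu_stall_chain_notifier_register(&my_rcu_stall_nb);
    rcu_stall_chain_notifier_unregister(&my_rcu_stall_nb);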
|
|
Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6 |
|
| #
9146eb25 |
| 07-Apr-2023 |
Paul E. McKenney <[email protected]> |
rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated only on the instance corresponding to the current CPU, but can be read more widely. Unmarked accesses are OK from the corresponding CPU, but only if interrupts are disabled, given that interrupt handlers can and do modify this field.
Unfortunately, although the load from rcu_preempt_deferred_qs() is always carried out from the corresponding CPU, interrupts are not necessarily disabled. This commit therefore upgrades this load to READ_ONCE.
Similarly, the diagnostic access from synchronize_rcu_expedited_wait() might run with interrupts disabled and from some other CPU. This commit therefore marks this load with data_race().
Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as is because interrupts are disabled and this load is always from the corresponding CPU. This commit adds a comment giving the rationale for this access being safe.
This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely.
Signed-off-by: Paul E. McKenney <[email protected]>
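A hedged sketch of the two access markings discussed above (surrounding code elided; exact placement is illustrative):

    /* In rcu_preempt_deferred_qs(): same CPU, but IRQs may be enabled, so
     * mark the load against a racing write from an interrupt handler. */
    if (READ_ONCE(rdp->cpu_no_qs.b.exp))
            /* ... report the deferred quiescent state ... */;

    /* In synchronize_rcu_expedited_wait(): purely diagnostic access that
     * may run on another CPU, so explicitly declare the race benign. */
    pr_cont(" ->cpu_no_qs.b.exp=%d", data_race(rdp->cpu_no_qs.b.exp));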
|