History log of /linux-6.15/kernel/rcu/tree_exp.h (Results 1 – 25 of 140)
Revision    Date    Author    Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6
# a3e81621 28-Oct-2024 Paul E. McKenney <[email protected]>

rcu: Split rcu_report_exp_cpu_mult() mask parameter and use for tracing

This commit renames the rcu_report_exp_cpu_mult() function's "mask"
parameter to "mask_in" and introduces a "mask" local variable to better
support upcoming event-tracing additions.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>


Revision tags: v6.12-rc5
# 1bb03ad3 26-Oct-2024 Paul E. McKenney <[email protected]>

rcu: Add lockdep_assert_irqs_disabled() to rcu_exp_need_qs()

Callers to rcu_exp_need_qs() are supposed to disable interrupts, so this
commit enlists lockdep's aid in checking this.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
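
For reference, the assertion pattern being added, as a minimal sketch
(simplified; the real rcu_exp_need_qs() in tree_exp.h does more):

    #include <linux/lockdep.h>

    static void rcu_exp_need_qs(void)
    {
        /* Splat once, under lockdep, if a caller left interrupts enabled. */
        lockdep_assert_irqs_disabled();

        /* ... record that this CPU still owes an expedited quiescent state ... */
    }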


# ecc5e6b0 26-Oct-2024 Paul E. McKenney <[email protected]>

rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp

The value of rdp->cpu_no_qs.b.exp may be changed only by the corresponding
CPU, and that CPU is not even allowed to race with itself, for example,
via interrupt handlers. This commit therefore adds KCSAN exclusive-writer
assertions to check this constraint.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
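
The assertion pattern itself, as a sketch (placement is illustrative,
not the exact diff; ASSERT_EXCLUSIVE_WRITER() is the real KCSAN macro):

    #include <linux/kcsan-checks.h>

    /* In an updater running on the owning CPU with interrupts disabled: */
    ASSERT_EXCLUSIVE_WRITER(rdp->cpu_no_qs.b.exp);  /* KCSAN: flag any concurrent writer. */
    WRITE_ONCE(rdp->cpu_no_qs.b.exp, false);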


# 7a323371 26-Oct-2024 Paul E. McKenney <[email protected]>

rcu: Make preemptible rcu_exp_handler() check idempotency

Although the non-preemptible implementation of rcu_exp_handler()
contains checks to enforce idempotency, the preemptible version does not.
The reason for this omission is that in preemptible kernels, there is
no reporting of quiescent states from CPU hotplug notifiers, and thus
no need for idempotency.

In theory, anyway.

In practice, accidents happen. This commit therefore adds checks under
WARN_ON_ONCE() to catch any such accidents.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
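
A rough sketch of this kind of check (the condition shown here is
illustrative only; the real handler's idempotency test differs in detail):

    static void rcu_exp_handler(void *unused)
    {
        struct rcu_data *rdp = this_cpu_ptr(&rcu_data);

        /*
         * If this CPU no longer owes an expedited quiescent state,
         * a repeat invocation reached us by accident: warn once.
         */
        if (WARN_ON_ONCE(!READ_ONCE(rdp->cpu_no_qs.b.exp)))
            return;

        /* ... otherwise report the expedited quiescent state ... */
    }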


# 6ae4c30f 25-Oct-2024 Paul E. McKenney <[email protected]>

rcu: Replace open-coded rcu_exp_need_qs() from rcu_exp_handler() with call

Currently, the preemptible implementation of rcu_exp_handler()
almost open-codes rcu_exp_need_qs(). A call to that function would be
shorter and would improve expediting in cases where rcu_exp_handler()
interrupted a preemption-disabled or bh-disabled region of code.
This commit therefore moves rcu_exp_need_qs() out of the non-preemptible
leg of the enclosing #ifdef and replaces the open coding in preemptible
rcu_exp_handler() with a call to rcu_exp_need_qs().

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
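
The shape of the change, sketched (the config symbol is real, bodies
are elided):

    /* rcu_exp_need_qs() now defined before the #ifdef, visible to both legs. */
    static void rcu_exp_need_qs(void) { /* ... */ }

    #ifdef CONFIG_PREEMPT_RCU
    static void rcu_exp_handler(void *unused)
    {
        /* ... */
        rcu_exp_need_qs();  /* replaces the former open-coded lines */
    }
    #else  /* #ifdef CONFIG_PREEMPT_RCU */
    /* The non-preemptible rcu_exp_handler() keeps its existing call. */
    #endif /* #else #ifdef CONFIG_PREEMPT_RCU */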


# e2bd1682 25-Oct-2024 Paul E. McKenney <[email protected]>

rcu: Move rcu_report_exp_rdp() setting of ->cpu_no_qs.b.exp under lock

This commit reduces the state space of rcu_report_exp_rdp() by moving
the setting of ->cpu_no_qs.b.exp under the rcu_node structure's ->lock.
The lock isn't really all that important here, given that this per-CPU
field is supposed to be written only by its CPU, but the disabling of
interrupts excludes things like rcu_exp_handler(), which also can write
to this same field. Avoiding this sort of interleaved access reduces
the state space.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>


# d16e32f7 25-Oct-2024 Paul E. McKenney <[email protected]>

rcu: Make rcu_report_exp_cpu_mult() caller acquire lock

There is a hard-to-trigger bug in the expedited grace-period computation
whose fix requires that the __sync_rcu_exp_select_node_cpus() function
check that the grace-period sequence number has not changed before
invoking rcu_report_exp_cpu_mult(). However, this check must be done
while holding the leaf rcu_node structure's ->lock.

This commit therefore prepares for that fix by moving this lock's
acquisition from rcu_report_exp_cpu_mult() to its callers (all two
of them).

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
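
In caller terms, the prepared-for pattern looks roughly like this (a
sketch; the exact post-change signature may differ):

    unsigned long flags;

    raw_spin_lock_irqsave_rcu_node(rnp, flags);  /* leaf ->lock now taken by the caller */
    /* ... the later fix rechecks the expedited GP sequence number here ... */
    rcu_report_exp_cpu_mult(rnp, flags, mask, false);  /* called with ->lock held */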


Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5
# 8c03273a 20-Aug-2024 John Ogness <[email protected]>

rcu: Mark emergency sections in rcu stalls

Mark emergency sections wherever multiple lines of
rcu stall information are generated. In an emergency
section, every printk() call will attempt to directly
flush to the consoles using the EMERGENCY priority.

Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
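
The marking is a bracketing pattern around the multi-line report,
roughly (nbcon_cpu_emergency_enter()/_exit() are the real printk API):

    #include <linux/printk.h>

    nbcon_cpu_emergency_enter();  /* printk()s now try direct flush at EMERGENCY priority */
    /* ... emit the multi-line expedited-stall report ... */
    nbcon_cpu_emergency_exit();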


Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7
# 76ce2b3d 29-Apr-2024 Valentin Schneider <[email protected]>

rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap

The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, and the snapshot helpers are now prefixed by
"rcu_watching". Reflect that change in the storage variables for these
snapshots.

Signed-off-by: Valentin Schneider <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>


Revision tags: v6.9-rc6, v6.9-rc5
# 3116a32e 16-Apr-2024 Valentin Schneider <[email protected]>

rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()

The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, so the dynticks prefix can go.

While at it, note that this helper is only meant to be called after a
failed earlier call to rcu_watching_snap_in_eqs(): document this in the
comments and add a WARN_ON_ONCE() for good measure.

Signed-off-by: Valentin Schneider <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>


# 9629936d 16-Apr-2024 Valentin Schneider <[email protected]>

rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()

The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING; reflect that change in the related helpers.

While at it, update a comment that still refers to rcu_dynticks_snap(),
which was removed by commit:

7be2e6323b9b ("rcu: Remove full memory barrier on RCU stall printout")

Signed-off-by: Valentin Schneider <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>


# 51b73999 28-Jun-2024 Ryo Takakura <[email protected]>

rcu: Let dump_cpu_task() be used without preemption disabled

Commit 2d7f00b2f0130 ("rcu: Suppress smp_processor_id() complaint
in synchronize_rcu_expedited_wait()") disabled preemption around
dump_cpu_task() to suppress the warning on its use within preemptible
context.

Calling dump_cpu_task() doesn't require a non-preemptible context
except for suppressing the smp_processor_id() warning.
Because smp_processor_id() is evaluated along with in_hardirq()
to check whether execution is in interrupt context, this patch removes
the need for disabling preemption by reordering the condition so that
smp_processor_id() is evaluated only when execution is in interrupt
context.

Signed-off-by: Ryo Takakura <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>
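
The fix relies on &&'s left-to-right short-circuit evaluation, roughly
(the helper name below is illustrative):

    /* Before: smp_processor_id() could run (and warn) in preemptible context. */
    if (cpu == smp_processor_id() && in_hardirq())
        dump_current_cpu_regs();

    /* After: in_hardirq() is evaluated first, so smp_processor_id() only
     * runs in hardirq context, where preemption is implicitly disabled. */
    if (in_hardirq() && cpu == smp_processor_id())
        dump_current_cpu_regs();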


# 7c72dedb 02-Jul-2024 Paul E. McKenney <[email protected]>

rcu: Summarize expedited RCU CPU stall warnings during CSD-lock stalls

During CSD-lock stalls, the additional information output by expedited
RCU CPU stall warnings is usually redundant, flooding the console for
no good reason. However, this has been the way things work for a few
years. This commit therefore adds an rcutree.csd_lock_suppress_rcu_stall
kernel boot parameter that causes expedited RCU CPU stall warnings to
be abbreviated to a single line when there is at least one CPU that has
been stuck waiting for CSD lock for more than five seconds.

Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>


# 27d5749d 02-Jul-2024 Paul E. McKenney <[email protected]>

rcu: Extract synchronize_rcu_expedited_stall() from synchronize_rcu_expedited_wait()

This commit extracts the RCU CPU stall-warning report code from
synchronize_rcu_expedited_wait() and places it in a new function named
synchronize_rcu_expedited_stall(). This is strictly a code-movement
commit. A later commit will use this reorganization to avoid printing
expedited RCU CPU stall warnings while there are ongoing CSD-lock stall
reports.

Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>


# 125716c3 16-Apr-2024 Valentin Schneider <[email protected]>

context_tracking, rcu: Rename ct_dynticks_cpu_acquire() into ct_rcu_watching_cpu_acquire()

The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING; reflect that change in the related helpers.

Signed-off-by: Valentin Schneider <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>


# 677ab23b 15-May-2024 Frederic Weisbecker <[email protected]>

rcu/exp: Remove redundant full memory barrier at the end of GP

A full memory barrier is necessary at the end of the expedited grace
period to order:

1) The grace period completion (pictured by the GP sequence
number) with all preceding accesses. This pairs with rcu_seq_end()
performed by the concurrent kworker.

2) The grace period completion and subsequent post-GP update side
accesses. Pairs again against rcu_seq_end().

This full barrier is already provided by the final sync_exp_work_done()
test, making the subsequent explicit one redundant. Remove it and
improve comments.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>

show more ...


# 33c0860b 27-Jun-2024 Frederic Weisbecker <[email protected]>

rcu/exp: Remove superfluous full memory barrier upon first EQS snapshot

When the grace period kthread checks the extended quiescent state
counter of a CPU, full ordering is necessary to ensure that either:

* If the GP kthread observes the remote target in an extended quiescent
state, then that target must observe all accesses prior to the current
grace period, including the current grace period sequence number, once
it exits that extended quiescent state.

or:

* If the GP kthread observes the remote target NOT in an extended
quiescent state, then the target must observe all accesses prior to
the current grace period, including the current grace period sequence
number, once it subsequently enters an extended quiescent state.

This ordering is enforced through a full memory barrier placed right
before taking the first EQS snapshot. However this is superfluous
because the snapshot is taken while holding the target's rnp lock which
provides the necessary ordering through its chain of
smp_mb__after_unlock_lock().

Remove the needless explicit barrier before the snapshot and put a
comment about the implicit barrier newly relied upon here.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Boqun Feng <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>
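
The implicit ordering relied upon here comes from the rcu_node locking
helpers in kernel/rcu/rcu.h, which pair acquisition with the barrier:

    /* Effect of the helper (sketch): */
    raw_spin_lock_irqsave_rcu_node(rnp, flags);
    /* ... expands to raw_spin_lock_irqsave() on the rnp ->lock followed
       by smp_mb__after_unlock_lock() ... */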


Revision tags: v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8
# 988f569a 08-Mar-2024 Uladzislau Rezki (Sony) <[email protected]>

rcu: Reduce synchronize_rcu() latency

A call to synchronize_rcu() can be optimized from a latency point of
view. Workloads which depend on this can benefit from it.

The delay of the wakeme_after_rcu() callback, which unblocks a waiter,
depends on several factors:

- how fast the offloading process is started. Combination of:
    - !CONFIG_RCU_NOCB_CPU/CONFIG_RCU_NOCB_CPU;
    - !CONFIG_RCU_LAZY/CONFIG_RCU_LAZY;
    - other.
- when started, whether the invoking path is interrupted due to:
    - the time limit;
    - need_resched();
    - a limit being reached.
- where in the nocb list it is located;
- how fast previous callbacks completed.

Example:

1. On our embedded devices I can easily trigger a scenario in which
it is the last in a list of ~3600 callbacks:

<snip>
<...>-29 [001] d..1. 21950.145313: rcu_batch_start: rcu_preempt CBs=3613 bl=28
...
<...>-29 [001] ..... 21950.152578: rcu_invoke_callback: rcu_preempt rhp=00000000b2d6dee8 func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152579: rcu_invoke_callback: rcu_preempt rhp=00000000a446f607 func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152580: rcu_invoke_callback: rcu_preempt rhp=00000000a5cab03b func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152581: rcu_invoke_callback: rcu_preempt rhp=0000000013b7e5ee func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152582: rcu_invoke_callback: rcu_preempt rhp=000000000a8ca6f9 func=__free_vm_area_struct.cfi_jt
<...>-29 [001] ..... 21950.152583: rcu_invoke_callback: rcu_preempt rhp=000000008f162ca8 func=wakeme_after_rcu.cfi_jt
<...>-29 [001] d..1. 21950.152625: rcu_batch_end: rcu_preempt CBs-invoked=3612 idle=....
<snip>

2. We use cpuset/cgroup to classify tasks and assign them into
different cgroups, for example a "background" group, which binds tasks
only to little CPUs, or a "foreground" group, which makes use of all
CPUs. Tasks can be migrated between groups on request if acceleration
is needed.

See below an example of how the "surfaceflinger" task gets migrated.
Initially it is located in the "system-background" cgroup, which
allows it to run only on little cores. In order to speed it up, it
can be temporarily moved into the "foreground" cgroup, which allows
it to use big/all CPUs:

cgroup_attach_task():
-> cgroup_migrate_execute()
-> cpuset_can_attach()
-> percpu_down_write()
-> rcu_sync_enter()
-> synchronize_rcu()
-> now move tasks to the new cgroup.
-> cgroup_migrate_finish()

<snip>
rcuop/1-29 [000] ..... 7030.528570: rcu_invoke_callback: rcu_preempt rhp=00000000461605e0 func=wakeme_after_rcu.cfi_jt
PERFD-SERVER-1855 [000] d..1. 7030.530293: cgroup_attach_task: dst_root=3 dst_id=22 dst_level=1 dst_path=/foreground pid=1900 comm=surfaceflinger
TimerDispatch-2768 [002] d..5. 7030.537542: sched_migrate_task: comm=surfaceflinger pid=1900 prio=98 orig_cpu=0 dest_cpu=4
<snip>

"Boosting a task" depends on synchronize_rcu() latency:

- first trace shows a completion of synchronize_rcu();
- second shows attaching a task to a new group;
- last shows a final step when migration occurs.

3. To address this drawback, maintain a separate track that consists
of synchronize_rcu() callers only. After completion of a grace period,
users are deferred to a dedicated worker that processes the requests.

4. This patch reduces the latency of synchronize_rcu() by approximately
30-40% on synthetic tests. The real test case, camera launch time,
shows (time is in milliseconds):

1-run 542 vs 489 improvement 9%
2-run 540 vs 466 improvement 13%
3-run 518 vs 468 improvement 9%
4-run 531 vs 457 improvement 13%
5-run 548 vs 475 improvement 13%
6-run 509 vs 484 improvement 4%

Synthetic test (no "noise" from other callbacks):
Hardware: x86_64, 64 CPUs, 64GB of memory
Linux-6.6

- 10K tasks (simultaneous);
- each task does (1000 loops):
synchronize_rcu();
kfree(p);

default: CONFIG_RCU_NOCB_CPU: takes 54 seconds to complete all users;
patch: CONFIG_RCU_NOCB_CPU: takes 35 seconds to complete all users.

Running 60K gives approximately the same results on my setup. Please
note this is without any interaction with other types of callbacks;
otherwise, those would heavily impact the default case.

5. By default it is disabled. To enable it, perform one of the
sequences below:

echo 1 > /sys/module/rcutree/parameters/rcu_normal_wake_from_gp
or pass a boot parameter "rcutree.rcu_normal_wake_from_gp=1"

Reviewed-by: Paul E. McKenney <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Co-developed-by: Neeraj Upadhyay (AMD) <[email protected]>
Signed-off-by: Neeraj Upadhyay (AMD) <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>


Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1
# 23da2ad6 12-Jan-2024 Frederic Weisbecker <[email protected]>

rcu/exp: Remove rcu_par_gp_wq

TREE04 running on short iterations can produce writer stalls of the
following kind:

??? Writer stall state RTWS_EXP_SYNC(4) g3968 f0x0 ->state 0x2 cpu 0
task:rcu_torture_wri state:D stack:14568 pid:83 ppid:2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x2de/0x850
? trace_event_raw_event_rcu_exp_funnel_lock+0x6d/0xb0
schedule+0x4f/0x90
synchronize_rcu_expedited+0x430/0x670
? __pfx_autoremove_wake_function+0x10/0x10
? __pfx_synchronize_rcu_expedited+0x10/0x10
do_rtws_sync.constprop.0+0xde/0x230
rcu_torture_writer+0x4b4/0xcd0
? __pfx_rcu_torture_writer+0x10/0x10
kthread+0xc7/0xf0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2f/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>

Waiting for an expedited grace period and polling for an expedited
grace period both are operations that internally rely on the same
workqueue performing necessary asynchronous work.

However, a dependency chain is involved between those two operations,
as depicted below:

====== CPU 0 ======= ====== CPU 1 =======

synchronize_rcu_expedited()
exp_funnel_lock()
mutex_lock(&rcu_state.exp_mutex);
start_poll_synchronize_rcu_expedited
queue_work(rcu_gp_wq, &rnp->exp_poll_wq);
synchronize_rcu_expedited_queue_work()
queue_work(rcu_gp_wq, &rew->rew_work);
wait_event() // A, wait for &rew->rew_work completion
mutex_unlock() // B
//======> switch to kworker

sync_rcu_do_polled_gp() {
synchronize_rcu_expedited()
exp_funnel_lock()
mutex_lock(&rcu_state.exp_mutex); // C, wait B
....
} // D

Since workqueues are usually implemented on top of several kworkers
handling the queue concurrently, the above situation wouldn't deadlock
most of the time because A then doesn't depend on D. But in case of
memory stress, a single kworker may end up handling alone all the works
in a serialized way. In that case the above layout becomes a problem
because A then waits for D, closing a circular dependency:

A -> D -> C -> B -> A

This however only happens when CONFIG_RCU_EXP_KTHREAD=n. Indeed
synchronize_rcu_expedited() is otherwise implemented on top of a kthread
worker while polling still relies on rcu_gp_wq workqueue, breaking the
above circular dependency chain.

Fix this by making the expedited grace period always rely on kthread
workers. The workqueue-based implementation is essentially a duplicate
anyway now that the per-node initialization is performed by per-node
kthread workers.

Meanwhile the CONFIG_RCU_EXP_KTHREAD switch is still kept around to
manage the scheduler policy of these kthread workers.

Reported-by: Anna-Maria Behnsen <[email protected]>
Reported-by: Thomas Gleixner <[email protected]>
Suggested-by: Joel Fernandes <[email protected]>
Suggested-by: Paul E. McKenney <[email protected]>
Suggested-by: Neeraj upadhyay <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>


# 8e5e6215 12-Jan-2024 Frederic Weisbecker <[email protected]>

rcu/exp: Make parallel exp gp kworker per rcu node

When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period per node
initialization is performed in parallel via workqueues (one work per
node).

However in CONFIG_RCU_EXP_KTHREAD=y, this per node initialization is
performed by a single kworker serializing each node initialization (one
work for all nodes).

The second part is certainly less scalable and efficient beyond a single
leaf node.

To improve this, expand this single kworker into per-node kworkers. This
new layout is eventually intended to remove the workqueues based
implementation since it will essentially now become duplicate code.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
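
A sketch of the per-node expansion (the field and worker name are
illustrative; kthread_create_worker() is the real API):

    #include <linux/kthread.h>

    struct kthread_worker *kworker;

    /* One worker per leaf rcu_node instead of one global worker: */
    kworker = kthread_create_worker(0, "rcu_exp_par_gp_kthread_worker/%d",
                                    rnp_index);
    if (IS_ERR(kworker))
        return;                      /* allocation failure handled gracefully */
    rnp->exp_kworker = kworker;      /* illustrative field name */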


# e7539ffc 12-Jan-2024 Frederic Weisbecker <[email protected]>

rcu/exp: Handle RCU expedited grace period kworker allocation failure

Just as is done for the kworkers performing node initialization,
gracefully handle the possible allocation failure of the RCU expedited
grace period main kworker.

While at it, rename the related checking functions to better
reflect the expedited specifics.

Reviewed-by: Kalesh Singh <[email protected]>
Fixes: 9621fbee44df ("rcu: Move expedited grace period (GP) work to RT kthread_worker")
Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>


Revision tags: v6.7, v6.7-rc8, v6.7-rc7
# a7e4074d 18-Dec-2023 Frederic Weisbecker <[email protected]>

rcu/exp: Remove full barrier upon main thread wakeup

When an expedited grace period is ending, care must be taken so that all
the quiescent states propagated up to the root are correctly ordered
against the wake up of the main expedited grace period workqueue.

This ordering is already carried through the root rnp locking augmented
by an smp_mb__after_unlock_lock() barrier.

Therefore the explicit smp_mb() placed before the wake up is not needed
and can be removed.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>


# e787644c 18-Dec-2023 Frederic Weisbecker <[email protected]>

rcu: Defer RCU kthreads wakeup when CPU is dying

When the CPU goes idle for the last time during the CPU down hotplug
process, RCU reports a final quiescent state for the current CPU. If
this quiescent state propagates up to the top, some tasks may then be
woken up to complete the grace period: the main grace period kthread
and/or the expedited main workqueue (or kworker).

If those kthreads have a SCHED_FIFO policy, the wake up can indirectly
arm the RT bandwidth timer on the local offline CPU. Since this happens
after hrtimers have been migrated at the CPUHP_AP_HRTIMERS_DYING stage,
the timer gets ignored. Therefore if the RCU kthreads are waiting for RT
bandwidth to be available, they may never actually be scheduled.

This triggers TREE03 rcutorture hangs:

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved)
rcu: (t=21035 jiffies g=938281 q=40787 ncpus=6)
rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:14896 pid:14 tgid:14 ppid:2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x2eb/0xa80
schedule+0x1f/0x90
schedule_timeout+0x163/0x270
? __pfx_process_timeout+0x10/0x10
rcu_gp_fqs_loop+0x37c/0x5b0
? __pfx_rcu_gp_kthread+0x10/0x10
rcu_gp_kthread+0x17c/0x200
kthread+0xde/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2b/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>

The situation can't be solved with just unpinning the timer. The hrtimer
infrastructure and the nohz heuristics involved in finding the best
remote target for an unpinned timer would then also need to handle
enqueues from an offline CPU in the most horrendous way.

So fix this on the RCU side instead and defer the wake up to an online
CPU if it's too late for the local one.

Reported-by: Paul E. McKenney <[email protected]>
Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Neeraj Upadhyay (AMD) <[email protected]>


Revision tags: v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7
# 5b404fda 15-Aug-2023 Paul E. McKenney <[email protected]>

rcu: Add RCU CPU stall notifier

It is sometimes helpful to have a way for the subsystem causing
the stall to dump its state when an RCU CPU stall occurs. This
commit therefore bases rcu_stall_chain_notifier_register() and
rcu_stall_chain_notifier_unregister() on atomic notifiers in order to
provide this functionality.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
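
Usage, as a sketch (the handler below is illustrative; the register
function and the RCU_STALL_NOTIFY_* action values belong to this
interface):

    #include <linux/notifier.h>
    #include <linux/printk.h>
    #include <linux/rcu_notifier.h>

    static int my_stall_handler(struct notifier_block *nb,
                                unsigned long action, void *unused)
    {
        /* action is RCU_STALL_NOTIFY_NORM or RCU_STALL_NOTIFY_EXP. */
        pr_info("RCU stall: dumping this subsystem's state\n");
        return NOTIFY_OK;
    }

    static struct notifier_block my_stall_nb = {
        .notifier_call = my_stall_handler,
    };

    /* At init time: rcu_stall_chain_notifier_register(&my_stall_nb); */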


Revision tags: v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6
# 9146eb25 07-Apr-2023 Paul E. McKenney <[email protected]>

rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp

The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated
only on the instance corresponding to the current CPU, but can be read
more widely. Unmarked accesses are OK from the corresponding CPU, but
only if interrupts are disabled, given that interrupt handlers can and
do modify this field.

Unfortunately, although the load from rcu_preempt_deferred_qs() is always
carried out from the corresponding CPU, interrupts are not necessarily
disabled. This commit therefore upgrades this load to READ_ONCE.

Similarly, the diagnostic access from synchronize_rcu_expedited_wait()
might run with interrupts disabled and from some other CPU. This commit
therefore marks this load with data_race().

Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as
is because interrupts are disabled and this load is always from the
corresponding CPU. This commit adds a comment giving the rationale for
this access being safe.

This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <[email protected]>
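
The three access classes described above, in sketch form (surrounding
code is illustrative):

    /* Owning CPU, but interrupts possibly enabled: marked load. */
    if (READ_ONCE(rdp->cpu_no_qs.b.exp))
        do_something();

    /* Diagnostic cross-CPU read: the value may be stale, and that is OK. */
    pr_cont(" exp=%d", data_race(rdp->cpu_no_qs.b.exp));

    /* Owning CPU with interrupts disabled: plain C-language access is safe. */
    rdp->cpu_no_qs.b.exp = false;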
