History log of /linux-6.15/kernel/rcu/tree_plugin.h (Results 1 – 25 of 497)
Revision    Date    Author    Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3
# 83b28cfe 13-Dec-2024 Ankur Arora <[email protected]>

rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y

With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
states for read-side critical sections via rcu_all_qs().
One reason why this was needed: lacking preempt-count, the tick
handler has no way of knowing whether it is executing in a
read-side critical section or not.

With (PREEMPT_LAZY=y, PREEMPT_DYNAMIC=n), we get (PREEMPT_COUNT=y,
PREEMPT_RCU=n). In this configuration cond_resched() is a stub and
does not provide quiescent states via rcu_all_qs().
(PREEMPT_RCU=y provides this information via rcu_read_unlock() and
its nesting counter.)

So, use the availability of preempt_count() to report quiescent states
in rcu_flavor_sched_clock_irq().

Suggested-by: Paul E. McKenney <[email protected]>
Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>
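
A minimal sketch of the resulting check (illustrative only, not the
committed hunk; rcu_qs() here stands in for whatever records the
quiescent state in rcu_flavor_sched_clock_irq()):

    /*
     * Sketch: with PREEMPT_COUNT=y and PREEMPT_RCU=n, the absence of
     * preempt/softirq nesting in the interrupted context means it
     * cannot be inside an RCU read-side critical section, so the
     * tick may report a quiescent state directly.
     */
    if (!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))
        rcu_qs();   /* assumed QS-recording helper */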



# fcf0e25a 13-Dec-2024 Ankur Arora <[email protected]>

rcu: handle unstable rdp in rcu_read_unlock_strict()

rcu_read_unlock_strict() can be called with preemption enabled
which can make for an unstable rdp and a racy norm value.

Fix this by dropping the preempt-count in __rcu_read_unlock()
after the call to rcu_read_unlock_strict(), adjusting the
preempt-count check appropriately.

Suggested-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Ankur Arora <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>



Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1
# db7ee3cb 26-Sep-2024 Frederic Weisbecker <[email protected]>

rcu: Use kthread preferred affinity for RCU boost

Now that kthreads have an infrastructure to handle preferred affinity
against CPU hotplug and housekeeping cpumask, convert RCU boost to use
it instead of handling all the constraints by itself.

Acked-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>



# ecc5e6b0 26-Oct-2024 Paul E. McKenney <[email protected]>

rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp

The value of rdp->cpu_no_qs.b.exp may be changed only by the corresponding
CPU, and that CPU is not even allowed to race with itself, for example,
via interrupt handlers. This commit therefore adds KCSAN exclusive-writer
assertions to check this constraint.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
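
The primitive involved is ASSERT_EXCLUSIVE_WRITER() from
<linux/kcsan-checks.h>; a hedged fragment of how such an assertion is
typically placed (illustrative, not the exact hunks):

    /*
     * Fragment, running on the CPU that owns rdp: declare this
     * context the sole legitimate writer so that KCSAN reports any
     * concurrent write to the field.
     */
    ASSERT_EXCLUSIVE_WRITER(rdp->cpu_no_qs.b.exp);
    WRITE_ONCE(rdp->cpu_no_qs.b.exp, false);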



Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5
# c3291206 23-Aug-2024 Hongbo Li <[email protected]>

rcu: Use bitwise instead of arithmetic operator for flags

This silences the following coccinelle warning:
WARNING: sum of probable bitmasks, consider |

Signed-off-by: Hongbo Li <[email protected]>
Reviewed-by: "Paul E. McKenney" <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
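
The distinction behind the warning, in a self-contained form
(illustrative example, not the patched kernel code):

    #include <stdio.h>

    #define FLAG_A 0x1
    #define FLAG_B 0x2

    int main(void)
    {
        /*
         * For disjoint masks '|' and '+' produce the same value,
         * but '+' silently double-counts a repeated flag, whereas
         * '|' is idempotent -- hence coccinelle's suggestion.
         */
        unsigned int good = FLAG_A | FLAG_B | FLAG_B;  /* 0x3 */
        unsigned int bad  = FLAG_A + FLAG_B + FLAG_B;  /* 0x5: no longer a valid flag set */
        printf("or=%#x add=%#x\n", good, bad);
        return 0;
    }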



Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7
# 32a9f26e 29-Apr-2024 Valentin Schneider <[email protected]>

rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()

The context_tracking.state RCU_DYNTICKS subvariable has been renamed to
RCU_WATCHING, so replace "dyntick_idle" with "eqs" to drop the dyntick
reference.

Signed-off-by: Valentin Schneider <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>



# 7121dd91 30-May-2024 Frederic Weisbecker <[email protected]>

rcu/nocb: Introduce nocb mutex

The barrier_mutex is currently used to protect (de-)offloading
operations and to prevent nocb_lock locking imbalance in rcu_barrier()
and the shrinker, as well as misordered RCU barrier invocation.

Now since RCU (de-)offloading is going to happen on offline CPUs, an RCU
barrier will have to be executed while transitioning from offloaded to
de-offloaded state. And this can't happen while holding the
barrier_mutex.

Introduce a NOCB mutex to protect (de-)offloading transitions. The
barrier_mutex is still held for now when necessary to avoid barrier
callbacks reordering and nocb_lock imbalance.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>



# 7aeba709 30-May-2024 Frederic Weisbecker <[email protected]>

rcu/nocb: Introduce RCU_NOCB_LOCKDEP_WARN()

Checking for races against concurrent (de-)offloading implies the
creation of !CONFIG_RCU_NOCB_CPU stubs to check if each relevant lock
is held. For now this only implies the nocb_lock but more are to be
expected.

Create instead a NOCB specific version of RCU_LOCKDEP_WARN() to avoid
the proliferation of stubs.

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Neeraj Upadhyay <[email protected]>
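
A hedged sketch of the pattern (the EXAMPLE_ name is made up; see the
actual RCU_NOCB_LOCKDEP_WARN() definition in the tree for the real
form):

    #ifdef CONFIG_RCU_NOCB_CPU
    #define EXAMPLE_NOCB_LOCKDEP_WARN(c, s)  RCU_LOCKDEP_WARN(c, s)
    #else
    /* No nocb support, nothing to check: compiles away, no stubs needed. */
    #define EXAMPLE_NOCB_LOCKDEP_WARN(c, s)  do { } while (0)
    #endif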



# 68d124b0 09-May-2024 Paul E. McKenney <[email protected]>

rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter

If a CPU is running either a userspace application or a guest OS in
nohz_full mode, it is possible for a system call to occur just as an
RCU grace period is starting. If that CPU also has the scheduling-clock
tick enabled for any reason (such as a second runnable task), and if the
system was booted with rcutree.use_softirq=0, then RCU can add insult to
injury by awakening that CPU's rcuc kthread, resulting in yet another
task and yet more OS jitter due to switching to that task, running it,
and switching back.

In addition, in the common case where that system call is not of
excessively long duration, awakening the rcuc task is pointless.
This pointlessness is due to the fact that the CPU will enter an extended
quiescent state upon returning to the userspace application or guest OS.
In this case, the rcuc kthread cannot do anything that the main RCU
grace-period kthread cannot do on its behalf, at least if it is given
a few additional milliseconds (for example, given the time duration
specified by rcutree.jiffies_till_first_fqs, give or take scheduling
delays).

This commit therefore adds a rcutree.nohz_full_patience_delay kernel
boot parameter that specifies the grace period age (in milliseconds,
rounded to jiffies) before which RCU will refrain from awakening the
rcuc kthread. Preliminary experimentation suggests a value of 1000,
that is, one second. Increasing rcutree.nohz_full_patience_delay will
increase grace-period latency and in turn increase memory footprint,
so systems with constrained memory might choose a smaller value.
Systems with less-aggressive OS-jitter requirements might choose the
default value of zero, which keeps the traditional immediate-wakeup
behavior, thus avoiding increases in grace-period latency.

[ paulmck: Apply Leonardo Bras feedback. ]

Link: https://lore.kernel.org/all/[email protected]/

Reported-by: Leonardo Bras <[email protected]>
Suggested-by: Leonardo Bras <[email protected]>
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Leonardo Bras <[email protected]>
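
Usage is a plain kernel boot parameter, for example (value in
milliseconds, per the text above):

    # Example kernel command-line fragment: refrain from waking the rcuc
    # kthread until the grace period is at least one second old.
    rcutree.nohz_full_patience_delay=1000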



Revision tags: v6.9-rc6
# 483d5bf2 25-Apr-2024 Frederic Weisbecker <[email protected]>

rcu/nocb: Use kthread parking instead of ad-hoc implementation

Upon NOCB deoffloading, the rcuo kthread must be forced to sleep
until the corresponding rdp is ever offloaded again. The deoffloader
clears the SEGCBLIST_OFFLOADED flag, wakes up the rcuo kthread which
then notices that change and clears in turn its SEGCBLIST_KTHREAD_CB
flag before going to sleep, until it ever sees the SEGCBLIST_OFFLOADED
flag again, should a re-offloading happen.

Upon NOCB offloading, the rcuo kthread must be forced to wake up and
handle callbacks until the corresponding rdp is ever deoffloaded again.
The offloader sets the SEGCBLIST_OFFLOADED flag, wakes up the rcuo
kthread which then notices that change and sets in turn its
SEGCBLIST_KTHREAD_CB flag before going to check callbacks, until it
ever sees the SEGCBLIST_OFFLOADED flag cleared again, should a
de-offloading happen again.

This is all a crude ad-hoc and error-prone kthread (un-)parking
re-implementation.

Consolidate the behaviour with the appropriate API instead.

[ paulmck: Apply Qiang Zhang feedback provided in Link: below. ]
Link: https://lore.kernel.org/all/[email protected]/

Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
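
For readers unfamiliar with the parking API, a minimal sketch of the
pattern that replaces the ad-hoc flag dance (hypothetical worker, not
the rcuo kthread code):

    #include <linux/kthread.h>

    static int example_worker(void *arg)
    {
        while (!kthread_should_stop()) {
            if (kthread_should_park()) {
                /* Sleeps here until kthread_unpark() is called. */
                kthread_parkme();
                continue;
            }
            /* ... process offloaded callbacks ... */
        }
        return 0;
    }

    /* De-offload: kthread_park(t);   re-offload: kthread_unpark(t). */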



Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8
# ae2b217a 08-Mar-2024 Paul E. McKenney <[email protected]>

rcu: Make hotplug operations track GP state, not flags

Currently, there are rcu_data structure fields named ->rcu_onl_gp_seq
and ->rcu_ofl_gp_seq that track the rcu_state.gp_flags field at the
time of the corresponding CPU's last online or offline operation,
respectively. However, this information is not particularly useful.
It would be better to instead track the grace period state kept
in rcu_state.gp_state. This would also be consistent with the
initialization in rcu_boot_init_percpu_data(), which is to RCU_GP_CLEANED
(an rcu_state.gp_state value), and also with the diagnostics in
rcu_implicit_dynticks_qs(), whose format is consistent with an integer,
not a bitmask.

This commit therefore makes this change and changes the names to
->rcu_onl_gp_flags and ->rcu_ofl_gp_flags, respectively.

Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>



Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1
# b67cffcb 12-Jan-2024 Frederic Weisbecker <[email protected]>

rcu/exp: Handle parallel exp gp kworkers affinity

Affine the parallel expedited gp kworkers to their respective RCU node
in order to make them close to the cache they are playing with.

This reuses the boost kthreads machinery that probes into CPU hotplug
operations so that the kthreads become/stay affine to their respective
node as soon/long as it contains online CPUs. Otherwise, if the CPU
going down was the last one online on the leaf node, the related
kthread is affined to the housekeeping CPUs.

In the long run, this affinity VS CPU hotplug operation game should
probably be implemented at the generic kthread level.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
[boqun: s/* rcu_boost_task/*rcu_boost_task as reported by checkpatch]
Signed-off-by: Boqun Feng <[email protected]>



# 8e5e6215 12-Jan-2024 Frederic Weisbecker <[email protected]>

rcu/exp: Make parallel exp gp kworker per rcu node

When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period per node
initialization is performed in parallel via workqueues (one work per
node).

However in CONFIG_RCU_EXP_KTHREAD=y, this per node initialization is
performed by a single kworker serializing each node initialization (one
work for all nodes).

The second part is certainly less scalable and efficient beyond a single
leaf node.

To improve this, expand this single kworker into per-node kworkers. This
new layout is eventually intended to remove the workqueues based
implementation since it will essentially now become duplicate code.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>



# 7836b270 12-Jan-2024 Frederic Weisbecker <[email protected]>

rcu: s/boost_kthread_mutex/kthread_mutex

This mutex is currently protecting per node boost kthreads creation and
affinity setting across CPU hotplug operations.

Since the expedited kworkers will soon be split per node as well, they
will be subject to the same concurrency constraints against hotplug.

Therefore their creation and affinity tuning operations will be grouped
with those of boost kthreads and then rely on the same mutex.

To prepare for that, generalize its name.

Signed-off-by: Frederic Weisbecker <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Boqun Feng <[email protected]>



Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6
# 9146eb25 07-Apr-2023 Paul E. McKenney <[email protected]>

rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp

The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated
only on the instance corresponding to the current CPU, but can be read
more widely. Unmarked accesses are OK from the corresponding CPU, but
only if interrupts are disabled, given that interrupt handlers can and
do modify this field.

Unfortunately, although the load from rcu_preempt_deferred_qs() is always
carried out from the corresponding CPU, interrupts are not necessarily
disabled. This commit therefore upgrades this load to READ_ONCE.

Similarly, the diagnostic access from synchronize_rcu_expedited_wait()
might run with interrupts disabled and from some other CPU. This commit
therefore marks this load with data_race().

Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as
is because interrupts are disabled and this load is always from the
corresponding CPU. This commit adds a comment giving the rationale for
this access being safe.

This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <[email protected]>
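
A sketch of the two kinds of marking described above (fragments, not
the exact hunks):

    /*
     * Load on the owning CPU but possibly with interrupts enabled:
     * marked so the compiler cannot tear or refetch it.
     */
    bool exp = READ_ONCE(rdp->cpu_no_qs.b.exp);

    /*
     * Diagnostic-only load, possibly from another CPU: data_race()
     * tells KCSAN the race is intentional and harmless.
     */
    pr_info("cpu_no_qs.b.exp=%d\n", data_race(rdp->cpu_no_qs.b.exp));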



Revision tags: v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5
# 6343402a 06-Sep-2022 Pingfan Liu <[email protected]>

rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity()

Once either rcutree_online_cpu() or rcutree_dead_cpu() is invoked
concurrently, the following rcu_boost_kthread_setaffinity() race can
occur:

CPU 1                                      CPU 2
mask = rcu_rnp_online_cpus(rnp);
...
                                           mask = rcu_rnp_online_cpus(rnp);
                                           ...
                                           set_cpus_allowed_ptr(t, cm);
set_cpus_allowed_ptr(t, cm);

This results in CPU2's update being overwritten by that of CPU1, and
thus the possibility of ->boost_kthread_task continuing to run on a
to-be-offlined CPU.

This commit therefore eliminates this race by relying on the pre-existing
acquisition of ->boost_kthread_mutex to serialize the full process of
changing the affinity of ->boost_kthread_task.

Signed-off-by: Pingfan Liu <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Neeraj Upadhyay <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Lai Jiangshan <[email protected]>
Cc: Joel Fernandes <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
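
The shape of the fix, as a hedged fragment (mutex and helper names per
the message above; not the literal diff):

    /*
     * Snapshot the online mask and apply the resulting cpumask under
     * the same mutex, so a concurrent online/offline operation cannot
     * slip its own set_cpus_allowed_ptr() in between the two steps.
     */
    mutex_lock(&rnp->boost_kthread_mutex);
    mask = rcu_rnp_online_cpus(rnp);
    /* ... build cpumask cm from mask ... */
    set_cpus_allowed_ptr(t, cm);
    mutex_unlock(&rnp->boost_kthread_mutex);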



Revision tags: v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8
# 528262f5 19-Jul-2022 Zqiang <[email protected]>

rcu-tasks: Make RCU Tasks Trace check for userspace execution

Userspace execution is a valid quiescent state for RCU Tasks Trace,
but the scheduling-clock interrupt does not currently report such
quiescent states.

Of course, the scheduling-clock interrupt is not strictly speaking
userspace execution. However, the only way that this code is not
in a quiescent state is if something invoked rcu_read_lock_trace(),
and that would be reflected in the ->trc_reader_nesting field in
the task_struct structure. Furthermore, this field is checked by
rcu_tasks_trace_qs(), which is invoked by rcu_tasks_qs() which is in
turn invoked by rcu_note_voluntary_context_switch() in kernels building
at least one of the RCU Tasks flavors. It is therefore safe to invoke
rcu_tasks_trace_qs() from the rcu_sched_clock_irq().

But rcu_tasks_qs() also invokes rcu_tasks_classic_qs() for RCU
Tasks, which lacks the read-side markers provided by RCU Tasks Trace.
This raises the possibility that an RCU Tasks grace period could start
after the interrupt from userspace execution, but before the call to
rcu_sched_clock_irq(). However, it turns out that this is safe because
the RCU Tasks grace period waits for an RCU grace period, which will
wait for the entire scheduling-clock interrupt handler, including any
RCU Tasks read-side critical section that this handler might contain.

This commit therefore updates the rcu_sched_clock_irq() function's
check for usermode execution and its call to rcu_tasks_classic_qs()
to instead check for both usermode execution and interrupt from idle,
and to instead call rcu_note_voluntary_context_switch(). This
consolidates code and provides faster RCU Tasks Trace
reporting of quiescent states in kernels that do scheduling-clock
interrupts for userspace execution.

[ paulmck: Consolidate checks into rcu_sched_clock_irq(). ]

Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
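
Roughly, the consolidated check has this shape (a sketch, not the
exact hunk; 'user' is the tick-from-userspace flag passed to
rcu_sched_clock_irq()):

    if (user || rcu_is_cpu_rrupt_from_idle()) {
        /* Reports both RCU Tasks and RCU Tasks Trace quiescent states. */
        rcu_note_voluntary_context_switch(current);
    }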



# 7634b1ea 24-Aug-2022 Paul E. McKenney <[email protected]>

rcu: Exclude outgoing CPU when it is the last to leave

The rcu_boost_kthread_setaffinity() function removes the outgoing CPU
from the set_cpus_allowed() mask for the corresponding leaf rcu_node
structure's rcub priority-boosting kthread. Except that if the outgoing
CPU will leave that structure without any online CPUs, the mask is set
to the housekeeping CPU mask from housekeeping_cpumask(). Which is fine
unless the outgoing CPU happens to be a housekeeping CPU.

This commit therefore removes the outgoing CPU from the housekeeping mask.
This would of course be problematic if the outgoing CPU was the last
online housekeeping CPU, but in that case you are in a world of hurt
anyway. If someone comes up with a valid use case for a system needing
all the housekeeping CPUs to be offline, further adjustments can be made.

Signed-off-by: Paul E. McKenney <[email protected]>
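
A hedged sketch of the resulting fallback (the housekeeping type and
the outgoing-CPU parameter name are assumptions):

    if (cpumask_empty(cm)) {
        /*
         * Leaf node has no online CPUs left: fall back to the
         * housekeeping CPUs, but never to the CPU on its way out.
         */
        cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
        if (outgoingcpu >= 0)
            cpumask_clear_cpu(outgoingcpu, cm);
    }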



# 621189a1 08-Aug-2022 Zqiang <[email protected]>

rcu: Avoid triggering strict-GP irq-work when RCU is idle

Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger
irq-work from rcu_read_unlock(), and the resulting irq-work handler
invokes rcu_preempt_deferred_qs_handle(). The point of this triggering
is to force grace periods to end quickly in order to give tools like KASAN
a better chance of detecting RCU usage bugs such as leaking RCU-protected
pointers out of an RCU read-side critical section.

However, this irq-work triggering is unconditional. This works, but
there is no point in doing this irq-work unless the current grace period
is waiting on the running CPU or task, which is not the common case.
After all, in the common case there are many rcu_read_unlock() calls
per CPU per grace period.

This commit therefore triggers the irq-work only when the current grace
period is waiting on the running CPU or task.

This change was tested as follows on a four-CPU system:

echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
insmod rcutorture.ko
sleep 20
rmmod rcutorture.ko
echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
echo > /sys/kernel/debug/tracing/set_ftrace_filter

This procedure produces results in this per-CPU set of files:

/sys/kernel/debug/tracing/trace_stat/function*

Sample output from one of these files is as follows:

  Function                          Hit       Time          Avg        s^2
  --------                          ---       ----          ---        ---
  rcu_preempt_deferred_qs_handle    838746    182650.3 us   0.217 us   0.004 us

The baseline sum of the "Hit" values (the number of calls to this
function) was 3,319,015. With this commit, that sum was 1,140,359,
for a 2.9x reduction. The worst-case variance across the CPUs was less
than 25%, so this large effect size is statistically significant.

The raw data is available in the Link: URL.

Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
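
The gist, as a hedged fragment (the waiting-on-us test is spelled
informally; field names are assumptions based on the text above):

    /*
     * Queue the strict-GP irq_work only if the current grace period
     * is actually waiting on this CPU or task.
     */
    if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
        gp_waiting_on_this_cpu_or_task) /* e.g. rdp->grpmask & READ_ONCE(rnp->qsmask) */
        irq_work_queue(&rdp->defer_qs_iw);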



# 089254fd 03-Aug-2022 Paul E. McKenney <[email protected]>

rcu: Document reason for rcu_all_qs() call to preempt_disable()

Given that rcu_all_qs() is in non-preemptible kernels, why on earth should
it invoke preempt_disable()? This commit adds the reason, which is to
work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels.

Reported-by: Neeraj Upadhyay <[email protected]>
Reported-by: Boqun Feng <[email protected]>
Reported-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>



Revision tags: v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4
# bca4fa8c 20-Jun-2022 Zqiang <[email protected]>

rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels

In non-preemptible kernels, tasks never do context switches within
RCU read-side critical sections. Therefore, in such kernels, each
leaf rcu_node structure's ->blkd_tasks list will always be empty.
The comment on the non-preemptible version of rcu_preempt_deferred_qs()
confuses this point, so this commit therefore fixes it.

Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>



Revision tags: v5.19-rc3
# 6d60ea03 16-Jun-2022 Zqiang <[email protected]>

rcu: Fix rcu_read_unlock_strict() strict QS reporting

Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y
report the quiescent state directly from the outermost rcu_read_unlock().
However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm
might still be set, in which case rcu_report_qs_rdp() will exit early,
thus failing to report the quiescent state.

This commit therefore causes rcu_read_unlock_strict() to clear the
CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking
rcu_report_qs_rdp().

Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>



Revision tags: v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5
# 51038506 29-Apr-2022 Zqiang <[email protected]>

rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()

Callbacks are invoked in RCU kthreads when callbacks are offloaded
(rcu_nocbs boot parameter) or when RCU's softirq handler has been
offloaded to rcuc kthreads (use_softirq==0). The current code allows
for the rcu_nocbs case but not the use_softirq case. This commit adds
support for the use_softirq case.

Reported-by: kernel test robot <[email protected]>
Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>



# 70a82c3c 13-May-2022 Zqiang <[email protected]>

rcu: Immediately boost preempted readers for strict grace periods

The intent of the CONFIG_RCU_STRICT_GRACE_PERIOD Kconfig option is to
cause normal grace periods to complete quickly in order to better catch
errors resulting from improperly leaking pointers from RCU read-side
critical sections. However, kernels built with this option enabled still
wait for some hundreds of milliseconds before boosting RCU readers that
have been preempted within their current critical section. The value
of this delay is set by the CONFIG_RCU_BOOST_DELAY Kconfig option,
which defaults to 500 milliseconds.

This commit therefore causes kernels built with strict grace periods
to ignore CONFIG_RCU_BOOST_DELAY. This causes rcu_initiate_boost()
to start boosting immediately after all CPUs on a given leaf rcu_node
structure have passed through their quiescent states.

Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>



# 48f8070f 26-Apr-2022 Patrick Wang <[email protected]>

rcu: Avoid tracing a few functions executed in stop machine

Stop-machine recently started calling additional functions while waiting:

----------------------------------------------------------------
Former stop machine wait loop:
do {
cpu_relax(); => macro
...
} while (curstate != STOPMACHINE_EXIT);
-----------------------------------------------------------------
Current stop machine wait loop:
do {
stop_machine_yield(cpumask); => function (notraced)
...
touch_nmi_watchdog(); => function (notraced, inside calls also notraced)
...
rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced)
} while (curstate != MULTI_STOP_EXIT);
------------------------------------------------------------------

These functions (and the functions that they call) must be marked
notrace to prevent them from being updated while they are executing.
The consequences of failing to mark these functions can be severe:

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: 1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0
rcu: 3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0
(detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4)
Task dump for CPU 1:
task:migration/1 state:R running task stack: 0 pid: 19 ppid: 2 flags:0x00000000
Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
Call Trace:
Task dump for CPU 3:
task:migration/3 state:R running task stack: 0 pid: 29 ppid: 2 flags:0x00000000
Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
Call Trace:
rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu: Possible timer handling issue on cpu=2 timer-softirq=594
rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:I stack: 0 pid: 14 ppid: 2 flags:0x00000000
Call Trace:
schedule+0x56/0xc2
schedule_timeout+0x82/0x184
rcu_gp_fqs_loop+0x19a/0x318
rcu_gp_kthread+0x11a/0x140
kthread+0xee/0x118
ret_from_exception+0x0/0x14
rcu: Stack dump where RCU GP kthread last ran:
Task dump for CPU 2:
task:migration/2 state:R running task stack: 0 pid: 24 ppid: 2 flags:0x00000000
Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174
Call Trace:

This commit therefore marks these functions notrace:
rcu_preempt_deferred_qs()
rcu_preempt_need_deferred_qs()
rcu_preempt_deferred_qs_irqrestore()

[ paulmck: Apply feedback from Neeraj Upadhyay. ]

Signed-off-by: Patrick Wang <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Reviewed-by: Neeraj Upadhyay <[email protected]>
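
For reference, the marking itself is just the kernel's notrace function
attribute, e.g. (illustrative prototype):

    /*
     * notrace keeps the function out of ftrace's mcount/fentry
     * patching, so its entry code is never being rewritten while
     * stop_machine is executing it.
     */
    static notrace void rcu_preempt_deferred_qs_irqrestore(struct task_struct *t,
                                                           unsigned long flags);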
