|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
83b28cfe |
| 13-Dec-2024 |
Ankur Arora <[email protected]> |
rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent states for read-side critical sections via rcu_all_qs(). One rea
rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent states for read-side critical sections via rcu_all_qs(). One reason why this was needed: lacking preempt-count, the tick handler has no way of knowing whether it is executing in a read-side critical section or not.
With (PREEMPT_LAZY=y, PREEMPT_DYNAMIC=n), we get (PREEMPT_COUNT=y, PREEMPT_RCU=n). In this configuration cond_resched() is a stub and does not provide quiescent states via rcu_all_qs(). (PREEMPT_RCU=y provides this information via rcu_read_unlock() and its nesting counter.)
So, use the availability of preempt_count() to report quiescent states in rcu_flavor_sched_clock_irq().
Suggested-by: Paul E. McKenney <[email protected]> Reviewed-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Ankur Arora <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
show more ...
|
| #
fcf0e25a |
| 13-Dec-2024 |
Ankur Arora <[email protected]> |
rcu: handle unstable rdp in rcu_read_unlock_strict()
rcu_read_unlock_strict() can be called with preemption enabled which can make for an unstable rdp and a racy norm value.
Fix this by dropping th
rcu: handle unstable rdp in rcu_read_unlock_strict()
rcu_read_unlock_strict() can be called with preemption enabled which can make for an unstable rdp and a racy norm value.
Fix this by dropping the preempt-count in __rcu_read_unlock() after the call to rcu_read_unlock_strict(), adjusting the preempt-count check appropriately.
Suggested-by: Frederic Weisbecker <[email protected]> Signed-off-by: Ankur Arora <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
show more ...
|
|
Revision tags: v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
db7ee3cb |
| 26-Sep-2024 |
Frederic Weisbecker <[email protected]> |
rcu: Use kthread preferred affinity for RCU boost
Now that kthreads have an infrastructure to handle preferred affinity against CPU hotplug and housekeeping cpumask, convert RCU boost to use it inst
rcu: Use kthread preferred affinity for RCU boost
Now that kthreads have an infrastructure to handle preferred affinity against CPU hotplug and housekeeping cpumask, convert RCU boost to use it instead of handling all the constraints by itself.
Acked-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
show more ...
|
| #
ecc5e6b0 |
| 26-Oct-2024 |
Paul E. McKenney <[email protected]> |
rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp
The value of rdp->cpu_no_qs.b.exp may be changed only by the corresponding CPU, and that CPU is not even allowed to race with itse
rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp
The value of rdp->cpu_no_qs.b.exp may be changed only by the corresponding CPU, and that CPU is not even allowed to race with itself, for example, via interrupt handlers. This commit therefore adds KCSAN exclusive-writer assertions to check this constraint.
Signed-off-by: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
show more ...
|
|
Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5 |
|
| #
c3291206 |
| 23-Aug-2024 |
Hongbo Li <[email protected]> |
rcu: Use bitwise instead of arithmetic operator for flags
This silences the following coccinelle warning: WARNING: sum of probable bitmasks, consider |
Signed-off-by: Hongbo Li <lihongbo22@huawei
rcu: Use bitwise instead of arithmetic operator for flags
This silences the following coccinelle warning: WARNING: sum of probable bitmasks, consider |
Signed-off-by: Hongbo Li <[email protected]> Reviewed-by: "Paul E. McKenney" <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
show more ...
|
|
Revision tags: v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7 |
|
| #
32a9f26e |
| 29-Apr-2024 |
Valentin Schneider <[email protected]> |
rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, replace "dyntick_idle" into "eqs" to drop the
rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
The context_tracking.state RCU_DYNTICKS subvariable has been renamed to RCU_WATCHING, replace "dyntick_idle" into "eqs" to drop the dyntick reference.
Signed-off-by: Valentin Schneider <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
show more ...
|
| #
7121dd91 |
| 30-May-2024 |
Frederic Weisbecker <[email protected]> |
rcu/nocb: Introduce nocb mutex
The barrier_mutex is used currently to protect (de-)offloading operations and prevent from nocb_lock locking imbalance in rcu_barrier() and shrinker, and also from mis
rcu/nocb: Introduce nocb mutex
The barrier_mutex is used currently to protect (de-)offloading operations and prevent from nocb_lock locking imbalance in rcu_barrier() and shrinker, and also from misordered RCU barrier invocation.
Now since RCU (de-)offloading is going to happen on offline CPUs, an RCU barrier will have to be executed while transitionning from offloaded to de-offloaded state. And this can't happen while holding the barrier_mutex.
Introduce a NOCB mutex to protect (de-)offloading transitions. The barrier_mutex is still held for now when necessary to avoid barrier callbacks reordering and nocb_lock imbalance.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
show more ...
|
| #
7aeba709 |
| 30-May-2024 |
Frederic Weisbecker <[email protected]> |
rcu/nocb: Introduce RCU_NOCB_LOCKDEP_WARN()
Checking for races against concurrent (de-)offloading implies the creation of !CONFIG_RCU_NOCB_CPU stubs to check if each relevant lock is held. For now t
rcu/nocb: Introduce RCU_NOCB_LOCKDEP_WARN()
Checking for races against concurrent (de-)offloading implies the creation of !CONFIG_RCU_NOCB_CPU stubs to check if each relevant lock is held. For now this only implies the nocb_lock but more are to be expected.
Create instead a NOCB specific version of RCU_LOCKDEP_WARN() to avoid the proliferation of stubs.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]>
show more ...
|
| #
68d124b0 |
| 09-May-2024 |
Paul E. McKenney <[email protected]> |
rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
If a CPU is running either a userspace application or a guest OS in nohz_full mode, it is possible for a system call to occur
rcu: Add rcutree.nohz_full_patience_delay to reduce nohz_full OS jitter
If a CPU is running either a userspace application or a guest OS in nohz_full mode, it is possible for a system call to occur just as an RCU grace period is starting. If that CPU also has the scheduling-clock tick enabled for any reason (such as a second runnable task), and if the system was booted with rcutree.use_softirq=0, then RCU can add insult to injury by awakening that CPU's rcuc kthread, resulting in yet another task and yet more OS jitter due to switching to that task, running it, and switching back.
In addition, in the common case where that system call is not of excessively long duration, awakening the rcuc task is pointless. This pointlessness is due to the fact that the CPU will enter an extended quiescent state upon returning to the userspace application or guest OS. In this case, the rcuc kthread cannot do anything that the main RCU grace-period kthread cannot do on its behalf, at least if it is given a few additional milliseconds (for example, given the time duration specified by rcutree.jiffies_till_first_fqs, give or take scheduling delays).
This commit therefore adds a rcutree.nohz_full_patience_delay kernel boot parameter that specifies the grace period age (in milliseconds, rounded to jiffies) before which RCU will refrain from awakening the rcuc kthread. Preliminary experimentation suggests a value of 1000, that is, one second. Increasing rcutree.nohz_full_patience_delay will increase grace-period latency and in turn increase memory footprint, so systems with constrained memory might choose a smaller value. Systems with less-aggressive OS-jitter requirements might choose the default value of zero, which keeps the traditional immediate-wakeup behavior, thus avoiding increases in grace-period latency.
[ paulmck: Apply Leonardo Bras feedback. ]
Link: https://lore.kernel.org/all/[email protected]/
Reported-by: Leonardo Bras <[email protected]> Suggested-by: Leonardo Bras <[email protected]> Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Leonardo Bras <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc6 |
|
| #
483d5bf2 |
| 25-Apr-2024 |
Frederic Weisbecker <[email protected]> |
rcu/nocb: Use kthread parking instead of ad-hoc implementation
Upon NOCB deoffloading, the rcuo kthread must be forced to sleep until the corresponding rdp is ever offloaded again. The deoffloader c
rcu/nocb: Use kthread parking instead of ad-hoc implementation
Upon NOCB deoffloading, the rcuo kthread must be forced to sleep until the corresponding rdp is ever offloaded again. The deoffloader clears the SEGCBLIST_OFFLOADED flag, wakes up the rcuo kthread which then notices that change and clears in turn its SEGCBLIST_KTHREAD_CB flag before going to sleep, until it ever sees the SEGCBLIST_OFFLOADED flag again, should a re-offloading happen.
Upon NOCB offloading, the rcuo kthread must be forced to wake up and handle callbacks until the corresponding rdp is ever deoffloaded again. The offloader sets the SEGCBLIST_OFFLOADED flag, wakes up the rcuo kthread which then notices that change and sets in turn its SEGCBLIST_KTHREAD_CB flag before going to check callbacks, until it ever sees the SEGCBLIST_OFFLOADED flag cleared again, should a de-offloading happen again.
This is all a crude ad-hoc and error-prone kthread (un-)parking re-implementation.
Consolidate the behaviour with the appropriate API instead.
[ paulmck: Apply Qiang Zhang feedback provided in Link: below. ] Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
|
Revision tags: v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8 |
|
| #
ae2b217a |
| 08-Mar-2024 |
Paul E. McKenney <[email protected]> |
rcu: Make hotplug operations track GP state, not flags
Currently, there are rcu_data structure fields named ->rcu_onl_gp_seq and ->rcu_ofl_gp_seq that track the rcu_state.gp_flags field at the time
rcu: Make hotplug operations track GP state, not flags
Currently, there are rcu_data structure fields named ->rcu_onl_gp_seq and ->rcu_ofl_gp_seq that track the rcu_state.gp_flags field at the time of the corresponding CPU's last online or offline operation, respectively. However, this information is not particularly useful. It would be better to instead track the grace period state kept in rcu_state.gp_state. This would also be consistent with the initialization in rcu_boot_init_percpu_data(), which is to RCU_GP_CLEANED (an rcu_state.gp_state value), and also with the diagnostics in rcu_implicit_dynticks_qs(), whose format is consistent with an integer, not a bitmask.
This commit therefore makes this change and changes the names to ->rcu_onl_gp_flags and ->rcu_ofl_gp_flags, respectively.
Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
show more ...
|
|
Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1 |
|
| #
b67cffcb |
| 12-Jan-2024 |
Frederic Weisbecker <[email protected]> |
rcu/exp: Handle parallel exp gp kworkers affinity
Affine the parallel expedited gp kworkers to their respective RCU node in order to make them close to the cache their are playing with.
This reuses
rcu/exp: Handle parallel exp gp kworkers affinity
Affine the parallel expedited gp kworkers to their respective RCU node in order to make them close to the cache their are playing with.
This reuses the boost kthreads machinery that probe into CPU hotplug operations such that the kthreads become/stay affine to their respective node as soon/long as they contain online CPUs. Otherwise and if the current CPU going down was the last online on the leaf node, the related kthread is affine to the housekeeping CPUs.
In the long run, this affinity VS CPU hotplug operation game should probably be implemented at the generic kthread level.
Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> [boqun: s/* rcu_boost_task/*rcu_boost_task as reported by checkpatch] Signed-off-by: Boqun Feng <[email protected]>
show more ...
|
| #
8e5e6215 |
| 12-Jan-2024 |
Frederic Weisbecker <[email protected]> |
rcu/exp: Make parallel exp gp kworker per rcu node
When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period per node initialization is performed in parallel via workqueues (one work per node).
How
rcu/exp: Make parallel exp gp kworker per rcu node
When CONFIG_RCU_EXP_KTHREAD=n, the expedited grace period per node initialization is performed in parallel via workqueues (one work per node).
However in CONFIG_RCU_EXP_KTHREAD=y, this per node initialization is performed by a single kworker serializing each node initialization (one work for all nodes).
The second part is certainly less scalable and efficient beyond a single leaf node.
To improve this, expand this single kworker into per-node kworkers. This new layout is eventually intended to remove the workqueues based implementation since it will essentially now become duplicate code.
Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
show more ...
|
| #
7836b270 |
| 12-Jan-2024 |
Frederic Weisbecker <[email protected]> |
rcu: s/boost_kthread_mutex/kthread_mutex
This mutex is currently protecting per node boost kthreads creation and affinity setting across CPU hotplug operations.
Since the expedited kworkers will so
rcu: s/boost_kthread_mutex/kthread_mutex
This mutex is currently protecting per node boost kthreads creation and affinity setting across CPU hotplug operations.
Since the expedited kworkers will soon be split per node as well, they will be subject to the same concurrency constraints against hotplug.
Therefore their creation and affinity tuning operations will be grouped with those of boost kthreads and then rely on the same mutex.
To prepare for that, generalize its name.
Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
show more ...
|
|
Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6 |
|
| #
9146eb25 |
| 07-Apr-2023 |
Paul E. McKenney <[email protected]> |
rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated only on the instance corresponding to the current CPU, but can be read
rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated only on the instance corresponding to the current CPU, but can be read more widely. Unmarked accesses are OK from the corresponding CPU, but only if interrupts are disabled, given that interrupt handlers can and do modify this field.
Unfortunately, although the load from rcu_preempt_deferred_qs() is always carried out from the corresponding CPU, interrupts are not necessarily disabled. This commit therefore upgrades this load to READ_ONCE.
Similarly, the diagnostic access from synchronize_rcu_expedited_wait() might run with interrupts disabled and from some other CPU. This commit therefore marks this load with data_race().
Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as is because interrupts are disabled and this load is always from the corresponding CPU. This commit adds a comment giving the rationale for this access being safe.
This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely.
Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
|
Revision tags: v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5 |
|
| #
6343402a |
| 06-Sep-2022 |
Pingfan Liu <[email protected]> |
rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity()
Once either rcutree_online_cpu() or rcutree_dead_cpu() is invoked concurrently, the following rcu_boost_kthread_setaffinity() rac
rcu: Synchronize ->qsmaskinitnext in rcu_boost_kthread_setaffinity()
Once either rcutree_online_cpu() or rcutree_dead_cpu() is invoked concurrently, the following rcu_boost_kthread_setaffinity() race can occur:
CPU 1 CPU2 mask = rcu_rnp_online_cpus(rnp); ...
mask = rcu_rnp_online_cpus(rnp); ... set_cpus_allowed_ptr(t, cm);
set_cpus_allowed_ptr(t, cm);
This results in CPU2's update being overwritten by that of CPU1, and thus the possibility of ->boost_kthread_task continuing to run on a to-be-offlined CPU.
This commit therefore eliminates this race by relying on the pre-existing acquisition of ->boost_kthread_mutex to serialize the full process of changing the affinity of ->boost_kthread_task.
Signed-off-by: Pingfan Liu <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Joel Fernandes <[email protected]> Cc: "Jason A. Donenfeld" <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
|
Revision tags: v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8 |
|
| #
528262f5 |
| 19-Jul-2022 |
Zqiang <[email protected]> |
rcu-tasks: Make RCU Tasks Trace check for userspace execution
Userspace execution is a valid quiescent state for RCU Tasks Trace, but the scheduling-clock interrupt does not currently report such qu
rcu-tasks: Make RCU Tasks Trace check for userspace execution
Userspace execution is a valid quiescent state for RCU Tasks Trace, but the scheduling-clock interrupt does not currently report such quiescent states.
Of course, the scheduling-clock interrupt is not strictly speaking userspace execution. However, the only way that this code is not in a quiescent state is if something invoked rcu_read_lock_trace(), and that would be reflected in the ->trc_reader_nesting field in the task_struct structure. Furthermore, this field is checked by rcu_tasks_trace_qs(), which is invoked by rcu_tasks_qs() which is in turn invoked by rcu_note_voluntary_context_switch() in kernels building at least one of the RCU Tasks flavors. It is therefore safe to invoke rcu_tasks_trace_qs() from the rcu_sched_clock_irq().
But rcu_tasks_qs() also invokes rcu_tasks_classic_qs() for RCU Tasks, which lacks the read-side markers provided by RCU Tasks Trace. This raises the possibility that an RCU Tasks grace period could start after the interrupt from userspace execution, but before the call to rcu_sched_clock_irq(). However, it turns out that this is safe because the RCU Tasks grace period waits for an RCU grace period, which will wait for the entire scheduling-clock interrupt handler, including any RCU Tasks read-side critical section that this handler might contain.
This commit therefore updates the rcu_sched_clock_irq() function's check for usermode execution and its call to rcu_tasks_classic_qs() to instead check for both usermode execution and interrupt from idle, and to instead call rcu_note_voluntary_context_switch(). This consolidates code and provides more faster RCU Tasks Trace reporting of quiescent states in kernels that do scheduling-clock interrupts for userspace execution.
[ paulmck: Consolidate checks into rcu_sched_clock_irq(). ]
Signed-off-by: Zqiang <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
| #
7634b1ea |
| 24-Aug-2022 |
Paul E. McKenney <[email protected]> |
rcu: Exclude outgoing CPU when it is the last to leave
The rcu_boost_kthread_setaffinity() function removes the outgoing CPU from the set_cpus_allowed() mask for the corresponding leaf rcu_node stru
rcu: Exclude outgoing CPU when it is the last to leave
The rcu_boost_kthread_setaffinity() function removes the outgoing CPU from the set_cpus_allowed() mask for the corresponding leaf rcu_node structure's rcub priority-boosting kthread. Except that if the outgoing CPU will leave that structure without any online CPUs, the mask is set to the housekeeping CPU mask from housekeeping_cpumask(). Which is fine unless the outgoing CPU happens to be a housekeeping CPU.
This commit therefore removes the outgoing CPU from the housekeeping mask. This would of course be problematic if the outgoing CPU was the last online housekeeping CPU, but in that case you are in a world of hurt anyway. If someone comes up with a valid use case for a system needing all the housekeeping CPUs to be offline, further adjustments can be made.
Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
| #
621189a1 |
| 08-Aug-2022 |
Zqiang <[email protected]> |
rcu: Avoid triggering strict-GP irq-work when RCU is idle
Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger irq-work from rcu_read_unlock(), and the resulting irq-work handler i
rcu: Avoid triggering strict-GP irq-work when RCU is idle
Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger irq-work from rcu_read_unlock(), and the resulting irq-work handler invokes rcu_preempt_deferred_qs_handle(). The point of this triggering is to force grace periods to end quickly in order to give tools like KASAN a better chance of detecting RCU usage bugs such as leaking RCU-protected pointers out of an RCU read-side critical section.
However, this irq-work triggering is unconditional. This works, but there is no point in doing this irq-work unless the current grace period is waiting on the running CPU or task, which is not the common case. After all, in the common case there are many rcu_read_unlock() calls per CPU per grace period.
This commit therefore triggers the irq-work only when the current grace period is waiting on the running CPU or task.
This change was tested as follows on a four-CPU system:
echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter echo 1 > /sys/kernel/debug/tracing/function_profile_enabled insmod rcutorture.ko sleep 20 rmmod rcutorture.ko echo 0 > /sys/kernel/debug/tracing/function_profile_enabled echo > /sys/kernel/debug/tracing/set_ftrace_filter
This procedure produces results in this per-CPU set of files:
/sys/kernel/debug/tracing/trace_stat/function*
Sample output from one of these files is as follows:
Function Hit Time Avg s^2 -------- --- ---- --- --- rcu_preempt_deferred_qs_handle 838746 182650.3 us 0.217 us 0.004 us
The baseline sum of the "Hit" values (the number of calls to this function) was 3,319,015. With this commit, that sum was 1,140,359, for a 2.9x reduction. The worst-case variance across the CPUs was less than 25%, so this large effect size is statistically significant.
The raw data is available in the Link: URL.
Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Zqiang <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
| #
089254fd |
| 03-Aug-2022 |
Paul E. McKenney <[email protected]> |
rcu: Document reason for rcu_all_qs() call to preempt_disable()
Given that rcu_all_qs() is in non-preemptible kernels, why on earth should it invoke preempt_disable()? This commit adds the reason,
rcu: Document reason for rcu_all_qs() call to preempt_disable()
Given that rcu_all_qs() is in non-preemptible kernels, why on earth should it invoke preempt_disable()? This commit adds the reason, which is to work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels.
Reported-by: Neeraj Upadhyay <[email protected]> Reported-by: Boqun Feng <[email protected]> Reported-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4 |
|
| #
bca4fa8c |
| 20-Jun-2022 |
Zqiang <[email protected]> |
rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels
In non-premptible kernels, tasks never do context switches within RCU read-side critical sections. Therefore, in such kernels, ea
rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels
In non-premptible kernels, tasks never do context switches within RCU read-side critical sections. Therefore, in such kernels, each leaf rcu_node structure's ->blkd_tasks list will always be empty. The comment on the non-preemptible version of rcu_preempt_deferred_qs() confuses this point, so this commit therefore fixes it.
Signed-off-by: Zqiang <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc3 |
|
| #
6d60ea03 |
| 16-Jun-2022 |
Zqiang <[email protected]> |
rcu: Fix rcu_read_unlock_strict() strict QS reporting
Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y report the quiescent state directly from the outermost rcu_read_unlock(
rcu: Fix rcu_read_unlock_strict() strict QS reporting
Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y report the quiescent state directly from the outermost rcu_read_unlock(). However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm might still be set, in which case rcu_report_qs_rdp() will exit early, thus failing to report quiescent state.
This commit therefore causes rcu_read_unlock_strict() to clear CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking rcu_report_qs_rdp().
Signed-off-by: Zqiang <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
show more ...
|
|
Revision tags: v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5 |
|
| #
51038506 |
| 29-Apr-2022 |
Zqiang <[email protected]> |
rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
Callbacks are invoked in RCU kthreads when calbacks are offloaded (rcu_nocbs boot parameter) or when RCU's softirq handler has been offlo
rcu: Add nocb_cb_kthread check to rcu_is_callbacks_kthread()
Callbacks are invoked in RCU kthreads when calbacks are offloaded (rcu_nocbs boot parameter) or when RCU's softirq handler has been offloaded to rcuc kthreads (use_softirq==0). The current code allows for the rcu_nocbs case but not the use_softirq case. This commit adds support for the use_softirq case.
Reported-by: kernel test robot <[email protected]> Signed-off-by: Zqiang <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Neeraj Upadhyay <[email protected]>
show more ...
|
| #
70a82c3c |
| 13-May-2022 |
Zqiang <[email protected]> |
rcu: Immediately boost preempted readers for strict grace periods
The intent of the CONFIG_RCU_STRICT_GRACE_PERIOD Konfig option is to cause normal grace periods to complete quickly in order to bett
rcu: Immediately boost preempted readers for strict grace periods
The intent of the CONFIG_RCU_STRICT_GRACE_PERIOD Konfig option is to cause normal grace periods to complete quickly in order to better catch errors resulting from improperly leaking pointers from RCU read-side critical sections. However, kernels built with this option enabled still wait for some hundreds of milliseconds before boosting RCU readers that have been preempted within their current critical section. The value of this delay is set by the CONFIG_RCU_BOOST_DELAY Kconfig option, which defaults to 500 milliseconds.
This commit therefore causes kernels build with strict grace periods to ignore CONFIG_RCU_BOOST_DELAY. This causes rcu_initiate_boost() to start boosting immediately after all CPUs on a given leaf rcu_node structure have passed through their quiescent states.
Signed-off-by: Zqiang <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Neeraj Upadhyay <[email protected]>
show more ...
|
| #
48f8070f |
| 26-Apr-2022 |
Patrick Wang <[email protected]> |
rcu: Avoid tracing a few functions executed in stop machine
Stop-machine recently started calling additional functions while waiting:
---------------------------------------------------------------
rcu: Avoid tracing a few functions executed in stop machine
Stop-machine recently started calling additional functions while waiting:
---------------------------------------------------------------- Former stop machine wait loop: do { cpu_relax(); => macro ... } while (curstate != STOPMACHINE_EXIT); ----------------------------------------------------------------- Current stop machine wait loop: do { stop_machine_yield(cpumask); => function (notraced) ... touch_nmi_watchdog(); => function (notraced, inside calls also notraced) ... rcu_momentary_dyntick_idle(); => function (notraced, inside calls traced) } while (curstate != MULTI_STOP_EXIT); ------------------------------------------------------------------
These functions (and the functions that they call) must be marked notrace to prevent them from being updated while they are executing. The consequences of failing to mark these functions can be severe:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: rcu: 1-...!: (0 ticks this GP) idle=14f/1/0x4000000000000000 softirq=3397/3397 fqs=0 rcu: 3-...!: (0 ticks this GP) idle=ee9/1/0x4000000000000000 softirq=5168/5168 fqs=0 (detected by 0, t=8137 jiffies, g=5889, q=2 ncpus=4) Task dump for CPU 1: task:migration/1 state:R running task stack: 0 pid: 19 ppid: 2 flags:0x00000000 Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174 Call Trace: Task dump for CPU 3: task:migration/3 state:R running task stack: 0 pid: 29 ppid: 2 flags:0x00000000 Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174 Call Trace: rcu: rcu_preempt kthread timer wakeup didn't happen for 8136 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 rcu: Possible timer handling issue on cpu=2 timer-softirq=594 rcu: rcu_preempt kthread starved for 8137 jiffies! g5889 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:I stack: 0 pid: 14 ppid: 2 flags:0x00000000 Call Trace: schedule+0x56/0xc2 schedule_timeout+0x82/0x184 rcu_gp_fqs_loop+0x19a/0x318 rcu_gp_kthread+0x11a/0x140 kthread+0xee/0x118 ret_from_exception+0x0/0x14 rcu: Stack dump where RCU GP kthread last ran: Task dump for CPU 2: task:migration/2 state:R running task stack: 0 pid: 24 ppid: 2 flags:0x00000000 Stopper: multi_cpu_stop+0x0/0x18c <- stop_machine_cpuslocked+0x128/0x174 Call Trace:
This commit therefore marks these functions notrace: rcu_preempt_deferred_qs() rcu_preempt_need_deferred_qs() rcu_preempt_deferred_qs_irqrestore()
[ paulmck: Apply feedback from Neeraj Upadhyay. ]
Signed-off-by: Patrick Wang <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Neeraj Upadhyay <[email protected]>
show more ...
|