|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5 |
|
| #
5a562b8b |
| 27-Feb-2025 |
Uladzislau Rezki (Sony) <[email protected]> |
rcu: Use _full() API to debug synchronize_rcu()
Switch to the get_state_synchronize_rcu_full() and poll_state_synchronize_rcu_full() pair to debug a normal synchronize_rcu() call.
Using the non-full APIs to determine whether a grace period has passed can lead to a false-positive kernel splat.
This can happen because get_state_synchronize_rcu() compresses both the normal and the expedited state into one single unsigned long value, so poll_state_synchronize_rcu() can miss grace-period completion when synchronize_rcu() and synchronize_rcu_expedited() run concurrently.
To address this, switch to the poll_state_synchronize_rcu_full() and get_state_synchronize_rcu_full() APIs, which use separate variables for the expedited and normal states.
Reported-by: cheung wall <[email protected]> Closes: https://lore.kernel.org/lkml/Z5ikQeVmVdsWQrdD@pc636/T/ Fixes: 988f569ae041 ("rcu: Reduce synchronize_rcu() latency") Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Boqun Feng <[email protected]>
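A minimal sketch of the full-state debug pattern described above, using the _full() API pair named in the commit (the local variable name is illustrative):

    struct rcu_gp_oldstate rgos;

    get_state_synchronize_rcu_full(&rgos);  /* snapshot normal and expedited state separately */
    synchronize_rcu();                      /* a full grace period elapses */
    WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&rgos));  /* must now report completion */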
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3 |
|
| #
85aad7cc |
| 12-Dec-2024 |
Paul E. McKenney <[email protected]> |
rcu: Fix get_state_synchronize_rcu_full() GP-start detection
The get_state_synchronize_rcu_full() and poll_state_synchronize_rcu_full() functions use the root rcu_node structure's ->gp_seq field to detect the beginnings and ends of grace periods, respectively. This choice is necessary for the poll_state_synchronize_rcu_full() function because (give or take counter wrap), the following sequence is guaranteed not to trigger:
    get_state_synchronize_rcu_full(&rgos);
    synchronize_rcu();
    WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&rgos));
The RCU callbacks that awaken synchronize_rcu() instances are guaranteed not to be invoked before the root rcu_node structure's ->gp_seq field is updated to indicate the end of the grace period. However, these callbacks might start being invoked immediately thereafter, in particular, before rcu_state.gp_seq has been updated. Therefore, poll_state_synchronize_rcu_full() must refer to the root rcu_node structure's ->gp_seq field. Because this field is updated under this structure's ->lock, any code following a call to poll_state_synchronize_rcu_full() will be fully ordered after the full grace-period computation, as is required by RCU's memory-ordering semantics.
By symmetry, the get_state_synchronize_rcu_full() function should also use this same root rcu_node structure's ->gp_seq field. But it turns out that symmetry is profoundly (though extremely infrequently) destructive in this case. To see this, consider the following sequence of events:
1. CPU 0 starts a new grace period, and updates rcu_state.gp_seq accordingly.
2. As its first step of grace-period initialization, CPU 0 examines the current CPU hotplug state and decides that it need not wait for CPU 1, which is currently offline.
3. CPU 1 comes online, and updates its state. But this does not affect the current grace period, but rather the one after that. After all, CPU 1 was offline when the current grace period started, so all pre-existing RCU readers on CPU 1 must have completed or been preempted before it last went offline. The current grace period therefore has nothing it needs to wait for on CPU 1.
4. CPU 1 switches to an rcutorture kthread which is running rcutorture's rcu_torture_reader() function, which starts a new RCU reader.
5. CPU 2 is running rcutorture's rcu_torture_writer() function and collects a new polled grace-period "cookie" using get_state_synchronize_rcu_full(). Because the newly started grace period has not completed initialization, the root rcu_node structure's ->gp_seq field has not yet been updated to indicate that this new grace period has already started.
This cookie is therefore set up for the end of the current grace period (rather than the end of the following grace period).
6. CPU 0 finishes grace-period initialization.
7. If CPU 1’s rcutorture reader is preempted, it will be added to the ->blkd_tasks list, but because CPU 1’s ->qsmask bit is not set in CPU 1's leaf rcu_node structure, the ->gp_tasks pointer will not be updated. Thus, this grace period will not wait on it. Which is only fair, given that the CPU did not come online until after the grace period officially started.
8. CPUs 0 and 2 then detect the new grace period and report a quiescent state to the RCU core.
9. Because CPU 1 was offline at the start of the current grace period, CPUs 0 and 2 are the only CPUs that this grace period needs to wait on. So the grace period ends and post-grace-period cleanup starts. In particular, the root rcu_node structure's ->gp_seq field is updated to indicate that this grace period has now ended.
10. CPU 2 continues running rcu_torture_writer() and sees that, from the viewpoint of the root rcu_node structure consulted by the poll_state_synchronize_rcu_full() function, the grace period has ended. It therefore updates state accordingly.
11. CPU 1 is still running the same RCU reader, which notices this update and thus complains about the too-short grace period.
The fix is for the get_state_synchronize_rcu_full() function to use rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq field. With this change in place, if step 5's cookie indicates that the grace period has not yet started, then any prior code executed by CPU 2 must have happened before CPU 1 came online. This will in turn prevent CPU 1's code in steps 3 and 11 from spanning CPU 2's grace-period wait, thus preventing CPU 1 from being subjected to a too-short grace period.
This commit therefore makes this change. Note that there is no change to the poll_state_synchronize_rcu_full() function, which as noted above, must continue to use the root rcu_node structure's ->gp_seq field. This is of course an asymmetry between these two functions, but is an asymmetry that is absolutely required for correct operation. It is a common human tendency to greatly value symmetry, and sometimes symmetry is a wonderful thing. Other times, symmetry results in poor performance. But in this case, symmetry is just plain wrong.
Nevertheless, the asymmetry does require an additional adjustment. It is possible for get_state_synchronize_rcu_full() to see a given grace period as having started, but for an immediately following poll_state_synchronize_rcu_full() to see it as having not yet started. Given the current rcu_seq_done_exact() implementation, this will result in a false-positive indication that the grace period is done from poll_state_synchronize_rcu_full(). This is dealt with by making rcu_seq_done_exact() reach back three grace periods rather than just two of them.
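A sketch of the adjusted check, reconstructed here under the assumption that the existing ULONG_CMP_GE()/ULONG_CMP_LT() helpers and RCU_SEQ_STATE_MASK are used; the essential change is reaching back three grace-period widths instead of two:

    static bool rcu_seq_done_exact(unsigned long *sp, unsigned long s)
    {
            unsigned long cur_s = READ_ONCE(*sp);

            /* Done if the counter has reached the cookie, or if it is so far
             * "behind" the cookie that the cookie must predate a counter wrap.
             * The guard band now spans three grace periods rather than two. */
            return ULONG_CMP_GE(cur_s, s) ||
                   ULONG_CMP_LT(cur_s, s - (3 * RCU_SEQ_STATE_MASK + 1));
    }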
However, simply changing get_state_synchronize_rcu_full() function to use rcu_state.gp_seq instead of the root rcu_node structure's ->gp_seq field results in a theoretical bug in kernels booted with rcutree.rcu_normal_wake_from_gp=1 due to the following sequence of events:
o The rcu_gp_init() function invokes rcu_seq_start() to officially start a new grace period.
o A new RCU reader begins, referencing X from some RCU-protected list. The new grace period is not obligated to wait for this reader.
o An updater removes X, then calls synchronize_rcu(), which queues a wait element.
o The grace period ends, awakening the updater, which frees X while the reader is still referencing it.
The reason that this is theoretical is that although the grace period has officially started, none of the CPUs are officially aware of this, and thus will have to assume that the RCU reader pre-dated the start of the grace period. Detailed explanation can be found at [2] and [3].
Except for kernels built with CONFIG_PROVE_RCU=y, which use the polled grace-period APIs, which can and do complain bitterly when this sequence of events occurs. Not only that, there might be some future RCU grace-period mechanism that pulls this sequence of events from theory into practice. This commit therefore also pulls the call to rcu_sr_normal_gp_init() to precede that to rcu_seq_start().
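A simplified sketch of the ordering change in rcu_gp_init() described in the previous paragraph (surrounding initialization omitted):

    /* Queue the synchronize_rcu() wait elements first... */
    start_new_poll = rcu_sr_normal_gp_init();

    /* ...and only then officially start the new grace period. */
    rcu_seq_start(&rcu_state.gp_seq);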
Although this fixes commit 91a967fd6934 ("rcu: Add full-sized polling for get_completed*() and poll_state*()"), it is not clear that it is worth backporting this commit. First, it took me many weeks to convince rcutorture to reproduce this more frequently than once per year. Second, this cannot be reproduced at all without frequent CPU-hotplug operations, as in waiting all of 50 milliseconds from the end of the previous operation until starting the next one. Third, the TREE03.boot settings cause multi-millisecond delays during RCU grace-period initialization, which greatly increase the probability of the above sequence of events. (Don't do this in production workloads!) Fourth, the TREE03 rcutorture scenario was modified to use four-CPU guest OSes, to have a single-rcu_node combining tree, no testing of RCU priority boosting, and no random preemption, and these modifications were necessary to reproduce this issue in a reasonable timeframe. Fifth, extremely heavy use of get_state_synchronize_rcu_full() and/or poll_state_synchronize_rcu_full() is required to reproduce this, and as of v6.12, only kfree_rcu() uses it, and even then not particularly heavily.
[boqun: Apply the fix [1], and add the comment before the moved rcu_sr_normal_gp_init(). Additional links are added for explanation.]
Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Tested-by: Uladzislau Rezki (Sony) <[email protected]> Link: https://lore.kernel.org/rcu/d90bd6d9-d15c-4b9b-8a69-95336e74e8f4@paulmck-laptop/ [1] Link: https://lore.kernel.org/rcu/20250303001507.GA3994772@joelnvbox/ [2] Link: https://lore.kernel.org/rcu/Z8bcUsZ9IpRi1QoP@pc636/ [3] Reviewed-by: Joel Fernandes <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
|
| #
7acc2d90 |
| 18-Dec-2024 |
Paul E. McKenney <[email protected]> |
rcutorture: Make cur_ops->format_gp_seqs take buffer length
The Tree and Tiny implementations of rcutorture_format_gp_seqs() use hard-coded constants for the length of the buffer that they format into. This is of course an accident waiting to happen, so this commit therefore makes them take a length argument. The rcutorture calling code uses ARRAY_SIZE() to safely compute this new argument.
Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
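A hedged sketch of the resulting calling pattern; the formatter's parameters and names below are illustrative, the point being that the buffer length travels with the buffer and the caller computes it with ARRAY_SIZE():

    /* Illustrative formatter: no hard-coded buffer size inside. */
    static void fmt_gp_seqs(unsigned long long seqs, char *cp, size_t len)
    {
            snprintf(cp, len, "%#06llx", seqs);
    }

    char buf[16];

    cur_ops->format_gp_seqs(seqs, buf, ARRAY_SIZE(buf));  /* caller supplies the real size */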
|
|
Revision tags: v6.13-rc2 |
|
| #
2db7ab8c |
| 03-Dec-2024 |
Paul E. McKenney <[email protected]> |
rcutorture: Expand failure/close-call grace-period output
With only eight bits per grace-period sequence number, wrap can happen in 64 grace periods. This commit therefore increases this to sixteen bits for normal grace-period sequence numbers and the combined short-form polling sequence numbers, thus deferring wrap for at least 16,384 grace periods. Because expedited grace periods go faster, expand these to 24 bits, deferring wrap for at least 4,194,304 expedited grace periods. These longer wrap times make it easier to correlate these numbers with trace-event output.
Note that the low-order two bits are reserved for intra-grace-period state, hence the above wrap numbers being a factor of four smaller than you might expect.
Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
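The arithmetic behind the quoted wrap figures, given that the low-order two bits hold intra-grace-period state: 2^16 / 4 = 16,384 normal grace periods before a sixteen-bit field wraps, and 2^24 / 4 = 4,194,304 expedited grace periods before a twenty-four-bit field wraps.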
|
|
Revision tags: v6.13-rc1, v6.12 |
|
| #
84ae9101 |
| 14-Nov-2024 |
Paul E. McKenney <[email protected]> |
rcutorture: Include grace-period sequence numbers in failure/close-call
This commit includes the grace-period sequence numbers at the beginning and end of each segment in the "Failure/close-call rcutorture reader segments" list. These are in hexadecimal, and only the bottom byte. Currently, only RCU is supported, with its three sequence numbers (normal, expedited, and polled).
Note that if all the grace-period sequence numbers remain the same across a given reader segment, only one copy of the number will be printed. Of course, if there is a change, both sets of values will be printed.
Because the overhead of collecting this information can suppress heisenbugs, this information is collected and printed only in kernels built with CONFIG_RCU_TORTURE_TEST_LOG_GP=y.
[ paulmck: Apply Nathan Chancellor feedback for IS_ENABLED(). ] [ paulmck: Apply feedback from kernel test robot. ]
Signed-off-by: Paul E. McKenney <[email protected]> Tested-by: kernel test robot <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
|
| #
7f4b19ef |
| 03-Feb-2025 |
Vlastimil Babka <[email protected]> |
rcu: remove trace_rcu_kvfree_callback
Tree RCU no longer handles kvfree_rcu() by queueing individual objects via call_rcu(), so the tracepoint and the associated __is_kvfree_rcu_offset() check are now dead code. Remove them.
Reviewed-by: Joel Fernandes (Google) <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Revision tags: v6.12-rc7 |
|
| #
764f6a81 |
| 10-Nov-2024 |
Zilin Guan <[email protected]> |
rcu: Remove READ_ONCE() for rdp->gpwrap access in __note_gp_changes()
There is one access to the per-CPU rdp->gpwrap field in the __note_gp_changes() function that does not use READ_ONCE(), but all other accesses do use READ_ONCE(). When using the 8*TREE03 and CONFIG_NR_CPUS=8 configuration, KCSAN found no data races at that point. This is because all calls to __note_gp_changes() hold rnp->lock, which excludes writes to the rdp->gpwrap fields for all CPUs associated with that same leaf rcu_node structure.
This commit therefore removes READ_ONCE() from rdp->gpwrap accesses within the __note_gp_changes() function.
Signed-off-by: Zilin Guan <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
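An illustrative fragment of the resulting pattern, assuming the usual rcu_node lockdep helper; the follow-up work is a hypothetical stand-in, and the point is that every caller of __note_gp_changes() already holds rnp->lock, so the plain read cannot race with the lock-protected writers:

    raw_lockdep_assert_held_rcu_node(rnp);  /* all callers hold rnp->lock */

    if (unlikely(rdp->gpwrap))              /* plain access, no READ_ONCE() needed */
            handle_gp_wrap(rdp);            /* hypothetical follow-up work */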
|
| #
053ca725 |
| 09-Jan-2025 |
Paul E. McKenney <[email protected]> |
rcu: Add CONFIG_RCU_LAZY delays to call_rcu() kernel-doc header
This commit adds a description of the energy-efficiency delays that call_rcu() can impose, along with a pointer to call_rcu_hurry() for latency-sensitive kernel code.
Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
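A small usage sketch of the distinction now spelled out in the kernel-doc header (structure and callback names are hypothetical):

    /* Fine for most users: under CONFIG_RCU_LAZY the callback may be
     * batched and delayed for energy efficiency. */
    call_rcu(&obj->rh, obj_free_cb);

    /* Latency-sensitive path: request a prompt grace period instead. */
    call_rcu_hurry(&obj->rh, obj_free_cb);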
|
| #
21ef2498 |
| 09-Jan-2025 |
Paul E. McKenney <[email protected]> |
rcu: Document self-propagating callbacks
This commit documents the fact that a given RCU callback function can repost itself.
Reported-by: Jens Axboe <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Boqun Feng <[email protected]>
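A sketch of a self-propagating callback of the kind being documented (names are hypothetical; a real user must also ensure the callback has stopped reposting, and a final grace period has elapsed, before e.g. module unload):

    static void selfprop_cb(struct rcu_head *rhp)
    {
            if (!READ_ONCE(selfprop_stop))
                    call_rcu(rhp, selfprop_cb);  /* the callback reposts itself */
    }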
|
| #
d40797d6 |
| 22-Nov-2024 |
Peter Zijlstra <[email protected]> |
kasan: make kasan_record_aux_stack_noalloc() the default behaviour
kasan_record_aux_stack_noalloc() was introduced to record a stack trace without allocating memory in the process. It has been added to callers which were invoked while a raw_spinlock_t was held. More and more callers were identified and changed over time. Is it a good thing to have this while functions try their best to set things up locklessly? The only downside of having kasan_record_aux_stack() not allocate any memory is that we end up without a stacktrace if stackdepot runs out of memory and the same stacktrace was not recorded before. To quote Marco Elver from https://lore.kernel.org/all/CANpmjNPmQYJ7pv1N3cuU8cP18u7PP_uoZD8YxwZd4jtbof9nVQ@mail.gmail.com/
| I'd be in favor, it simplifies things. And stack depot should be | able to replenish its pool sufficiently in the "non-aux" cases | i.e. regular allocations. Worst case we fail to record some | aux stacks, but I think that's only really bad if there's a bug | around one of these allocations. In general the probabilities | of this being a regression are extremely small [...]
Make the kasan_record_aux_stack_noalloc() behaviour the default behaviour of kasan_record_aux_stack().
[[email protected]: dressed the diff as patch] Link: https://lkml.kernel.org/r/[email protected] Fixes: 7cb3007ce2da ("kasan: generic: introduce kasan_record_aux_stack_noalloc()") Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected] Reviewed-by: Andrey Konovalov <[email protected]> Reviewed-by: Marco Elver <[email protected]> Reviewed-by: Waiman Long <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Ben Segall <[email protected]> Cc: Boqun Feng <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dietmar Eggemann <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Hyeonggon Yoo <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jann Horn <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Juri Lelli <[email protected]> Cc: <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Liam R. Howlett <[email protected]> Cc: Lorenzo Stoakes <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Neeraj Upadhyay <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vincent Guittot <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Zqiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
|
| #
bbe658d6 |
| 12-Dec-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
mm/slab: Move kvfree_rcu() into SLAB
Move kvfree_rcu() functionality to the slab_common.c file.
The reason to have the kvfree_rcu() functionality as part of SLAB is that there is a clear trend toward, and need for, closer integration. One recent example is the creation of a barrier function for SLAB caches.
Another reason is to avoid having several implementations of the RCU machinery for reclaiming objects after a grace period. As a future step, it can be integrated more easily with SLAB internals.
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Acked-by: Hyeonggon Yoo <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
c18bcd85 |
| 12-Dec-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
rcu/kvfree: Adjust a shrinker name
Rename "rcu-kfree" to "slab-kvfree-rcu" since it goes to the slab_common.c file soon.
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Acked-by: Hyeonggon Yoo <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
ba5cac52 |
| 12-Dec-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
rcu/kvfree: Adjust names passed into trace functions
Currently the trace functions are supplied with the "rcu_state.name" member, which is located in the rcu_state structure. The problem is that the "rcu_state" structure variable is local and cannot be accessed from another place.
To address this, this preparation patch passes the "slab" string as the first argument.
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Acked-by: Hyeonggon Yoo <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
d824ed70 |
| 12-Dec-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
rcu/kvfree: Move some functions under CONFIG_TINY_RCU
Currently, when Tiny RCU is enabled, the tree.c file is not compiled, so duplicate function names do not conflict with each other.
Because the kvfree_rcu() functionality is moving to SLAB, we have to reorder some functions and place them together under the CONFIG_TINY_RCU macro definition. That way, the function names will not conflict when a kernel is compiled for the CONFIG_TINY_RCU flavor.
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Acked-by: Hyeonggon Yoo <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
|
| #
0f52b4db |
| 12-Dec-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
rcu/kvfree: Initialize kvfree_rcu() separately
Introduce a separate initialization of the kvfree_rcu() functionality. For this purpose, kfree_rcu_batch_init() is renamed to kvfree_rcu_init(), and it is invoked from main.c right after rcu_init() is done.
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Acked-by: Hyeonggon Yoo <[email protected]> Tested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
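The resulting boot-time ordering, as described above (simplified view of the init sequence):

    rcu_init();         /* core RCU comes up first */
    kvfree_rcu_init();  /* then the kvfree_rcu() batching machinery */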
|
|
Revision tags: v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1 |
|
| #
8044c589 |
| 26-Sep-2024 |
Frederic Weisbecker <[email protected]> |
rcu: Use kthread preferred affinity for RCU exp kworkers
Now that kthreads have an infrastructure to handle preferred affinity against CPU hotplug and the housekeeping cpumask, convert the RCU exp workers to use it instead of handling all the constraints themselves.
Acked-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
|
| #
b04e317b |
| 26-Sep-2024 |
Frederic Weisbecker <[email protected]> |
treewide: Introduce kthread_run_worker[_on_cpu]()
kthread_create() creates a kthread without running it yet. kthread_run() creates a kthread and runs it.
On the other hand, kthread_create_worker() creates a kthread worker and runs it.
This difference in behaviours is confusing. Also there is no way to create a kthread worker and affine it using kthread_bind_mask() or kthread_affine_preferred() before starting it.
Consolidate the behaviours and introduce kthread_run_worker[_on_cpu]() that behaves just like kthread_run(). kthread_create_worker[_on_cpu]() will now only create a kthread worker without starting it.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Dan Carpenter <[email protected]>
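A hedged sketch of the consolidated naming, assuming the new helpers mirror the existing kthread_create_worker() signature:

    /* Creates the worker and starts its kthread, analogous to kthread_run(). */
    struct kthread_worker *kw = kthread_run_worker(0, "my_worker");

    /* kthread_create_worker() now only creates the worker, so affinity can be
     * applied (for example via kthread_affine_preferred()) before it runs. */
    struct kthread_worker *kw2 = kthread_create_worker(0, "my_worker2");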
|
| #
db7ee3cb |
| 26-Sep-2024 |
Frederic Weisbecker <[email protected]> |
rcu: Use kthread preferred affinity for RCU boost
Now that kthreads have an infrastructure to handle preferred affinity against CPU hotplug and housekeeping cpumask, convert RCU boost to use it instead of handling all the constraints by itself.
Acked-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
|
| #
049dfe96 |
| 02-Oct-2024 |
Frederic Weisbecker <[email protected]> |
rcu: Report callbacks enqueued on offline CPU blind spot
Callbacks enqueued after rcutree_report_cpu_dead() fall into the RCU barrier blind spot. Report any potential misuse.
Reported-by: Paul E. McKenney <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
|
| #
a23da88c |
| 22-Oct-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
rcu/kvfree: Fix data-race in __mod_timer / kvfree_call_rcu
KCSAN reports a data race when accessing the krcp->monitor_work.timer.expires variable in the schedule_delayed_monitor_work() function:
<snip>
BUG: KCSAN: data-race in __mod_timer / kvfree_call_rcu

read to 0xffff888237d1cce8 of 8 bytes by task 10149 on cpu 1:
 schedule_delayed_monitor_work kernel/rcu/tree.c:3520 [inline]
 kvfree_call_rcu+0x3b8/0x510 kernel/rcu/tree.c:3839
 trie_update_elem+0x47c/0x620 kernel/bpf/lpm_trie.c:441
 bpf_map_update_value+0x324/0x350 kernel/bpf/syscall.c:203
 generic_map_update_batch+0x401/0x520 kernel/bpf/syscall.c:1849
 bpf_map_do_batch+0x28c/0x3f0 kernel/bpf/syscall.c:5143
 __sys_bpf+0x2e5/0x7a0
 __do_sys_bpf kernel/bpf/syscall.c:5741 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5739 [inline]
 __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5739
 x64_sys_call+0x2625/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:322
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

write to 0xffff888237d1cce8 of 8 bytes by task 56 on cpu 0:
 __mod_timer+0x578/0x7f0 kernel/time/timer.c:1173
 add_timer_global+0x51/0x70 kernel/time/timer.c:1330
 __queue_delayed_work+0x127/0x1a0 kernel/workqueue.c:2523
 queue_delayed_work_on+0xdf/0x190 kernel/workqueue.c:2552
 queue_delayed_work include/linux/workqueue.h:677 [inline]
 schedule_delayed_monitor_work kernel/rcu/tree.c:3525 [inline]
 kfree_rcu_monitor+0x5e8/0x660 kernel/rcu/tree.c:3643
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3310
 worker_thread+0x51d/0x6f0 kernel/workqueue.c:3391
 kthread+0x1d1/0x210 kernel/kthread.c:389
 ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 UID: 0 PID: 56 Comm: kworker/u8:4 Not tainted 6.12.0-rc2-syzkaller-00050-g5b7c893ed5ed #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: events_unbound kfree_rcu_monitor
<snip>
kfree_rcu_monitor() rearms the work if a "krcp" still has to be offloaded, and this is done without holding krcp->lock, whereas kvfree_call_rcu() holds it.
Fix it by acquiring the "krcp->lock" for kfree_rcu_monitor() so both functions do not race anymore.
Reported-by: [email protected] Link: https://lore.kernel.org/lkml/ZxZ68KmHDQYU0yfD@pc636/T/ Fixes: 8fc5494ad5fa ("rcu/kvfree: Move need_offload_krc() out of krcp->lock") Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: Neeraj Upadhyay <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
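A sketch of the fix as described: rearm the monitor work under krcp->lock, so the timer-expiry update is serialized with kvfree_call_rcu(), which already holds that lock (simplified from kfree_rcu_monitor(); assumes krcp->lock is the usual raw spinlock):

    raw_spin_lock_irqsave(&krcp->lock, flags);
    if (need_offload_krc(krcp))
            schedule_delayed_monitor_work(krcp);  /* now serialized with kvfree_call_rcu() */
    raw_spin_unlock_irqrestore(&krcp->lock, flags);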
|
| #
a3076380 |
| 09-Oct-2024 |
Paul E. McKenney <[email protected]> |
rcu: Permit start_poll_synchronize_rcu*() with interrupts disabled
The header comments for both start_poll_synchronize_rcu() and start_poll_synchronize_rcu_full() state that interrupts must be enabled when calling these two functions, and there is a lockdep assertion in start_poll_synchronize_rcu_common() enforcing this restriction. However, there is no need for this restriction, as can be seen in call_rcu(), which does wakeups when interrupts are disabled.
This commit therefore removes the lockdep assertion and the comments.
Reported-by: Kent Overstreet <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]> Reviewed-by: Neeraj Upadhyay <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
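With the assertion gone, a pattern such as the following becomes legal (the lock name is hypothetical):

    unsigned long cookie, flags;

    raw_spin_lock_irqsave(&my_lock, flags);
    cookie = start_poll_synchronize_rcu();  /* interrupts are disabled here */
    raw_spin_unlock_irqrestore(&my_lock, flags);

    /* Later, possibly from another context: */
    if (poll_state_synchronize_rcu(cookie))
            pr_info("grace period completed\n");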
|
|
Revision tags: v6.11, v6.11-rc7 |
|
| #
5d2501f4 |
| 02-Sep-2024 |
Jinjie Ruan <[email protected]> |
rcu: Use the BITS_PER_LONG macro
sizeof(unsigned long) * 8 is the number of bits in an unsigned long variable; replace it with the BITS_PER_LONG macro to make it simpler.
Signed-off-by: Jinjie Ruan <[email protected]> Reviewed-by: "Paul E. McKenney" <[email protected]> Signed-off-by: Neeraj Upadhyay <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
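The substitution in question (variable names illustrative):

    /* Before */
    nbits = sizeof(unsigned long) * 8;
    /* After */
    nbits = BITS_PER_LONG;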
|
| #
3c5d61ae |
| 30-Sep-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
rcu/kvfree: Refactor kvfree_rcu_queue_batch()
Improve the readability of the kvfree_rcu_queue_batch() function so that, after the first batch is queued, the loop breaks and a success value is returned to the caller.
There is no reason to keep looping over and checking further batches, as all outstanding objects have already been picked up and attached to a batch to complete the offloading.
Fixes: 2b55d6a42d14 ("rcu/kvfree: Add kvfree_rcu_barrier() API") Suggested-by: Linus Torvalds <[email protected]> Closes: https://lore.kernel.org/lkml/ZvWUt2oyXRsvJRNc@pc636/T/ Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
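A heavily simplified sketch of the control flow after the refactor; the per-batch queueing helper is a hypothetical stand-in for the real krwp handling:

    bool queued = false;
    int i;

    for (i = 0; i < KFREE_N_BATCHES; i++) {
            if (queue_one_batch(krcp, i)) {  /* stand-in for the real work queueing */
                    queued = true;
                    break;  /* all outstanding objects are attached to this batch */
            }
    }
    return queued;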
|
|
Revision tags: v6.11-rc6, v6.11-rc5 |
|
| #
2b55d6a4 |
| 20-Aug-2024 |
Uladzislau Rezki (Sony) <[email protected]> |
rcu/kvfree: Add kvfree_rcu_barrier() API
Add a kvfree_rcu_barrier() function. It waits until all in-flight pointers are freed over the RCU machinery. It does not wait for any grace-period completion and is within its rights to return immediately if there are no outstanding pointers.
This function is useful when there is a need to guarantee that memory is fully freed before destroying memory caches, for example when unloading a kernel module.
Signed-off-by: Uladzislau Rezki (Sony) <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
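A typical usage sketch for the module-unload case mentioned above (cache and function names are hypothetical):

    static void __exit my_module_exit(void)
    {
            kvfree_rcu_barrier();          /* every in-flight kvfree_rcu() pointer is now freed */
            kmem_cache_destroy(my_cache);  /* safe to destroy the backing cache */
    }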
|
|
Revision tags: v6.11-rc4 |
|
| #
e68ac2b4 |
| 15-Aug-2024 |
Caleb Sander Mateos <[email protected]> |
softirq: Remove unused 'action' parameter from action callback
When soft interrupt actions are called, they are passed a pointer to the struct softirq action which contains the action's function pointer.
This pointer isn't useful, as the action callback already knows what function it is. And since each callback handles a specific soft interrupt, the callback also knows which soft interrupt number is running.
No soft interrupt action callback actually uses this parameter, so remove it from the function pointer signature. This clarifies that soft interrupt actions are global routines and makes it slightly cheaper to call them.
Signed-off-by: Caleb Sander Mateos <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Link: https://lore.kernel.org/all/[email protected]
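The shape of the change, using RCU's soft interrupt as one example of the general pattern:

    /* Before: every action carried an unused parameter. */
    static void rcu_core_si(struct softirq_action *h);

    /* After: soft interrupt actions are plain global routines. */
    static void rcu_core_si(void);

    open_softirq(RCU_SOFTIRQ, rcu_core_si);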
|