Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6 |

# 3b1596a2 | 29-Oct-2024 | Frederic Weisbecker <[email protected]>
clockevents: Shutdown and unregister current clockevents at CPUHP_AP_TICK_DYING
The way the clockevent devices are finally stopped while a CPU is offlining is currently chaotic. The current layout, in order, is:
1) tick_sched_timer_dying() stops the tick and the underlying clockevent, but only in the oneshot case. The periodic tick and its related clockevent still run.
2) tick_broadcast_offline() detaches and stops the per-cpu oneshot broadcast and appends it to the released list.
3) Some individual clockevent drivers stop the clockevents (a second time if the tick is oneshot).
4) Once the CPU is dead, a control CPU remotely detaches and stops (a third time in oneshot mode) the CPU's clockevent and adds it to the released list.
5) The released list, containing the broadcast device released in step 2) and the remotely detached clockevent from step 4), is unregistered.
These scattered operations can be consolidated if the current clockevent is detached and stopped at the generic layer, that is, directly from the dying CPU:
a) Stop the tick
b) Stop/detach the underlying per-cpu oneshot broadcast clockevent
c) Stop/detach the underlying clockevent
d) Release / unregister the clockevents from b) and c)
e) Release / unregister the remaining clockevents from the dying CPU. This part could be performed by the dying CPU.
This way the drivers and the tick layer don't need to care about clockevent operations during CPU hotplug down. This also unifies the tick behaviour on offline CPUs between oneshot and periodic modes, avoiding offline ticks altogether for sanity.
Adopt the simplification.
[ tglx: Remove the WARN_ON() in clockevents_register_device() as that is called from an upcoming CPU before the CPU is marked online ]
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
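
A rough sketch of the consolidated teardown, assuming it is hooked into the CPUHP_AP_TICK_DYING state. The helpers named below exist in the tick core, but this composition is illustrative rather than the verbatim upstream callback:

/* Runs on the dying CPU, with interrupts disabled, from CPUHP_AP_TICK_DYING */
int tick_cpu_dying(unsigned int dying_cpu)
{
        /* a) Stop the tick, for both the oneshot and the periodic case */
        tick_cancel_sched_timer(dying_cpu);

        /* b) Stop/detach the per-cpu oneshot broadcast clockevent */
        tick_broadcast_offline(dying_cpu);

        /* c) + d) Stop, detach and release the current clockevent */
        tick_shutdown(dying_cpu);

        return 0;
}
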
Revision tags: v6.12-rc5 |

# b5413156 | 26-Oct-2024 | Benjamin Segall <[email protected]>
posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
When cloning a new thread, its posix_cputimers are not inherited, and are cleared by posix_cputimers_init(). However, this does not clear the tick dependency it creates in tsk->tick_dep_mask, and the handler does not reach the code to clear the dependency if there were no timers to begin with.
Thus if a thread has a cputimer running before clone/fork, all descendants will prevent nohz_full unless they create a cputimer of their own.
Fix this by entirely clearing the tick_dep_mask in copy_process(). (There is currently no inherited state that needs a tick dependency)
Process-wide timers do not have this problem because fork does not copy signal_struct as a baseline, it creates one from scratch.
Fixes: b78783000d5c ("posix-cpu-timers: Migrate to use new tick dependency mask model") Signed-off-by: Ben Segall <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/[email protected]
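
A minimal sketch of the fix in copy_process(), assuming the child's tick dependency mask is simply reset; the upstream hunk may use a small helper for this, and the placement below is illustrative:

/* kernel/fork.c: copy_process(), after the parent's task_struct was copied */
#ifdef CONFIG_NO_HZ_FULL
        /*
         * posix CPU timers are not inherited, so the tick dependency
         * they set on the parent must not be inherited either.
         */
        atomic_set(&p->tick_dep_mask, 0);
#endif
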
Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5 |

# 9403408e | 17-Jun-2024 | Christian Loehle <[email protected]>
tick: Remove unused tick_nohz_get_idle_calls()
The function returns the idle calls counter for the current CPU and therefore usually isn't what the caller wants. It has been unused since commit 466a2b42d676 ("cpufreq: schedutil: Use idle_calls counter of the remote CPU").
Signed-off-by: Christian Loehle <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
Revision tags: v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7 |

# 2be2a197 | 29-Feb-2024 | Thomas Gleixner <[email protected]>
sched/idle: Conditionally handle tick broadcast in default_idle_call()
The x86 architecture has an idle routine for AMD CPUs which are affected by erratum 400. On the affected CPUs the local APIC timer stops in the C1E halt state.
It therefore requires tick broadcasting. The invocation of tick_broadcast_enter()/exit() from this function violates the RCU constraints because it can end up in lockdep or tracing, which rightfully triggers a warning.
tick_broadcast_enter()/exit() must be invoked before ct_cpuidle_enter() and after ct_cpuidle_exit() in default_idle_call().
Add a static branch conditional invocation of tick_broadcast_enter()/exit() into this function to allow X86 to replace the AMD specific idle code. It's guarded by a config switch which will be selected by x86. Otherwise it's a NOOP.
Reported-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
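
A condensed sketch of default_idle_call() with the conditional broadcast hooks described above. The static key name follows the changelog; tracing, polling and error handling details are elided:

DEFINE_STATIC_KEY_FALSE(arch_needs_tick_broadcast);

void __cpuidle default_idle_call(void)
{
        ...
        /* Must run before RCU stops watching this CPU */
        if (static_branch_unlikely(&arch_needs_tick_broadcast))
                tick_broadcast_enter();

        ct_cpuidle_enter();
        arch_cpu_idle();
        ct_cpuidle_exit();

        /* ... and only after RCU watches it again */
        if (static_branch_unlikely(&arch_needs_tick_broadcast))
                tick_broadcast_exit();
        ...
}
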
Revision tags: v6.8-rc6 |

# 500f8f9b | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Assume timekeeping is correctly handed over upon last offline idle call
The timekeeping duty is handed over from the outgoing CPU on stop machine, then the oneshot tick is stopped right after. Therefore it's guaranteed that the current CPU isn't the timekeeper upon its last call to idle.
Besides, calling tick_nohz_idle_stop_tick() while the dying CPU goes into idle suggests that the tick is going to be stopped while it is actually stopped already from the appropriate CPU hotplug state.
Remove the confusing call and the obsolete case handling and convert it to a sanity check that verifies the above assumption.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
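
The resulting sanity check can be as small as the following sketch in the idle stop-tick path; the exact condition and placement in the upstream change may differ:

/*
 * Sketch: the dying CPU reaches idle for the last time only after
 * CPUHP_AP_TICK_DYING, so it cannot be the timekeeper anymore.
 */
if (unlikely(cpu_is_offline(cpu))) {
        WARN_ON_ONCE(tick_do_timer_cpu == cpu);
        return false;
}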

# ef8969bb | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
The broadcast shutdown code is executed through a random explicit call within stop machine from the outgoing CPU.
However, the tick broadcast is a middleware between the tick callback and the clocksource, therefore it makes more sense to shut it down after the tick callback and before the clocksource drivers.
Move it instead to the common tick shutdown CPU hotplug state where related operations can be ordered from highest to lowest level.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]

# 3ad6eb06 | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Start centralizing tick related CPU hotplug operations
During the CPU offlining process, the various timer tick features are shut down from scattered places, sometimes from teardown callbacks on stop machine, sometimes through explicit calls, sometimes from the control CPU after the CPU died. The reason why these shutdown operations are spread around is not always clear and it makes the tick lifecycle hard to follow.
The tick should be shut down in order from highest to lowest level:
On stop machine from the dying CPU (high-level):
1) Hand-over the timekeeping duty (tick_handover_do_timer())
2) Cancel the tick implementation called by the clockevent callback (tick_cancel_sched_timer())
3) Shutdown broadcasting (tick_offline_cpu() / tick_broadcast_offline())
On stop machine from the dying CPU (low-level):
4) Shutdown clockevents drivers (CPUHP_AP_*_TIMER_STARTING states)
From the control CPU after the CPU died (low-level):
5) Shutdown/unregister/cleanup clockevents for the dead CPU (tick_cleanup_dead_cpu())
Instead the current order is 2, 4 (both from CPU hotplug states), then 1 and 3 through direct calls. This layout and order don't make much sense. The operations 1, 2, 3 should be gathered together and in order.
Sort this out by creating a new TICK shut-down CPU hotplug state and start by introducing the timekeeping duty hand-over there. The state must precede hrtimers migration because the tick hrtimer will be stopped from it in a further patch.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
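
The new state then shows up roughly as follows. The enum neighbour and the callback name reflect the tick code, but the snippet is a sketch rather than the exact upstream hunks:

/* include/linux/cpuhotplug.h: teardown walks states from high to low,
 * so the tick state runs before the hrtimers migration on offline. */
        CPUHP_AP_HRTIMERS_DYING,
        CPUHP_AP_TICK_DYING,

/* kernel/cpu.c: cpuhp_hp_states[] */
        [CPUHP_AP_TICK_DYING] = {
                .name                   = "tick:dying",
                .startup.single         = NULL,
                .teardown.single        = tick_cpu_dying,
        },
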
Revision tags: v6.8-rc5 |

# 31a5c0b7 | 13-Feb-2024 | James Morse <[email protected]>
tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef
tick_nohz_full_mask lists the CPUs that are nohz_full. This is only needed when CONFIG_NO_HZ_FULL is defined. tick_nohz_full_cpu() allows a specific CPU to be tested against the mask, and evaluates to false when CONFIG_NO_HZ_FULL is not defined.
The resctrl code needs to pick a CPU to run some work on; a new helper prefers housekeeping CPUs by examining tick_nohz_full_mask. Hiding the declaration behind #ifdef CONFIG_NO_HZ_FULL forces all the users to be behind an #ifdef too.
Move the tick_nohz_full_mask declaration outside the #ifdef. This lets callers drop the #ifdef and guard access to tick_nohz_full_mask with IS_ENABLED() or something like tick_nohz_full_cpu().
The definition does not need to be moved as any callers should be removed at compile time unless CONFIG_NO_HZ_FULL is defined.
Signed-off-by: James Morse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Shaopeng Tan <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Reinette Chatre <[email protected]> # for resctrl dependency Tested-by: Shaopeng Tan <[email protected]> Tested-by: Peter Newman <[email protected]> Tested-by: Carl Worth <[email protected]> # arm64 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Borislav Petkov (AMD) <[email protected]>
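
With the declaration visible unconditionally, a caller can be written along these lines. The helper name is made up for illustration; the real resctrl helper may look different:

/* Prefer a housekeeping (non nohz_full) CPU from @mask */
static unsigned int pick_housekeeping_cpu(const struct cpumask *mask)
{
        unsigned int cpu = cpumask_any(mask);

        if (IS_ENABLED(CONFIG_NO_HZ_FULL) && tick_nohz_full_cpu(cpu)) {
                unsigned int hk;

                for_each_cpu(hk, mask) {
                        if (!tick_nohz_full_cpu(hk))
                                return hk;
                }
        }
        return cpu;
}
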
Revision tags: v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2 |

# c02a427f | 12-Sep-2023 | Xueshi Hu <[email protected]>
tick/nohz: Remove unused tick_nohz_idle_stop_tick_protected()
All callers have been removed since commit 336f560a8917 ("x86/xen: don't let xen_pv_play_dead() return").
Signed-off-by: Xueshi Hu <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
Revision tags: v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6 |

# 58d76682 | 24-Jan-2023 | Joel Fernandes (Google) <[email protected]>
tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
For CONFIG_NO_HZ_FULL systems, the tick_do_timer_cpu cannot be offlined. However, cpu_is_hotpluggable() still returns true for those CPUs. This causes torture tests that do offlining to end up trying to offline this CPU causing test failures. Such failure happens on all architectures.
Fix the repeated error messages thrown by this (even if the hotplug errors are harmless) by asking the opinion of the nohz subsystem on whether the CPU can be hotplugged.
[ Apply Frederic Weisbecker feedback on refactoring tick_nohz_cpu_down(). ]
For drivers/base/ portion: Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Zhouyi Zhou <[email protected]> Cc: Will Deacon <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: rcu <[email protected]> Cc: [email protected] Fixes: 2987557f52b9 ("driver-core/cpu: Expose hotpluggability to the rest of the kernel") Signed-off-by: Paul E. McKenney <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]>
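
The nohz side of the check boils down to something along these lines, which cpu_is_hotpluggable() can then consult (lightly simplified from the helper added by this change):

bool tick_nohz_cpu_hotpluggable(unsigned int cpu)
{
        /*
         * The tick_do_timer_cpu CPU handles housekeeping duty (unbound
         * timers, workqueues, timekeeping, ...) on behalf of full
         * dynticks CPUs. It must remain online when nohz full is enabled.
         */
        if (tick_nohz_full_running && READ_ONCE(tick_do_timer_cpu) == cpu)
                return false;
        return true;
}
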
Revision tags: v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4 |

# f268c373 | 27-May-2021 | Frederic Weisbecker <[email protected]>
tick/nohz: Only check for RCU deferred wakeup on user/guest entry when needed
Checking for and processing RCU-nocb deferred wakeup upon user/guest entry is only relevant when nohz_full runs on the local CPU, otherwise the periodic tick should take care of it.
Make sure we don't needlessly pollute these fast paths, as a -3% performance regression on will-it-scale.per_process_ops has been reported so far.
Fixes: 47b8ff194c1f (entry: Explicitly flush pending rcuog wakeup before last rescheduling point) Fixes: 4ae7dc97f726 (entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point) Reported-by: kernel test robot <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
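
Conceptually, the fix gates the flush on the local CPU actually being nohz_full, along these lines; the wrapper name is illustrative and this is a sketch of the idea rather than the exact entry-code hunk:

static __always_inline void flush_deferred_rcuog_wakeup(void)
{
        /*
         * Only nohz_full CPUs need to flush pending rcuog wakeups on
         * the user/guest entry fast path; elsewhere the periodic tick
         * takes care of it.
         */
        if (tick_nohz_full_cpu(smp_processor_id()))
                rcu_nocb_flush_deferred_wakeup();
}
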
Revision tags: v5.13-rc3, v5.13-rc2 |

# 1e4ca26d | 12-May-2021 | Marcelo Tosatti <[email protected]>
tick/nohz: Change signal tick dependency to wake up CPUs of member tasks
Rather than waking up all nohz_full CPUs on the system, only wake up the target CPUs of member threads of the signal.
This reduces interruptions to nohz_full CPUs.
Signed-off-by: Marcelo Tosatti <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected]
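
A sketch of the targeted wakeup using existing helpers; the real change is factored differently, but the effect is the same (the caller holds the sighand lock, so the thread list is stable):

static void tick_nohz_dep_set_signal_sketch(struct signal_struct *sig,
                                            enum tick_dep_bits bit)
{
        int prev = atomic_fetch_or(BIT(bit), &sig->tick_dep_mask);

        if (!prev) {
                struct task_struct *t;

                /* Kick only CPUs running member threads, not every nohz_full CPU */
                __for_each_thread(sig, t)
                        tick_nohz_full_kick_cpu(task_cpu(t));
        }
}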

# f105dfec | 12-May-2021 | Peter Zijlstra <[email protected]>
tick/nohz: Evaluate the CPU expression after the static key
When tick_nohz_full_cpu() is called with smp_processor_id(), the latter is unconditionally evaluated whether the static key is on or off. That is not necessary in the off case, so make sure the CPU expression is evaluated only at the last moment.
Illustrate with the following test function:
int tick_nohz_test(void)
{
        return tick_nohz_full_cpu(smp_processor_id());
}
The resulting code before was:
        mov    %gs:0x7eea92d1(%rip),%eax   # smp_processor_id() fetch
        nopl   0x0(%rax,%rax,1)
        xor    %eax,%eax
        retq
        cmpb   $0x0,0x29d393a(%rip)        # <tick_nohz_full_running>
        je     tick_nohz_test+0x29         # jump to below eax clear
        mov    %eax,%eax
        bt     %rax,0x29d3936(%rip)        # <tick_nohz_full_mask>
        setb   %al
        movzbl %al,%eax
        retq
        xor    %eax,%eax
        retq
Now it becomes:
        nopl   0x0(%rax,%rax,1)
        xor    %eax,%eax
        retq
        cmpb   $0x0,0x29d3871(%rip)        # <tick_nohz_full_running>
        je     tick_nohz_test+0x29         # jump to below eax clear
        mov    %gs:0x7eea91f0(%rip),%eax   # smp_processor_id() fetch, after static key
        mov    %eax,%eax
        bt     %rax,0x29d3866(%rip)        # <tick_nohz_full_mask>
        setb   %al
        movzbl %al,%eax
        retq
        xor    %eax,%eax
        retq
Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
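
The change essentially turns the inline into a macro so that the cpu argument is only evaluated once the static key check has passed, roughly:

/* Before: the argument is evaluated even when the static key is off */
static inline bool tick_nohz_full_cpu(int cpu)
{
        if (tick_nohz_full_enabled())
                return cpumask_test_cpu(cpu, tick_nohz_full_mask);
        return false;
}

/* After: evaluate the cpu expression only behind the static key */
#define tick_nohz_full_cpu(_cpu) ({                                     \
        bool __ret = false;                                             \
                                                                        \
        if (tick_nohz_full_enabled())                                   \
                __ret = cpumask_test_cpu((_cpu), tick_nohz_full_mask);  \
        __ret;                                                          \
})
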
Revision tags: v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7, v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1, v5.5, v5.5-rc7, v5.5-rc6, v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1 |

# df1e849a | 28-Nov-2019 | Paul E. McKenney <[email protected]>
rcu: Enable tick for nohz_full CPUs slow to provide expedited QS
An expedited grace period can be stalled by a nohz_full CPU looping in kernel context. This possibility is currently handled by some carefully crafted checks in rcu_read_unlock_special() that enlist help from ksoftirqd when permitted by the scheduler. However, it is exactly these checks that require the scheduler avoid holding any of its rq or pi locks across rcu_read_unlock() without also having held them across the entire RCU read-side critical section.
It would therefore be very nice if expedited grace periods could handle nohz_full CPUs looping in kernel context without such checks. This commit therefore adds code to the expedited grace period's wait and cleanup code that forces the scheduler-clock interrupt on for CPUs that fail to quickly supply a quiescent state. "Quickly" is currently a hard-coded single-jiffy delay.
Signed-off-by: Paul E. McKenney <[email protected]>
Revision tags: v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4 |

# 74c57875 | 16-Oct-2019 | Frederic Weisbecker <[email protected]>
context_tracking: Rename context_tracking_is_enabled() => context_tracking_enabled()
Remove the superfluous "is" in the middle of the name. We want to standardize the naming so that it can be expanded through suffixes:
        context_tracking_enabled()
        context_tracking_enabled_cpu()
        context_tracking_enabled_this_cpu()
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Jacek Anaszewski <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J . Wysocki <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Viresh Kumar <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Yauheni Kaliuta <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
Revision tags: v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2 |

# 01b4c399 | 24-Jul-2019 | Frederic Weisbecker <[email protected]>
nohz: Add TICK_DEP_BIT_RCU
If a nohz_full CPU is looping in the kernel, the scheduling-clock tick might nevertheless remain disabled. In !PREEMPT kernels, this can prevent RCU's attempts to enlist the aid of that CPU's executions of cond_resched(), which can in turn result in an arbitrarily delayed grace period and thus an OOM. RCU therefore needs a way to enable a holdout nohz_full CPU's scheduler-clock interrupt.
This commit therefore provides a new TICK_DEP_BIT_RCU value which RCU can pass to tick_dep_set_cpu() and friends to force on the scheduler-clock interrupt for a specified CPU or task. In some cases, rcutorture needs to turn on the scheduler-clock tick, so this commit also exports the relevant symbols to GPL-licensed modules.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
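
Typical usage from RCU then looks like this sketch, built on the exported tick dependency API mentioned above:

/* Force the scheduler-clock tick back on for a holdout nohz_full CPU */
tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU);

/* ... wait for the CPU to report a quiescent state ... */

/* Allow the CPU to switch its tick back off */
tick_dep_clear_cpu(cpu, TICK_DEP_BIT_RCU);
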
Revision tags: v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5, v5.1-rc4, v5.1-rc3 |

# 6f9b83ac | 27-Mar-2019 | Ulf Hansson <[email protected]>
cpuidle: Export the next timer expiration for CPUs
To be able to predict the sleep duration for a CPU entering idle, it is essential to know the expiration time of the next timer. Both the teo and the menu cpuidle governors already use this information for CPU idle state selection.
Moving forward, a similar prediction needs to be made for a group of idle CPUs rather than for a single one and the following changes implement a new genpd governor for that purpose.
In order to support that feature, add a new function called tick_nohz_get_next_hrtimer() that will return the next hrtimer expiration time of a given CPU to be invoked after deciding whether or not to stop the scheduler tick on that CPU.
Make the cpuidle core call tick_nohz_get_next_hrtimer() right before invoking the ->enter() callback provided by the cpuidle driver for the given state and store its return value in the per-CPU struct cpuidle_device, so as to make it available to code outside of cpuidle.
Note that at the point when cpuidle calls tick_nohz_get_next_hrtimer(), the governor's ->select() callback has already returned and indicated whether or not the tick should be stopped, so in fact the value returned by tick_nohz_get_next_hrtimer() always is the next hrtimer expiration time for the given CPU, possibly including the tick (if it hasn't been stopped).
Co-developed-by: Lina Iyer <[email protected]> Co-developed-by: Daniel Lezcano <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Signed-off-by: Ulf Hansson <[email protected]> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <[email protected]>
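
In the cpuidle core this amounts to a small bracket around the ->enter() callback, roughly as sketched below (condensed from the call site described above):

/* Record the next hrtimer expiry so code outside of cpuidle
 * (e.g. a genpd governor) can read it while the CPU is idle. */
dev->next_hrtimer = tick_nohz_get_next_hrtimer();

entered_state = target_state->enter(dev, drv, index);

/* The prediction is stale once the CPU is back up */
dev->next_hrtimer = 0;
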
Revision tags: v5.1-rc2 |

# 1b72d432 | 21-Mar-2019 | Thomas Gleixner <[email protected]>
tick: Remove outgoing CPU from broadcast masks
Valentin reported that unplugging a CPU occasionally results in a warning in the tick broadcast code which is triggered when an offline CPU is in the broadcast mask.
This happens because the outgoing CPU is not removing itself from the broadcast masks, especially not from the broadcast_force_mask. The removal happens on the control CPU after the outgoing CPU is dead. It's a long standing issue, but the warning is harmless.
Rework the hotplug mechanism so that the outgoing CPU removes itself from the broadcast masks after disabling interrupts and removing itself from the online mask.
Reported-by: Valentin Schneider <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Valentin Schneider <[email protected]> Cc: Frederic Weisbecker <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
Revision tags: v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1, v4.19, v4.19-rc8, v4.19-rc7, v4.19-rc6, v4.19-rc5, v4.19-rc4, v4.19-rc3, v4.19-rc2, v4.19-rc1, v4.18, v4.18-rc8, v4.18-rc7, v4.18-rc6, v4.18-rc5, v4.18-rc4, v4.18-rc3, v4.18-rc2, v4.18-rc1, v4.17, v4.17-rc7, v4.17-rc6, v4.17-rc5, v4.17-rc4, v4.17-rc3, v4.17-rc2, v4.17-rc1 |

# 296bb1e5 | 05-Apr-2018 | Rafael J. Wysocki <[email protected]>
cpuidle: menu: Refine idle state selection for running tick
If the tick isn't stopped, the target residency of the state selected by the menu governor may be greater than the actual time to the next tick and that means lost energy.
To avoid that, make tick_nohz_get_sleep_length() return the current time to the next event (before stopping the tick) in addition to the estimated one via an extra pointer argument and make menu_select() use that value to refine the state selection when necessary.
Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]>
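
The refinement can be pictured with the following sketch (modern ktime_t/ns field names; the surrounding governor logic is elided):

ktime_t delta_tick;
ktime_t duration = tick_nohz_get_sleep_length(&delta_tick);
/* duration feeds the governor's idle-length prediction (not shown) */

/*
 * If the tick is left running, a state whose target residency
 * exceeds the time to the next tick would waste energy: fall back
 * to the deepest state that still fits within delta_tick.
 */
if (!stop_tick) {
        while (idx > 0 &&
               drv->states[idx].target_residency_ns > ktime_to_ns(delta_tick))
                idx--;
}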

# 554c8aa8 | 03-Apr-2018 | Rafael J. Wysocki <[email protected]>
sched: idle: Select idle state before stopping the tick
In order to address the issue with short idle duration predictions by the idle governor after the scheduler tick has been stopped, reorder the code in cpuidle_idle_call() so that the governor idle state selection runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned by cpuidle_select() to decide whether or not to stop the tick.
This isn't straightforward, because menu_select() invokes tick_nohz_get_sleep_length() to get the time to the next timer event and the number returned by the latter comes from __tick_nohz_idle_stop_tick(). Fortunately, however, it is possible to compute that number without actually stopping the tick and with the help of the existing code.
Namely, tick_nohz_get_sleep_length() can be made to call tick_nohz_next_event(), introduced earlier, to get the time to the next non-highres timer event. If that happens, tick_nohz_next_event() need not be called by __tick_nohz_idle_stop_tick() again.
If it turns out that the scheduler tick cannot be stopped going forward or the next timer event is too close for the tick to be stopped, tick_nohz_get_sleep_length() can simply return the time to the next event currently programmed into the corresponding clock event device.
In addition to knowing the return value of tick_nohz_next_event(), however, tick_nohz_get_sleep_length() needs to know the time to the next highres timer event, but with the scheduler tick timer excluded, which can be computed with the help of hrtimer_get_next_event().
The minimum of that number and the tick_nohz_next_event() return value is the total time to the next timer event under the assumption that the tick will be stopped. It can be returned to the idle governor which can use it for predicting idle duration (under the assumption that the tick will be stopped) and deciding whether or not it makes sense to stop the tick before putting the CPU into the selected idle state.
With the above, the sleep_length field in struct tick_sched is not necessary any more, so drop it.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227 Reported-by: Doug Smythies <[email protected]> Reported-by: Thomas Ilsche <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]>
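
After the reorder, the idle entry path looks roughly like this condensed sketch of cpuidle_idle_call(): the governor runs first and its nohz hint drives the tick decision:

bool stop_tick = true;
int next_state, entered_state;

/* Ask the governor first; it may ask to keep the tick running */
next_state = cpuidle_select(drv, dev, &stop_tick);

if (stop_tick || tick_nohz_tick_stopped())
        tick_nohz_idle_stop_tick();
else
        tick_nohz_idle_retain_tick();

entered_state = call_cpuidle(drv, dev, next_state);
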
Revision tags: v4.16, v4.16-rc7 |

# 45f1ff59 | 22-Mar-2018 | Rafael J. Wysocki <[email protected]>
cpuidle: Return nohz hint from cpuidle_select()
Add a new pointer argument to cpuidle_select() and to the ->select cpuidle governor callback to allow a boolean value indicating whether or not the tick should be stopped before entering the selected state to be returned from there.
Make the ladder governor ignore that pointer (to preserve its current behavior) and make the menu governor return "false" through it if: (1) the idle exit latency is constrained at 0, or (2) the selected state is a polling one, or (3) the expected idle period duration is within the tick period range.
In addition to that, the correction factor computations in the menu governor need to take the possibility that the tick may not be stopped into account to avoid artificially small correction factor values. To that end, add a mechanism to record tick wakeups, as suggested by Peter Zijlstra, and use it to modify the menu_update() behavior when tick wakeup occurs. Namely, if the CPU is woken up by the tick and the return value of tick_nohz_get_sleep_length() is not within the tick boundary, the predicted idle duration is likely too short, so make menu_update() try to compensate for that by updating the governor statistics as though the CPU was idle for a long time.
Since the value returned through the new argument pointer of cpuidle_select() is not used by its caller yet, this change by itself is not expected to alter the functionality of the code.
Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]>
Revision tags: v4.16-rc6 |

# 2aaf709a | 15-Mar-2018 | Rafael J. Wysocki <[email protected]>
sched: idle: Do not stop the tick upfront in the idle loop
Push the decision whether or not to stop the tick somewhat deeper into the idle loop.
Stopping the tick upfront leads to unpleasant outcomes in case the idle governor doesn't agree with the nohz code on the duration of the upcoming idle period. Specifically, if the tick has been stopped and the idle governor predicts short idle, the situation is bad regardless of whether or not the prediction is accurate. If it is accurate, the tick has been stopped unnecessarily which means excessive overhead. If it is not accurate, the CPU is likely to spend too much time in the (shallow, because short idle has been predicted) idle state selected by the governor [1].
As the first step towards addressing this problem, change the code to make the tick stopping decision inside of the loop in do_idle(). In particular, do not stop the tick in the cpu_idle_poll() code path. Also don't do that in tick_nohz_irq_exit() which doesn't really have enough information on whether or not to stop the tick.
Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1] Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf Suggested-by: Frederic Weisbecker <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]>

# 0e776768 | 05-Apr-2018 | Rafael J. Wysocki <[email protected]>
time: tick-sched: Reorganize idle tick management code
Prepare the scheduler tick code for reworking the idle loop to avoid stopping the tick in some cases.
The idea is to split the nohz idle entry call to decouple the idle time stats accounting and preparatory work from the actual tick stop code, in order to later be able to delay the tick stop once we reach more power-knowledgeable callers.
Move away the tick_nohz_start_idle() invocation from __tick_nohz_idle_enter(), rename the latter to __tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick() as a wrapper around it for calling it from the outside.
Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead of calling the entire __tick_nohz_idle_enter(), add another wrapper disabling and enabling interrupts around tick_nohz_idle_stop_tick() and make the current callers of tick_nohz_idle_enter() call it too to retain their current functionality.
Signed-off-by: Rafael J. Wysocki <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]>
Revision tags: v4.16-rc5, v4.16-rc4, v4.16-rc3 |

# 22ab8bc0 | 21-Feb-2018 | Frederic Weisbecker <[email protected]>
nohz: Allow to check if remote CPU tick is stopped
This check is racy but provides a good heuristic to determine whether a CPU may need a remote tick or not.
Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
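
The remote variant is a one-liner next to the existing local check, along these lines (sketched from the tick_sched internals):

/* Local CPU: is the tick currently stopped here? */
bool tick_nohz_tick_stopped(void)
{
        return __this_cpu_read(tick_cpu_sched.tick_stopped);
}

/* Remote CPU: racy snapshot, good enough as a heuristic */
bool tick_nohz_tick_stopped_cpu(int cpu)
{
        struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);

        return ts->tick_stopped;
}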

# a3642983 | 21-Feb-2018 | Frederic Weisbecker <[email protected]>
nohz: Convert tick_nohz_tick_stopped() to bool
It makes this function more self-explanatory about what it does and how to use it.
Reported-by: Thomas Gleixner <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Wanpeng Li <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>