|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2 |
|
| #
f66b0acf |
| 05-Feb-2025 |
Nam Cao <[email protected]> |
time: Switch to hrtimer_setup()
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism.
Signed-off-by: Nam Cao <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/170bb691a0d59917c8268a98c80b607128fc9f7f.1738746821.git.namcao@linutronix.de
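The conversion pattern described in this commit can be modelled in user space. The sketch below is not kernel code: `mock_hrtimer`, `mock_hrtimer_init()`, `mock_hrtimer_setup()` and `mock_tick_handler()` are hypothetical stand-ins that mirror only the shape of the change (callback passed as an argument instead of being assigned afterwards).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are not the kernel's definitions. */
enum mock_restart { MOCK_NORESTART, MOCK_RESTART };

struct mock_hrtimer {
    enum mock_restart (*function)(struct mock_hrtimer *);
    int clock_id;
    int mode;
};

/* Old pattern: init leaves ->function for the caller to fill in later. */
void mock_hrtimer_init(struct mock_hrtimer *t, int clock_id, int mode)
{
    t->function = NULL;
    t->clock_id = clock_id;
    t->mode = mode;
}

/* New pattern: the callback is an argument, so one call completes the timer. */
void mock_hrtimer_setup(struct mock_hrtimer *t,
                        enum mock_restart (*fn)(struct mock_hrtimer *),
                        int clock_id, int mode)
{
    mock_hrtimer_init(t, clock_id, mode);
    t->function = fn;
}

enum mock_restart mock_tick_handler(struct mock_hrtimer *t)
{
    (void)t;
    return MOCK_NORESTART;
}

/* Both styles must end in the same state. */
void mock_hrtimer_selfcheck(void)
{
    struct mock_hrtimer old_style, new_style;

    mock_hrtimer_init(&old_style, 1, 0);
    old_style.function = mock_tick_handler;   /* open-coded assignment */

    mock_hrtimer_setup(&new_style, mock_tick_handler, 1, 0);

    assert(old_style.function == new_style.function);
    assert(old_style.clock_id == new_style.clock_id);
    assert(old_style.mode == new_style.mode);
}
```

The point of the single-call form is that a timer can no longer be initialized without a callback by accident.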
|
|
Revision tags: v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7 |
|
| #
49a17639 |
| 06-Nov-2024 |
Sebastian Andrzej Siewior <[email protected]> |
softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.
The timer and hrtimer soft interrupts are raised in hard interrupt context. With threaded interrupts force enabled or on PREEMPT_RT this leads to waking the ksoftirqd for the processing of the soft interrupt.
ksoftirqd runs as a SCHED_OTHER task, which means it will compete with other tasks for CPU resources. This can introduce long delays for timer processing on heavily loaded systems and is not desired.
Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated timers thread and let it run at the lowest SCHED_FIFO priority. Wake-ups for RT tasks happen from hardirq context so only timer_list timers and hrtimers for "regular" tasks are processed here. The higher priority ensures that wakeups are performed before scheduling SCHED_OTHER tasks.
Using a dedicated variable to store the pending softirq bits ensures that the timers are not accidentally picked up by ksoftirqd and other threaded interrupts.
They shouldn't be picked up by ksoftirqd since it runs at lower priority. However if ksoftirqd is already running while a timer fires, then ksoftirqd will be PI-boosted due to the BH-lock to ktimer's priority.
The timer thread can pick up pending softirqs from ksoftirqd but only if the softirq load is high. It is not desired that the picked up softirqs are processed at SCHED_FIFO priority under high softirq load but this can already happen by a PI-boost by a force-threaded interrupt.
[ [email protected]: rcutorture.c fixes, storm fix by introduction of local_timers_pending() for tick_nohz_next_event() ]
[ [email protected]: Ensure ktimersd gets woken up even if a softirq is currently served. ]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> [rcutorture] Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/all/[email protected]
|
|
Revision tags: v6.12-rc6 |
|
| #
a6347864 |
| 29-Oct-2024 |
Frederic Weisbecker <[email protected]> |
tick: Remove now unneeded low-res tick stop on CPUHP_AP_TICK_DYING
The generic clockevent layer now detaches and stops the underlying clockevent from the dying CPU, unifying the tick behaviour for both periodic and oneshot mode on offline CPUs. There is no more need for the tick layer to care about that.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/all/[email protected]
|
|
Revision tags: v6.12-rc5, v6.12-rc4, v6.12-rc3 |
|
| #
cd9626e9 |
| 10-Oct-2024 |
Peter Zijlstra <[email protected]> |
sched/fair: Fix external p->on_rq users
Sean noted that ever since commit 152e11f6df29 ("sched/fair: Implement delayed dequeue") KVM's preemption notifiers have started mis-classifying preemption vs blocking.
Notably p->on_rq is no longer sufficient to determine if a task is runnable or blocked -- the aforementioned commit introduces tasks that remain on the runqueue even though they will not run again, and should be considered blocked for many cases.
Add the task_is_runnable() helper to classify things and audit all external users of the p->on_rq state. Also add a few comments.
Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue") Reported-by: Sean Christopherson <[email protected]> Tested-by: Sean Christopherson <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
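A minimal user-space sketch of the classification this commit describes, assuming the delayed-dequeue state is visible as a per-task flag. `mock_task`, its fields and `mock_task_is_runnable()` are hypothetical names chosen for illustration, not the scheduler's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the task state discussed above. */
struct mock_task {
    int on_rq;            /* still queued on a runqueue */
    bool sched_delayed;   /* dequeue was delayed; the task will not run again */
};

/* A task counts as runnable only if it is queued and not a delayed dequeue. */
bool mock_task_is_runnable(const struct mock_task *p)
{
    return p->on_rq && !p->sched_delayed;
}

void mock_task_selfcheck(void)
{
    struct mock_task running = { .on_rq = 1, .sched_delayed = false };
    struct mock_task delayed = { .on_rq = 1, .sched_delayed = true };
    struct mock_task blocked = { .on_rq = 0, .sched_delayed = false };

    assert(mock_task_is_runnable(&running));
    assert(!mock_task_is_runnable(&delayed));  /* on_rq alone would misclassify this */
    assert(!mock_task_is_runnable(&blocked));
}
```

The middle case is the one that external users checking only `on_rq` would get wrong: the task is still queued but should be treated as blocked.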
|
|
Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4 |
|
| #
59dbee7d |
| 10-Jun-2024 |
Anna-Maria Behnsen <[email protected]> |
tick/sched: Combine WARN_ON_ONCE and print_once
When the WARN_ON_ONCE() triggers, the printk() of the additional information related to the warning will not happen in print level "warn". When reading dmesg with a restriction to level "warn", the information published by the printk_once() will not show up there.
Transform WARN_ON_ONCE() and printk_once() into a WARN_ONCE().
Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
9403408e |
| 17-Jun-2024 |
Christian Loehle <[email protected]> |
tick: Remove unnused tick_nohz_get_idle_calls()
The function returns the idle calls counter for the current CPU and therefore usually isn't what the caller wants. It has been unused since commit 466a2b42d676 ("cpufreq: schedutil: Use idle_calls counter of the remote CPU").
Signed-off-by: Christian Loehle <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
| #
f87cbcb3 |
| 09-Apr-2024 |
Thomas Gleixner <[email protected]> |
timekeeping: Use READ/WRITE_ONCE() for tick_do_timer_cpu
tick_do_timer_cpu is used lockless to check which CPU needs to take care of the per tick timekeeping duty. This is done to avoid a thundering herd problem on jiffies_lock.
The reads and writes are not annotated so KCSAN complains about data races:
  BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event
  write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26:
    tick_nohz_idle_stop_tick+0x3b1/0x4a0
    do_idle+0x1e3/0x250
  read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16:
    tick_nohz_next_event+0xe7/0x1e0
    tick_nohz_get_sleep_length+0xa7/0xe0
    menu_select+0x82/0xb90
    cpuidle_select+0x44/0x60
    do_idle+0x1c2/0x250
  value changed: 0x0000001a -> 0xffffffff
Annotate them with READ/WRITE_ONCE() to document the intentional data race.
Reported-by: Mirsad Todorovac <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Sean Anderson <[email protected]> Link: https://lore.kernel.org/r/87cyqy7rt3.ffs@tglx
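A user-space sketch of the annotation pattern, under stated assumptions: the macros below are simplified approximations of READ_ONCE()/WRITE_ONCE() that rely on the GCC/Clang `__typeof__` extension, all `mock_`/`MOCK_` names are hypothetical, and the value -1 for "no timekeeper" is an assumption consistent with the "0x0000001a -> 0xffffffff" transition in the report. The example is single-threaded and only shows where the annotations sit.

```c
#include <assert.h>

/* Simplified approximations: the volatile access makes the compiler emit
 * exactly one load or store and documents that the access is racy on purpose. */
#define MOCK_READ_ONCE(x)      (*(const volatile __typeof__(x) *)&(x))
#define MOCK_WRITE_ONCE(x, v)  (*(volatile __typeof__(x) *)&(x) = (v))

#define MOCK_TICK_DO_TIMER_NONE  (-1)   /* assumed "nobody holds the duty" value */

static int mock_tick_do_timer_cpu = MOCK_TICK_DO_TIMER_NONE;

/* Give up the timekeeping duty if this CPU holds it. Returns 1 if it did. */
int mock_drop_timekeeping(int cpu)
{
    if (MOCK_READ_ONCE(mock_tick_do_timer_cpu) != cpu)
        return 0;
    MOCK_WRITE_ONCE(mock_tick_do_timer_cpu, MOCK_TICK_DO_TIMER_NONE);
    return 1;
}

/* Take the duty if nobody holds it. Returns 1 if this CPU is the timekeeper. */
int mock_take_timekeeping(int cpu)
{
    if (MOCK_READ_ONCE(mock_tick_do_timer_cpu) == MOCK_TICK_DO_TIMER_NONE)
        MOCK_WRITE_ONCE(mock_tick_do_timer_cpu, cpu);
    return MOCK_READ_ONCE(mock_tick_do_timer_cpu) == cpu;
}

void mock_timekeeping_selfcheck(void)
{
    MOCK_WRITE_ONCE(mock_tick_do_timer_cpu, MOCK_TICK_DO_TIMER_NONE);
    assert(mock_take_timekeeping(26) == 1);   /* CPU 26 becomes timekeeper */
    assert(mock_take_timekeeping(16) == 0);   /* CPU 16 sees the duty is taken */
    assert(mock_drop_timekeeping(26) == 1);   /* 26 -> NONE, as in the report */
    assert(mock_take_timekeeping(16) == 1);
    MOCK_WRITE_ONCE(mock_tick_do_timer_cpu, MOCK_TICK_DO_TIMER_NONE);
}
```

The annotations do not add locking; they only mark the lockless accesses as intentional so that tools such as KCSAN stop reporting them.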
|
|
Revision tags: v6.9-rc3, v6.9-rc2 |
|
| #
f29536bf |
| 31-Mar-2024 |
Randy Dunlap <[email protected]> |
tick/sched: Fix various kernel-doc warnings
Fix a slew of kernel-doc warnings in tick-sched.c:
  tick-sched.c:650: warning: Function parameter or struct member 'now' not described in 'tick_nohz_update_jiffies'
  tick-sched.c:741: warning: No description found for return value of 'get_cpu_idle_time_us'
  tick-sched.c:767: warning: No description found for return value of 'get_cpu_iowait_time_us'
  tick-sched.c:1210: warning: No description found for return value of 'tick_nohz_idle_got_tick'
  tick-sched.c:1228: warning: No description found for return value of 'tick_nohz_get_next_hrtimer'
  tick-sched.c:1243: warning: No description found for return value of 'tick_nohz_get_sleep_length'
  tick-sched.c:1282: warning: Function parameter or struct member 'cpu' not described in 'tick_nohz_get_idle_calls_cpu'
  tick-sched.c:1282: warning: No description found for return value of 'tick_nohz_get_idle_calls_cpu'
  tick-sched.c:1294: warning: No description found for return value of 'tick_nohz_get_idle_calls'
  tick-sched.c:1577: warning: Function parameter or struct member 'hrtimer' not described in 'tick_setup_sched_timer'
  tick-sched.c:1577: warning: Excess function parameter 'mode' description in 'tick_setup_sched_timer'
Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6 |
|
| #
500f8f9b |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick: Assume timekeeping is correctly handed over upon last offline idle call
The timekeeping duty is handed over from the outgoing CPU on stop machine, then the oneshot tick is stopped right after. Therefore it's guaranteed that the current CPU isn't the timekeeper upon its last call to idle.
Besides, calling tick_nohz_idle_stop_tick() while the dying CPU goes into idle suggests that the tick is going to be stopped while it is actually stopped already from the appropriate CPU hotplug state.
Remove the confusing call and the obsolete case handling and convert it to a sanity check that verifies the above assumption.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
3f69d04e |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick: Shut down low-res tick from dying CPU
The timekeeping duty is handed over from the outgoing CPU within stop machine. This works well if CONFIG_NO_HZ_COMMON=n or the tick is in high-res mode. However in low-res dynticks mode, the tick isn't cancelled until the clockevent is shut down, which can happen later. The tick may therefore fire again once IRQs are re-enabled on stop machine and until IRQs are disabled for good upon the last call to idle.
That leaves many opportunities for a timekeeper to go idle and for the outgoing CPU to take over that duty. This is why tick_nohz_idle_stop_tick() is called one last time on idle if the CPU is seen offline: so that the timekeeping duty is handed over again in case the CPU has re-taken the duty.
This means there are two timekeeping handovers on CPU down hotplug with different undocumented constraints and purposes:
1) A handover on stop machine for !dynticks || highres. All online CPUs are guaranteed to be non-idle and the timekeeping duty can be safely handed over. The hrtimer tick is cancelled so it is guaranteed that in dynticks mode the outgoing CPU won't take the duty again.
2) A handover on last idle call for dynticks && lowres. Setting the duty to TICK_DO_TIMER_NONE makes sure that a CPU will take over the timekeeping.
Prepare for consolidating the handover to a single place (the first one) by shutting down the low-res tick from tick_cancel_sched_timer() as well. This will simplify the handover and unify the tick cancellation between high-res and low-res.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
7988e5ae |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick: Split nohz and highres features from nohz_mode
The nohz mode field tells about low resolution nohz mode or high resolution nohz mode but it doesn't tell about high resolution non-nohz mode.
In order to retrieve the latter state, tick_cancel_sched_timer() must fiddle with struct hrtimer's internals to guess if the tick has been initialized in high resolution.
Instead, move the nohz mode field information into the tick flags and provide two new bits: one to know if the tick is in nohz mode and another one to know if the tick is in high resolution. The combination of those two flags provides all the information needed to determine which of the three tick modes is running.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
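How two independent bits can encode the three tick modes named in this commit may be sketched as follows. The flag names, bit positions and the `mock_tick_mode` enum are assumptions made for illustration; the fourth combination (neither bit set) is treated here as "inactive".

```c
#include <assert.h>

/* Illustrative flag bits; names and positions are assumptions. */
#define MOCK_TS_FLAG_NOHZ     (1u << 0)   /* tick may be stopped (dynticks) */
#define MOCK_TS_FLAG_HIGHRES  (1u << 1)   /* tick runs in high resolution */

enum mock_tick_mode {
    MOCK_MODE_INACTIVE,           /* neither bit set */
    MOCK_MODE_LOWRES_NOHZ,        /* low resolution nohz mode */
    MOCK_MODE_HIGHRES_NOHZ,       /* high resolution nohz mode */
    MOCK_MODE_HIGHRES_NON_NOHZ,   /* the case the old field could not express */
};

enum mock_tick_mode mock_tick_mode_of(unsigned int flags)
{
    int nohz = !!(flags & MOCK_TS_FLAG_NOHZ);
    int highres = !!(flags & MOCK_TS_FLAG_HIGHRES);

    if (nohz && highres)
        return MOCK_MODE_HIGHRES_NOHZ;
    if (nohz)
        return MOCK_MODE_LOWRES_NOHZ;
    if (highres)
        return MOCK_MODE_HIGHRES_NON_NOHZ;
    return MOCK_MODE_INACTIVE;
}

void mock_tick_mode_selfcheck(void)
{
    assert(mock_tick_mode_of(0) == MOCK_MODE_INACTIVE);
    assert(mock_tick_mode_of(MOCK_TS_FLAG_NOHZ) == MOCK_MODE_LOWRES_NOHZ);
    assert(mock_tick_mode_of(MOCK_TS_FLAG_HIGHRES) == MOCK_MODE_HIGHRES_NON_NOHZ);
    assert(mock_tick_mode_of(MOCK_TS_FLAG_NOHZ | MOCK_TS_FLAG_HIGHRES) ==
           MOCK_MODE_HIGHRES_NOHZ);
}
```

With the high-resolution state recorded explicitly, code that tears down the tick can test a flag instead of inspecting timer internals.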
|
| #
a478ffb2 |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick: Move individual bit features to debuggable mask accesses
The individual bitfields of struct tick_sched must be modified from IRQs disabled places, otherwise local modifications can race due to them sharing the same memory storage.
The recent move of the "got_idle_tick" bitfield to its own storage shows that the use of these bitfields, as pretty as they look, can be just as error prone.
In order to avoid future issues of the like and make sure that those bitfields are safely accessed, move those flags to an explicit mask along with a mutator function performing the basic IRQs disabled sanity check.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
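A user-space sketch of the "explicit mask plus checked mutator" idea. User space has no notion of disabled IRQs, so `mock_irqs_disabled` and the `mock_local_irq_*()` helpers are hypothetical stand-ins; the struct, flag names and accessor names are likewise assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "local IRQs are disabled". */
static bool mock_irqs_disabled;

void mock_local_irq_disable(void) { mock_irqs_disabled = true; }
void mock_local_irq_enable(void)  { mock_irqs_disabled = false; }

struct mock_tick_sched {
    unsigned long flags;   /* one word replacing the individual bitfields */
};

#define MOCK_TS_FLAG_INIDLE   (1UL << 0)
#define MOCK_TS_FLAG_STOPPED  (1UL << 1)

/* Reading a flag needs no check. */
bool mock_flag_test(const struct mock_tick_sched *ts, unsigned long flag)
{
    return (ts->flags & flag) != 0;
}

/* Mutators do a read-modify-write on shared storage, so they verify that
 * the caller cannot be interrupted in between. */
void mock_flag_set(struct mock_tick_sched *ts, unsigned long flag)
{
    assert(mock_irqs_disabled);
    ts->flags |= flag;
}

void mock_flag_clear(struct mock_tick_sched *ts, unsigned long flag)
{
    assert(mock_irqs_disabled);
    ts->flags &= ~flag;
}

void mock_flag_selfcheck(void)
{
    struct mock_tick_sched ts = { 0 };

    mock_local_irq_disable();
    mock_flag_set(&ts, MOCK_TS_FLAG_INIDLE);
    mock_flag_set(&ts, MOCK_TS_FLAG_STOPPED);
    mock_flag_clear(&ts, MOCK_TS_FLAG_INIDLE);
    mock_local_irq_enable();

    assert(mock_flag_test(&ts, MOCK_TS_FLAG_STOPPED));
    assert(!mock_flag_test(&ts, MOCK_TS_FLAG_INIDLE));
}
```

Funnelling every modification through one mutator gives a single place for the sanity check, which plain bitfield assignments cannot offer.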
|
| #
d9b1865c |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick: Assume the tick can't be stopped in NOHZ_MODE_INACTIVE mode
The full-nohz update function checks if the nohz mode is active before proceeding. It considers one exception though: if the tick is already stopped even though the nohz mode is inactive, it still moves on in order to update/restart the tick if needed.
However in order for the tick to be stopped, the nohz_mode has to be either NOHZ_MODE_LOWRES or NOHZ_MODE_HIGHRES. Therefore it doesn't make sense to test if the tick is stopped before verifying NOHZ_MODE_INACTIVE mode.
Remove the needless related condition.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
60313c21 |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick/sched: Don't clear ts::next_tick again in can_stop_idle_tick()
The tick sched structure is already cleared from tick_cancel_sched_timer(), so there is no need to clear that field again.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
3650f49b |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick/sched: Rename tick_nohz_stop_sched_tick() to tick_nohz_full_stop_tick()
tick_nohz_stop_sched_tick() is only about NOHZ_full and not about dynticks-idle. Reflect that in the function name to avoid confusion.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
27dc0809 |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick: Use IS_ENABLED() whenever possible
Avoid ifdeffery wherever it can be converted to IS_ENABLED().
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
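The general technique can be tried out in user space with a simplified re-creation of the preprocessor trick. The `MOCK_` macros below are an approximation written for this sketch (no handling of modular options), and the two `CONFIG_MOCK_*` options are hypothetical.

```c
#include <assert.h>

/* Expands to 1 if the option is defined to 1, and to 0 if it is undefined. */
#define MOCK_PLACEHOLDER_1                   0,
#define MOCK_TAKE_SECOND(ignored, val, ...)  val
#define MOCK_IS_DEFINED3(arg1_or_junk)       MOCK_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define MOCK_IS_DEFINED2(val)                MOCK_IS_DEFINED3(MOCK_PLACEHOLDER_##val)
#define MOCK_IS_ENABLED(option)              MOCK_IS_DEFINED2(option)

#define CONFIG_MOCK_NO_HZ_COMMON 1   /* pretend this option is enabled */
/* CONFIG_MOCK_NO_HZ_FULL is deliberately left undefined. */

/* Instead of wrapping the body in #ifdef, use an ordinary C condition: both
 * branches are parsed and type-checked, and the compiler is free to drop the
 * dead one. */
int mock_nohz_full_supported(void)
{
    if (!MOCK_IS_ENABLED(CONFIG_MOCK_NO_HZ_FULL))
        return 0;
    return 1;
}

int mock_nohz_common_supported(void)
{
    if (!MOCK_IS_ENABLED(CONFIG_MOCK_NO_HZ_COMMON))
        return 0;
    return 1;
}

void mock_is_enabled_selfcheck(void)
{
    assert(MOCK_IS_ENABLED(CONFIG_MOCK_NO_HZ_COMMON) == 1);
    assert(MOCK_IS_ENABLED(CONFIG_MOCK_NO_HZ_FULL) == 0);
}
```

The benefit over `#ifdef` is that code for a disabled option still has to compile, so it is less likely to rot unnoticed.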
|
| #
3aedb7fc |
| 25-Feb-2024 |
Frederic Weisbecker <[email protected]> |
tick/sched: Remove useless oneshot ifdeffery
tick-sched.c is only built when CONFIG_TICK_ONESHOT=y, which is selected only if CONFIG_NO_HZ_COMMON=y or CONFIG_HIGH_RES_TIMERS=y. Therefore the related ifdeffery in this file is needless and can be removed.
Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
37263ba0 |
| 25-Feb-2024 |
Peng Liu <[email protected]> |
tick/nohz: Remove duplicate between lowres and highres handlers
tick_nohz_lowres_handler() does the same work as tick_nohz_highres_handler() plus the clockevent device reprogramming, so make the former reuse the latter and rename it accordingly.
Signed-off-by: Peng Liu <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
ffb7e01c |
| 25-Feb-2024 |
Peng Liu <[email protected]> |
tick/nohz: Remove duplicate between tick_nohz_switch_to_nohz() and tick_setup_sched_timer()
The ts->sched_timer initialization work of tick_nohz_switch_to_nohz() is almost the same as that of tick_setup_sched_timer(), so adjust the latter to get it reused by tick_nohz_switch_to_nohz().
This also makes the low resolution mode sched_timer benefit from the tick skew boot option.
Signed-off-by: Peng Liu <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
4c532939 |
| 21-Feb-2024 |
Richard Cochran (linutronix GmbH) <[email protected]> |
tick/sched: Split out jiffies update helper function
The logic to get the time of the last jiffies update will be needed by the timer pull model as well.
Move the code into a global function in anticipation of the new caller.
No functional change.
Signed-off-by: Richard Cochran (linutronix GmbH) <[email protected]> Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
73129cf4 |
| 21-Feb-2024 |
Anna-Maria Behnsen <[email protected]> |
timers: Optimization for timer_base_try_to_set_idle()
When the tick is stopped, the timer base is_idle flag is set as well. When reentering timer_base_try_to_set_idle() with the tick stopped, there is no need to check whether the timer base needs to be set idle again. When a timer was enqueued in the meantime, this is already handled by the tick_nohz_next_event() call which was executed before tick_nohz_stop_tick().
Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
e2e1d724 |
| 21-Feb-2024 |
Anna-Maria Behnsen <[email protected]> |
timers: Move marking timer bases idle into tick_nohz_stop_tick()
The timer base is marked idle when get_next_timer_interrupt() is executed, but the decision whether the tick will be stopped and whether the system is able to go idle is made later. When the timer base is marked idle and a new first timer is enqueued remotely, an IPI is raised, even if it is not required because the tick is not stopped and the timer base is evaluated again at the next tick.
To prevent this, the timer base is marked idle in tick_nohz_stop_tick() and get_next_timer_interrupt() is streamlined by only looking for the next timer interrupt. All other work is postponed to timer_base_try_to_set_idle() which is called by tick_nohz_stop_tick(). timer_base_try_to_set_idle() never resets timer_base::is_idle state. This is done when the tick is restarted via tick_nohz_restart_sched_tick().
With this, tick_sched::tick_stopped and timer_base::is_idle are always in sync, so there is no longer any need to execute timer_clear_idle() in tick_nohz_idle_retain_tick(). This was required before, as tick_nohz_next_event() set timer_base::is_idle even if the tick would not be stopped. timer_clear_idle() is now only executed when the timer base is idle, so the check whether the timer base is idle is no longer required either.
While at it fix some nearby whitespace damage as well.
Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2 |
|
| #
f365d055 |
| 23-Jan-2024 |
Anna-Maria Behnsen <[email protected]> |
tick/sched: Add function description for tick_nohz_next_event()
The return value of tick_nohz_next_event() is not obvious at first glance. Add a kernel-doc compatible function description which also covers return values.
Signed-off-by: Anna-Maria Behnsen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
|
| #
9a574ea9 |
| 22-Jan-2024 |
Tim Chen <[email protected]> |
tick/sched: Preserve number of idle sleeps across CPU hotplug events
Commit 71fee48f ("tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug") preserved total idle sleep time and iowait sleeptime across CPU hotplug events.
Similar reasoning applies to the number of idle calls and idle sleeps to get the proper average of sleep time per idle invocation.
Preserve those fields too.
Fixes: 71fee48f ("tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug") Signed-off-by: Tim Chen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.8-rc1 |
|
| #
71fee48f |
| 15-Jan-2024 |
Heiko Carstens <[email protected]> |
tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug
When offlining and onlining CPUs the overall reported idle and iowait times as reported by /proc/stat jump backward and forward:
  cpu  132 0 176 225249 47 6 6 21 0 0
  cpu0 80 0 115 112575 33 3 4 18 0 0
  cpu1 52 0 60 112673 13 3 1 2 0 0

  cpu  133 0 177 226681 47 6 6 21 0 0
  cpu0 80 0 116 113387 33 3 4 18 0 0

  cpu  133 0 178 114431 33 6 6 21 0 0 <---- jump backward
  cpu0 80 0 116 114247 33 3 4 18 0 0
  cpu1 52 0 61 183 0 3 1 2 0 0 <---- idle + iowait start with 0

  cpu  133 0 178 228956 47 6 6 21 0 0 <---- jump forward
  cpu0 81 0 117 114929 33 3 4 18 0 0
Reason for this is that get_idle_time() in fs/proc/stat.c has different sources for both values depending on if a CPU is online or offline:
- if a CPU is online the values may be taken from its per cpu tick_cpu_sched structure
- if a CPU is offline the values are taken from its per cpu cpustat structure
The problem is that the per cpu tick_cpu_sched structure is set to zero on CPU offline. See tick_cancel_sched_timer() in kernel/time/tick-sched.c.
Therefore when a CPU is brought offline and online afterwards both its idle and iowait sleeptime will be zero, causing a jump backward in total system idle and iowait sleeptime. In a similar way if a CPU is then brought offline again the total idle and iowait sleeptimes will jump forward.
It looks like this behavior was introduced with commit 4b0c0f294f60 ("tick: Cleanup NOHZ per cpu data on cpu down").
This was only noticed now on s390, since we switched to generic idle time reporting with commit be76ea614460 ("s390/idle: remove arch_cpu_idle_time() and corresponding code").
Fix this by preserving the values of idle_sleeptime and iowait_sleeptime members of the per-cpu tick_sched structure on CPU hotplug.
Fixes: 4b0c0f294f60 ("tick: Cleanup NOHZ per cpu data on cpu down") Reported-by: Gerald Schaefer <[email protected]> Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lore.kernel.org/r/[email protected]
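The fix described above amounts to saving the accounting totals around the reset. A minimal user-space sketch, assuming a reduced structure: `mock_tick_stats`, its `tick_stopped` member and `mock_tick_reset_on_offline()` are hypothetical, while the two sleeptime member names are taken from the commit message.

```c
#include <assert.h>
#include <string.h>

/* Reduced, hypothetical model of the per-CPU tick state. */
struct mock_tick_stats {
    unsigned long long idle_sleeptime;     /* accumulated idle time */
    unsigned long long iowait_sleeptime;   /* accumulated iowait time */
    int tick_stopped;                      /* example of state that must be reset */
};

/* Reset the state on CPU offline, but carry the accounting totals over. */
void mock_tick_reset_on_offline(struct mock_tick_stats *ts)
{
    unsigned long long idle = ts->idle_sleeptime;
    unsigned long long iowait = ts->iowait_sleeptime;

    memset(ts, 0, sizeof(*ts));
    ts->idle_sleeptime = idle;
    ts->iowait_sleeptime = iowait;
}

void mock_tick_stats_selfcheck(void)
{
    struct mock_tick_stats ts = {
        .idle_sleeptime = 112673,
        .iowait_sleeptime = 13,
        .tick_stopped = 1,
    };

    mock_tick_reset_on_offline(&ts);

    assert(ts.idle_sleeptime == 112673);   /* no jump backward after re-online */
    assert(ts.iowait_sleeptime == 13);
    assert(ts.tick_stopped == 0);          /* everything else is cleared */
}
```

Because the totals survive the reset, a reader summing per-CPU values sees them grow monotonically across an offline/online cycle.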
|