|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
# 92e250c6 | 04-Apr-2025 | Sebastian Andrzej Siewior <[email protected]>
timekeeping: Add a lockdep override in tick_freeze()
tick_freeze() acquires a raw spinlock (tick_freeze_lock). Later in the callchain (timekeeping_suspend() -> mc146818_avoid_UIP()) the RTC driver acquires a spinlock which becomes a sleeping lock on PREEMPT_RT. Lockdep complains about this lock nesting.
Add a lockdep override for this special case and a comment explaining why it is okay.
Reported-by: Borislav Petkov <[email protected]>
Reported-by: Chris Bainbridge <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Closes: https://lore.kernel.org/all/20250330113202.GAZ-krsjAnurOlTcp-@fat_crate.local/
Closes: https://lore.kernel.org/all/CAP-bSRZ0CWyZZsMtx046YV8L28LhY0fson2g4EqcwRAVN1Jk+Q@mail.gmail.com/
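A minimal sketch of what such an override can look like in tick_freeze(); lockdep_off()/lockdep_on() here stand in for whatever suppression mechanism the actual patch uses:

  raw_spin_lock(&tick_freeze_lock);
  tick_freeze_depth++;
  if (tick_freeze_depth == num_online_cpus()) {
          /*
           * Illustrative override; the real patch may annotate this
           * differently. All other CPUs are suspended with interrupts
           * disabled, so the spinlock taken in timekeeping_suspend() ->
           * mc146818_avoid_UIP() cannot be contended. Suppress the
           * false positive raw-into-sleeping lock nesting report.
           */
          lockdep_off();
          timekeeping_suspend();
          lockdep_on();
  } else {
          tick_suspend_local();
  }
  raw_spin_unlock(&tick_freeze_lock);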
|
|
Revision tags: v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2 |
|
# 07c54cc5 | 28-May-2024 | Oleg Nesterov <[email protected]>
tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash when the boot CPU is nohz_full") the kernel no longer crashes, but there is another problem.
In this case tick_setup_device() calls tick_take_do_timer_from_boot() to update tick_do_timer_cpu, and this triggers the WARN_ON_ONCE(irqs_disabled()) in smp_call_function_single().
Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(); the new comment explains why this is safe (thanks Thomas!).
Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]
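Roughly, the fix boils down to a plain annotated store in tick_setup_device() instead of the IPI-based helper; a sketch reconstructed from the changelog, not the verbatim patch:

  } else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) {
          tick_do_timer_boot_cpu = -1;
          /*
           * The boot CPU still runs the periodic tick and observes
           * the new value on its next tick, so no IPI is needed and
           * the WARN_ON_ONCE(irqs_disabled()) in
           * smp_call_function_single() is never reached.
           */
          WRITE_ONCE(tick_do_timer_cpu, cpu);
  }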
|
|
Revision tags: v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4 |
|
# f87cbcb3 | 09-Apr-2024 | Thomas Gleixner <[email protected]>
timekeeping: Use READ/WRITE_ONCE() for tick_do_timer_cpu
tick_do_timer_cpu is used lockless to check which CPU needs to take care of the per tick timekeeping duty. This is done to avoid a thundering herd problem on jiffies_lock.
The read and writes are not annotated so KCSAN complains about data races:
BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event
write to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 26:
  tick_nohz_idle_stop_tick+0x3b1/0x4a0
  do_idle+0x1e3/0x250

read to 0xffffffff8a2bda30 of 4 bytes by task 0 on cpu 16:
  tick_nohz_next_event+0xe7/0x1e0
  tick_nohz_get_sleep_length+0xa7/0xe0
  menu_select+0x82/0xb90
  cpuidle_select+0x44/0x60
  do_idle+0x1c2/0x250

value changed: 0x0000001a -> 0xffffffff
Annotate them with READ/WRITE_ONCE() to document the intentional data race.
Reported-by: Mirsad Todorovac <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Sean Anderson <[email protected]>
Link: https://lore.kernel.org/r/87cyqy7rt3.ffs@tglx
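The annotation pattern is the standard one for intentional lockless accesses; note that the 0xffffffff in the report is TICK_DO_TIMER_NONE (-1). Schematically:

  /* Writer, e.g. dropping the duty when stopping the tick: */
  WRITE_ONCE(tick_do_timer_cpu, TICK_DO_TIMER_NONE);

  /* Reader on another CPU, content with a racy snapshot: */
  if (READ_ONCE(tick_do_timer_cpu) == TICK_DO_TIMER_NONE)
          WRITE_ONCE(tick_do_timer_cpu, cpu);     /* take over the duty */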
|
|
Revision tags: v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6 |
|
# 500f8f9b | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Assume timekeeping is correctly handed over upon last offline idle call
The timekeeping duty is handed over from the outgoing CPU on stop machine, then the oneshot tick is stopped right after. Therefore it's guaranteed that the current CPU isn't the timekeeper upon its last call to idle.
Besides, calling tick_nohz_idle_stop_tick() while the dying CPU goes into idle suggests that the tick is going to be stopped while it is actually stopped already from the appropriate CPU hotplug state.
Remove the confusing call and the obsolete case handling and convert it to a sanity check that verifies the above assumption.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
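Conceptually, the offline branch of the idle path loses the stop-tick call and gains an assertion; a sketch of the idea:

  /* Last idle entry of an offline CPU: the timekeeping duty was
   * already handed over on stop machine, so it must be gone here. */
  WARN_ON_ONCE(tick_do_timer_cpu == smp_processor_id());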
|
# 3f69d04e | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Shut down low-res tick from dying CPU
The timekeeping duty is handed over from the outgoing CPU within stop machine. This works well if CONFIG_NO_HZ_COMMON=n or the tick is in high-res mode. However in low-res dynticks mode, the tick isn't cancelled until the clockevent is shut down, which can happen later. The tick may therefore fire again once IRQs are re-enabled on stop machine and until IRQs are disabled for good upon the last call to idle.
That leaves many opportunities for the timekeeper to go idle and for the outgoing CPU to take over that duty again. This is why tick_nohz_idle_stop_tick() is called one last time on idle if the CPU is seen offline: so that the timekeeping duty is handed over again in case the outgoing CPU has re-taken it.
This means there are two timekeeping handovers on CPU down hotplug with different undocumented constraints and purposes:
1) A handover on stop machine for !dynticks || highres. All online CPUs are guaranteed to be non-idle and the timekeeping duty can be safely handed over. The hrtimer tick is cancelled, so it is guaranteed that in dynticks mode the outgoing CPU won't take the duty again.
2) A handover on last idle call for dynticks && lowres. Setting the duty to TICK_DO_TIMER_NONE makes sure that a CPU will take over the timekeeping.
Prepare for consolidating the handover to a single place (the first one) by shutting down the low-res tick from tick_cancel_sched_timer() as well. This will simplify the handover and unify the tick cancellation between high-res and low-res.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
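A sketch of the prepared tick_cancel_sched_timer(); the tick_sched field and mode names are assumptions based on that era's code:

  void tick_cancel_sched_timer(int cpu)
  {
          struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

          /* High-res mode: cancel the hrtimer-based tick. */
          if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && ts->sched_timer.base)
                  hrtimer_cancel(&ts->sched_timer);

          /* Low-res dynticks mode (assumed field names): prevent the
           * tick from being restarted between stop machine and the
           * final idle call. */
          ts->nohz_mode = NOHZ_MODE_INACTIVE;
  }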
|
# ef8969bb | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Move broadcast cancellation up to CPUHP_AP_TICK_DYING
The broadcast shutdown code is executed through a random explicit call within stop machine from the outgoing CPU.
However the tick broadcast is a middle layer between the tick callback and the clockevent drivers, therefore it makes more sense to shut it down after the tick callback and before the clockevent drivers.
Move it instead to the common tick shutdown CPU hotplug state where related operations can be ordered from highest to lowest level.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
# f04e5122 | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Move tick cancellation up to CPUHP_AP_TICK_DYING
The tick hrtimer is cancelled right before hrtimers are migrated. This is done from the hrtimer subsystem even though it shouldn't know about its actual users.
Instead, move the tick hrtimer cancellation to the relevant CPU hotplug state, which aims at centralizing the high-level tick shutdown operations so that the related flow is easy to follow.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
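Together with the broadcast move in the entry above, the high-level teardown ends up gathered on the dying CPU; schematically, with names taken from these changelogs:

  /* Runs from the tick shutdown CPU hotplug state on the dying CPU. */
  void tick_offline_cpu(unsigned int cpu)
  {
          tick_cancel_sched_timer(cpu);   /* the tick implementation */
          tick_broadcast_offline(cpu);    /* the broadcast layer below it */
  }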
|
# 3ad6eb06 | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Start centralizing tick related CPU hotplug operations
During the CPU offlining process, the various timer tick features are shut down from scattered places, sometimes from teardown callbacks on stop machine, sometimes through explicit calls, sometimes from the control CPU after the CPU died. The reason why these shutdown operations are spread around is not always clear and it makes the tick lifecycle hard to follow.
The tick should be shut down in order from highest to lowest level:
On stop machine from the dying CPU (high-level):
1) Hand over the timekeeping duty (tick_handover_do_timer())
2) Cancel the tick implementation called by the clockevent callback (tick_cancel_sched_timer())
3) Shut down broadcasting (tick_offline_cpu() / tick_broadcast_offline())
On stop machine from the dying CPU (low-level):
4) Shutdown clockevents drivers (CPUHP_AP_*_TIMER_STARTING states)
From the control CPU after the CPU died (low-level):
5) Shutdown/unregister/cleanup clockevents for the dead CPU (tick_cleanup_dead_cpu())
Instead the current order is 2, 4 (both from CPU hotplug states), then 1 and 3 through direct calls. This layout and order don't make much sense. The operations 1, 2, 3 should be gathered together and in order.
Sort this situation out by creating a new TICK shutdown CPU hotplug state and start by introducing the timekeeping duty handover there. The state must precede hrtimers migration because the tick hrtimer will be stopped from it in a further patch.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
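A sketch of how such a state slots into the hotplug machinery; the callback name tick_cpu_dying is an assumption:

  /* kernel/cpu.c: teardown callbacks run on the dying CPU, in order. */
  [CPUHP_AP_TICK_DYING] = {
          .name            = "tick:dying",
          .startup.single  = NULL,
          .teardown.single = tick_cpu_dying,  /* assumed name; hands over the duty */
  },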
|
# 27dc0809 | 25-Feb-2024 | Frederic Weisbecker <[email protected]>
tick: Use IS_ENABLED() whenever possible
Avoid #ifdeffery whenever it can be converted to IS_ENABLED().
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
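The pattern, with CONFIG_TICK_ONESHOT as an illustrative example:

  /* Instead of hiding the block from the compiler entirely:
   *
   *   #ifdef CONFIG_TICK_ONESHOT
   *           tick_setup_oneshot(dev, handler, next);
   *   #endif
   *
   * let the optimizer drop it while keeping it type-checked: */
  if (IS_ENABLED(CONFIG_TICK_ONESHOT))
          tick_setup_oneshot(dev, handler, next);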
|
|
Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7 |
|
# 13bb06f8 | 15-Jun-2023 | Thomas Gleixner <[email protected]>
tick/common: Align tick period during sched_timer setup
The tick period is aligned very early while the first clock_event_device is registered. At that point the system runs in periodic mode and switches later to one-shot mode if possible.
The next wake-up event is programmed based on the aligned value (tick_next_period), but the delta value that is used to program the clock_event_device is computed from ktime_get().
With the subtracted offset, the device fires earlier than the exact time frame. With a large enough offset the system programs the timer for the next wake-up and the remaining time is too small to make any boot progress. The system hangs.
Move the alignment later, to the setup of the tick_sched timer. At that point the system switches to oneshot mode and a high resolution clocksource is available, so it is safe to align tick_next_period because ktime_get() now returns accurate (not jiffies based) time.
[bigeasy: Patch description + testing].
Fixes: e9523a0d81899 ("tick/common: Align tick period with the HZ tick.")
Reported-by: Mathias Krause <[email protected]>
Reported-by: "Bhatnagar, Rishabh" <[email protected]>
Suggested-by: Mathias Krause <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Richard W.M. Jones <[email protected]>
Tested-by: Mathias Krause <[email protected]>
Acked-by: SeongJae Park <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/[email protected]
Link: https://lore.kernel.org/[email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
Revision tags: v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3 |
|
# e9523a0d | 18-Apr-2023 | Sebastian Andrzej Siewior <[email protected]>
tick/common: Align tick period with the HZ tick.
With HIGHRES enabled, tick_sched_timer() is programmed every jiffy to expire the timer_list timers. This timer is programmed accurately with respect to CLOCK_MONOTONIC so that 0 seconds and nanoseconds is the first tick and the next one is 1000/CONFIG_HZ ms later. For HZ=250 that is every 4 ms, so the next tick can be computed from the current time.
This accuracy broke with the commit mentioned below, because the jiffy based clocksource is initialized with higher accuracy in read_persistent_wall_and_boot_offset(). This higher accuracy is inherited during the setup in tick_setup_device(). The timer still fires every 4 ms with HZ=250, but it is no longer aligned with CLOCK_MONOTONIC's origin of 0; it carries an offset in the us/ns part of the timestamp. The offset differs with every boot and makes it impossible for user space to align with the tick.
Align the tick period with CLOCK_MONOTONIC ensuring that it is always a multiple of 1000/CONFIG_HZ ms.
Fixes: 857baa87b6422 ("sched/clock: Enable sched clock early")
Reported-by: Gusenleitner Klaus <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/[email protected]
Link: https://lore.kernel.org/r/[email protected]
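The alignment itself is simple arithmetic; a sketch of rounding the next expiry up to a TICK_NSEC boundary of CLOCK_MONOTONIC:

  ktime_t now = ktime_get();
  u32 rem;

  /* Force every tick onto an exact 1000/CONFIG_HZ ms boundary,
   * e.g. every 4 ms for HZ=250. */
  div_u64_rem(now, TICK_NSEC, &rem);
  if (rem)
          now += TICK_NSEC - rem;
  tick_next_period = now;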
|
|
Revision tags: v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1, v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2 |
|
# a761a67f | 13-Jul-2021 | Thomas Gleixner <[email protected]>
timekeeping: Disentangle resume and clock-was-set events
Resuming timekeeping is a clock-was-set event and uses the clock-was-set notification mechanism. This is in the way of making the clock-was-set update for hrtimers selective so unnecessary IPIs are avoided when a CPU base does not have timers queued which are affected by the clock setting.
Disentangle it by invoking hrtimer_resume() on each unfreezing CPU and invoking the new timerfd_resume() function from timekeeping_resume(), which is the only place where this is needed.
Rename hrtimer_resume() to hrtimer_resume_local() to reflect the change.
With this, the clock_was_set*() functions are no longer required to IPI all CPUs unconditionally and can get some smarts to avoid them.
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
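Schematically, per this changelog (function names as stated above):

  /* timekeeping_resume() is the single place that needs timerfd: */
  timerfd_resume();

  /* Each unfreezing CPU reprograms only its own hrtimers, with no
   * clock-was-set broadcast involved: */
  hrtimer_resume_local();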
|
|
Revision tags: v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4 |
|
# c94a8537 | 24-May-2021 | Will Deacon <[email protected]>
tick/broadcast: Prefer per-cpu oneshot wakeup timers to broadcast
Some SoCs have two per-cpu timer implementations where the timer with the higher rating stops in deep idle (i.e. suffers from CLOCK_EVT_FEAT_C3STOP) but is otherwise preferable to the timer with the lower rating. In such a design, selecting the higher rated devices relies on a global broadcast timer and IPIs to wake up from deep idle states.
To avoid the reliance on a global broadcast timer and also to reduce the overhead associated with the IPI wakeups, extend tick_install_broadcast_device() to manage per-cpu wakeup timers separately from the broadcast device.
For now, these timers remain unused.
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
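A sketch of the device-selection idea; the helper name tick_set_oneshot_wakeup_device() is an assumption, not confirmed by this changelog:

  static int tick_install_broadcast_device(struct clock_event_device *dev,
                                           int cpu)
  {
          /* A per-cpu timer that can wake its own CPU from deep idle
           * is managed separately rather than being made the global
           * broadcast device, avoiding the broadcast-plus-IPI path.
           * (Hypothetical helper name.) */
          if (tick_set_oneshot_wakeup_device(dev, cpu))
                  return 0;

          /* ... otherwise fall back to broadcast device selection ... */
  }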
|
|
Revision tags: v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5 |
|
# d7840aaa | 26-Mar-2021 | Wang Wensheng <[email protected]>
tick: Use tick_check_replacement() instead of open coding it
The function tick_check_replacement() is the combination of tick_check_percpu() and tick_check_preferred(), but tick_check_new_device() has the same logic open coded.
Use the helper to simplify the code.
[ tglx: Massage changelog ]
Signed-off-by: Wang Wensheng <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
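The helper is literally that combination; its shape, per this changelog:

  bool tick_check_replacement(struct clock_event_device *curdev,
                              struct clock_event_device *newdev)
  {
          if (!tick_check_percpu(curdev, newdev, smp_processor_id()))
                  return false;

          return tick_check_preferred(curdev, newdev);
  }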
|
|
Revision tags: v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7 |
|
# f12ad423 | 06-Dec-2020 | Thomas Gleixner <[email protected]>
tick: Remove pointless cpu valid check in hotplug code
tick_handover_do_timer() which is invoked when a CPU is unplugged has a check for cpumask_first(cpu_online_mask) when it tries to hand over the tick update duty.
Checking the result of cpumask_first() there is pointless, because if the online mask were empty at this point, this would be the last CPU in the system going offline, which is impossible: there is always at least one CPU remaining. And if the online mask really were empty, the timer duty would be the least of the resulting problems.
Remove the well-meant check simply because it is pointless and confusing.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
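The resulting handover reduces to a sketch like:

  static void tick_handover_do_timer(void)
  {
          if (tick_do_timer_cpu == smp_processor_id()) {
                  /* Never empty here: at least one CPU stays online. */
                  tick_do_timer_cpu = cpumask_first(cpu_online_mask);
          }
  }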
|
|
Revision tags: v5.10-rc6, v5.10-rc5 |
|
# b9965449 | 17-Nov-2020 | Thomas Gleixner <[email protected]>
tick: Get rid of tick_period
The variable tick_period is initialized to NSEC_PER_SEC / HZ during boot and never updated again.
If NSEC_PER_SEC is not an integer multiple of HZ, this computation is less accurate than TICK_NSEC, which has proper rounding in place.
Aside from the inaccuracy, there is no reason for having this variable at all. It is just a pointless indirection and all usage sites can simply use the TICK_NSEC constant.
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
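For reference, TICK_NSEC rounds to the nearest nanosecond instead of truncating:

  /* include/linux/jiffies.h */
  #define TICK_NSEC ((NSEC_PER_SEC+HZ/2)/HZ)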
|
# c398960c | 17-Nov-2020 | Thomas Gleixner <[email protected]>
tick: Document protections for tick related data
The protection rules for tick_next_period and last_jiffies_update are blurry at best. Clarify this.
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
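The documentation takes the form of a comment next to the variables; a condensed sketch of its likely content, not the verbatim text:

  /*
   * tick_next_period and last_jiffies_update:
   *
   * Updates are serialized by jiffies_lock; lockless readers must
   * snapshot both under the jiffies sequence count.
   */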
|
|
Revision tags: v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3, v5.7-rc2, v5.7-rc1, v5.6, v5.6-rc7 |
|
# e5d4d175 | 21-Mar-2020 | Thomas Gleixner <[email protected]>
timekeeping: Split jiffies seqlock
A seqlock consists of a sequence counter and a spinlock_t which is used to serialize the writers. On PREEMPT_RT enabled kernels spinlock_t is substituted by a "sleeping" spinlock, which breaks the usage in the timekeeping code because the writers are executed in hard interrupt and therefore non-preemptible context even on PREEMPT_RT.
The spinlock in seqlock cannot be unconditionally replaced by a raw_spinlock_t as many seqlock users have nesting spinlock sections or other code which is not suitable to run in truly atomic context on RT.
Instead of providing a raw_seqlock API for a single use case, open code the seqlock for the jiffies use case and implement it with a raw_spinlock_t and a sequence counter.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
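The open-coded replacement pairs a raw_spinlock_t for writer serialization with a bare seqcount for reader consistency; schematically:

  static DEFINE_RAW_SPINLOCK(jiffies_lock);   /* writers */
  static seqcount_t jiffies_seq;              /* readers */

  /* Writer, hard interrupt context (fine on PREEMPT_RT): */
  raw_spin_lock(&jiffies_lock);
  write_seqcount_begin(&jiffies_seq);
  /* ... advance jiffies_64 and tick_next_period ... */
  write_seqcount_end(&jiffies_seq);
  raw_spin_unlock(&jiffies_lock);

  /* Reader: */
  unsigned int seq;
  do {
          seq = read_seqcount_begin(&jiffies_seq);
          /* ... snapshot ... */
  } while (read_seqcount_retry(&jiffies_seq, seq));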
|
# 52da479a | 19-Mar-2020 | Thomas Gleixner <[email protected]>
Revert "tick/common: Make tick_periodic() check for missing ticks"
This reverts commit d441dceb5dce71150f28add80d36d91bbfccba99 due to boot failures.
Reported-by: Qian Cai <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Waiman Long <[email protected]>
|
|
Revision tags: v5.6-rc6, v5.6-rc5, v5.6-rc4, v5.6-rc3, v5.6-rc2, v5.6-rc1 |
|
# d441dceb | 07-Feb-2020 | Waiman Long <[email protected]>
tick/common: Make tick_periodic() check for missing ticks
The tick_periodic() function is used at the beginning part of the bootup process for time keeping while the other clock sources are being initialized.
The current code assumes that all the timer interrupts are handled in a timely manner with no missing ticks. That is not actually true. Some ticks are missed and there are discrepancies between the tick time (jiffies) and the timestamp reported in the kernel log. Some systems, however, are more prone to missing ticks than others. In the extreme case, the discrepancy can actually cause a soft lockup message to be printed by the watchdog kthread. For example, on a Cavium ThunderX2 Sabre arm64 system:
[ 25.496379] watchdog: BUG: soft lockup - CPU#14 stuck for 22s!
On that system, the missing ticks are especially prevalent during the smp_init() phase of the boot process. With an instrumented kernel, it was found that it took about 24s as reported by the timestamp for the tick to accumulate 4s of time.
Investigation and bisection done by others seemed to point to the commit 73f381660959 ("arm64: Advertise mitigation of Spectre-v2, or lack thereof") as the culprit. It could also be a firmware issue as new firmware was promised that would fix the issue.
To properly address this problem, stop assuming that there will be no missing tick in tick_periodic(). Modify it to follow the example of tick_do_update_jiffies64() by using another reference clock to check for missing ticks. Since the watchdog timer uses running_clock(), it is used here as the reference. With this applied, the soft lockup problem in the affected arm64 system is gone and tick time tracks much more closely to the timestamp time.
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
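A sketch of the idea, using running_clock() as the reference; note the revert further up in this history, as the actual change caused boot failures and was backed out:

  static u64 last_ref;    /* running_clock() at the last accounted tick */

  static void tick_periodic(int cpu)
  {
          if (tick_do_timer_cpu == cpu) {
                  u64 delta = running_clock() - last_ref;
                  unsigned long ticks = 1;

                  /* Account every elapsed TICK_NSEC, not just one. */
                  if (delta >= 2 * TICK_NSEC)
                          ticks = div64_u64(delta, TICK_NSEC);
                  last_ref += (u64)ticks * TICK_NSEC;
                  do_timer(ticks);
          }
          /* ... */
  }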
|
|
Revision tags: v5.5, v5.5-rc7, v5.5-rc6 |
|
# 5167c506 | 10-Jan-2020 | Chunyan Zhang <[email protected]>
tick/common: Touch watchdog in tick_unfreeze() on all CPUs
Suspend to IDLE invokes tick_unfreeze() on resume. tick_unfreeze() on the first resuming CPU resumes timekeeping, which also has the side effect of resetting the softlockup watchdog on this CPU.
But on the secondary CPUs the watchdog is not reset in the resume / unfreeze() path, which can result in false softlockup warnings on those CPUs depending on the time spent in suspend.
Prevent this by touching the softlockup watchdog in the unfreeze path on the secondary resuming CPUs as well.
[ tglx: Massaged changelog ]
Signed-off-by: Chunyan Zhang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
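The shape of the fix, per the description above (a sketch):

  void tick_unfreeze(void)
  {
          raw_spin_lock(&tick_freeze_lock);

          if (tick_freeze_depth == num_online_cpus()) {
                  /* First CPU: resumes timekeeping, which already
                   * resets this CPU's watchdog as a side effect. */
                  timekeeping_resume();
          } else {
                  /* The fix: secondary CPUs touch the watchdog too. */
                  touch_softlockup_watchdog();
                  tick_resume_local();
          }

          tick_freeze_depth--;
          raw_spin_unlock(&tick_freeze_lock);
  }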
|
|
Revision tags: v5.5-rc5, v5.5-rc4, v5.5-rc3, v5.5-rc2, v5.5-rc1, v5.4, v5.4-rc8, v5.4-rc7, v5.4-rc6, v5.4-rc5, v5.4-rc4, v5.4-rc3, v5.4-rc2, v5.4-rc1, v5.3, v5.3-rc8, v5.3-rc7, v5.3-rc6, v5.3-rc5, v5.3-rc4, v5.3-rc3, v5.3-rc2, v5.3-rc1, v5.2, v5.2-rc7, v5.2-rc6, v5.2-rc5, v5.2-rc4, v5.2-rc3, v5.2-rc2, v5.2-rc1, v5.1, v5.1-rc7, v5.1-rc6, v5.1-rc5 |
|
# 08ae95f4 | 11-Apr-2019 | Nicholas Piggin <[email protected]>
nohz_full: Allow the boot CPU to be nohz_full
Allow the boot CPU/CPU0 to be nohz_full. Have the boot CPU take the do_timer duty during boot until a housekeeping CPU can take over.
This is supported when CONFIG_PM_SLEEP_SMP is not configured, or when it is configured and the arch allows suspend on non-zero CPUs.
nohz_full has been trialed at a large supercomputer site and found to significantly reduce jitter. In order to deploy it in production, they need CPU0 to be nohz_full because their job control system requires the application CPUs to start from 0, and the housekeeping CPUs are placed higher. An equivalent job scheduling that uses CPU0 for housekeeping could be achieved by modifying their system, but it is preferable if nohz_full can support their environment without modification.
Signed-off-by: Nicholas Piggin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Revision tags: v5.1-rc4, v5.1-rc3 |
|
# 3f2552f7 | 29-Mar-2019 | Chang-An Chen <[email protected]>
timers/sched_clock: Prevent generic sched_clock wrap caused by tick_freeze()
tick_freeze() introduced by suspend-to-idle in commit 124cf9117c5f ("PM / sleep: Make it possible to quiesce timers during suspend-to-idle") uses timekeeping_suspend() instead of syscore_suspend() during suspend-to-idle. As a consequence generic sched_clock will keep going because sched_clock_suspend() and sched_clock_resume() are not invoked during suspend-to-idle which can result in a generic sched_clock wrap.
On an ARM system with suspend-to-idle enabled, sched_clock is registered as "56 bits at 13MHz, resolution 76ns, wraps every 4398046511101ns", which means the real wrapping duration is 8796093022202ns.
[  134.551779] suspend-to-idle suspend (timekeeping_suspend())
[ 1204.912239] suspend-to-idle resume (timekeeping_resume())
......
[ 1206.912239] suspend-to-idle suspend (timekeeping_suspend())
[ 5880.502807] suspend-to-idle resume (timekeeping_resume())
......
[ 6000.403724] suspend-to-idle suspend (timekeeping_suspend())
[ 8035.753167] suspend-to-idle resume (timekeeping_resume())
......
[ 8795.786684] (2)[321:charger_thread]......
[ 8795.788387] (2)[321:charger_thread]......
[    0.057226] (0)[0:swapper/0]......
[    0.061447] (2)[0:swapper/2]......
sched_clock was not stopped during suspend-to-idle, and the sched_clock_poll hrtimer did not expire, because timekeeping_suspend() was invoked during suspend-to-idle. This made sched_clock wrap at kernel time 8796s.
To prevent this, invoke sched_clock_suspend() and sched_clock_resume() in tick_freeze() together with timekeeping_suspend() and timekeeping_resume().
Fixes: 124cf9117c5f ("PM / sleep: Make it possible to quiesce timers during suspend-to-idle")
Signed-off-by: Chang-An Chen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Matthias Brugger <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Corey Minyard <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Stanley Chu <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
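Schematically, the suspend side pairs the two calls (resume mirrors it):

  /* tick_freeze(), last CPU going down: */
  if (tick_freeze_depth == num_online_cpus()) {
          system_state = SYSTEM_SUSPEND;
          sched_clock_suspend();          /* added: park generic sched_clock */
          timekeeping_suspend();
  }

  /* tick_unfreeze(), first CPU coming back: */
  if (tick_freeze_depth == num_online_cpus()) {
          timekeeping_resume();
          sched_clock_resume();           /* added: restart it */
          system_state = SYSTEM_RUNNING;
  }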
|
|
Revision tags: v5.1-rc2 |
|
# e1e41b6c | 18-Mar-2019 | Rasmus Villemoes <[email protected]>
timekeeping: Consistently use unsigned int for seqcount snapshot
The timekeeping code uses a random mix of "unsigned long" and "unsigned int" for the seqcount snapshots (ratio 14:12). Since the seqlock.h API is entirely based on unsigned int, use that throughout.
Signed-off-by: Rasmus Villemoes <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Stephen Boyd <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
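The reader pattern the seqlock.h API expects, with the matching snapshot type (tk_core.seq as the example seqcount):

  unsigned int seq;       /* not unsigned long: the API is unsigned int based */

  do {
          seq = read_seqcount_begin(&tk_core.seq);
          /* ... copy out timekeeper fields ... */
  } while (read_seqcount_retry(&tk_core.seq, seq));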
|
|
Revision tags: v5.1-rc1, v5.0, v5.0-rc8, v5.0-rc7, v5.0-rc6, v5.0-rc5, v5.0-rc4, v5.0-rc3, v5.0-rc2, v5.0-rc1, v4.20, v4.20-rc7, v4.20-rc6, v4.20-rc5, v4.20-rc4, v4.20-rc3, v4.20-rc2, v4.20-rc1 |
|
# f49c174b | 31-Oct-2018 | Thomas Gleixner <[email protected]>
hrtimers/tick/clockevents: Remove sloppy license references
"For licencing details see kernel-base/COPYING" and similar license references have no value over the SPDX identifier. Remove them.
Signe
hrtimers/tick/clockevents: Remove sloppy license references
"For licencing details see kernel-base/COPYING" and similar license references have no value over the SPDX identifier. Remove them.
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Acked-by: John Stultz <[email protected]>
Acked-by: Corey Minyard <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Philippe Ombredanne <[email protected]>
Cc: Peter Anvin <[email protected]>
Cc: Russell King <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Nicolas Pitre <[email protected]>
Cc: David Riley <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Mark Brown <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|