History log of /linux-6.15/kernel/time/clocksource.c (Results 1 – 25 of 179)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1
# 8fa7292f 05-Apr-2025 Thomas Gleixner <[email protected]>

treewide: Switch/rename to timer_delete[_sync]()

timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with c

treewide: Switch/rename to timer_delete[_sync]()

timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.

Conversion was done with coccinelle plus manual fixups where necessary.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

show more ...


Revision tags: v6.14, v6.14-rc7
# fc661d0a 11-Mar-2025 Thorsten Blum <[email protected]>

clocksource: Remove unnecessary strscpy() size argument

The size argument of strscpy() is only required when the destination
pointer is not a fixed sized array or when the copy needs to be smaller
t

clocksource: Remove unnecessary strscpy() size argument

The size argument of strscpy() is only required when the destination
pointer is not a fixed sized array or when the copy needs to be smaller
than the size of the fixed sized destination array.

For fixed sized destination arrays and full copies, strscpy() automatically
determines the length of the destination buffer if the size argument is
omitted.

This makes the explicit sizeof() unnecessary. Remove it.

[ tglx: Massaged change log ]

Signed-off-by: Thorsten Blum <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


Revision tags: v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1
# 6bb05a33 31-Jan-2025 Waiman Long <[email protected]>

clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context

The following bug report happened with a PREEMPT_RT kernel:

BUG: sleeping function called from invalid conte

clocksource: Use migrate_disable() to avoid calling get_random_u32() in atomic context

The following bug report happened with a PREEMPT_RT kernel:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2012, name: kwatchdog
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
get_random_u32+0x4f/0x110
clocksource_verify_choose_cpus+0xab/0x1a0
clocksource_verify_percpu.part.0+0x6b/0x330
clocksource_watchdog_kthread+0x193/0x1a0

It is due to the fact that clocksource_verify_choose_cpus() is invoked with
preemption disabled. This function invokes get_random_u32() to obtain
random numbers for choosing CPUs. The batched_entropy_32 local lock and/or
the base_crng.lock spinlock in driver/char/random.c will be acquired during
the call. In PREEMPT_RT kernel, they are both sleeping locks and so cannot
be acquired in atomic context.

Fix this problem by using migrate_disable() to allow smp_processor_id() to
be reliably used without introducing atomic context. preempt_disable() is
then called after clocksource_verify_choose_cpus() but before the
clocksource measurement is being run to avoid introducing unexpected
latency.

Fixes: 7560c02bdffb ("clocksource: Check per-CPU clock synchronization when marked unstable")
Suggested-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 1f566840 25-Jan-2025 Waiman Long <[email protected]>

clocksource: Use pr_info() for "Checking clocksource synchronization" message

The "Checking clocksource synchronization" message is normally printed
when clocksource_verify_percpu() is called for a

clocksource: Use pr_info() for "Checking clocksource synchronization" message

The "Checking clocksource synchronization" message is normally printed
when clocksource_verify_percpu() is called for a given clocksource if
both the CLOCK_SOURCE_UNSTABLE and CLOCK_SOURCE_VERIFY_PERCPU flags
are set.

It is an informational message and so pr_info() is the correct choice.

Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Acked-by: John Stultz <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2
# 76031d95 03-Dec-2024 Thomas Gleixner <[email protected]>

clocksource: Make negative motion detection more robust

Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a
24-bit wide clocksource.

It turns out that the calculated maximal

clocksource: Make negative motion detection more robust

Guenter reported boot stalls on a emulated ARM 32-bit platform, which has a
24-bit wide clocksource.

It turns out that the calculated maximal idle time, which limits idle
sleeps to prevent clocksource wrap arounds, is close to the point where the
negative motion detection triggers.

max_idle_ns: 597268854 ns
negative motion tripping point: 671088640 ns

If the idle wakeup is delayed beyond that point, the clocksource
advances far enough to trigger the negative motion detection. This
prevents the clock to advance and in the worst case the system stalls
completely if the consecutive sleeps based on the stale clock are
delayed as well.

Cure this by calculating a more robust cut-off value for negative motion,
which covers 87.5% of the actual clocksource counter width. Compare the
delta against this value to catch negative motion. This is specifically for
clock sources with a small counter width as their wrap around time is close
to the half counter width. For clock sources with wide counters this is not
a problem because the maximum idle time is far from the half counter width
due to the math overflow protection constraints.

For the case at hand this results in a tripping point of 1174405120ns.

Note, that this cannot prevent issues when the delay exceeds the 87.5%
margin, but that's not different from the previous unchecked version which
allowed arbitrary time jumps.

Systems with small counter width are prone to invalid results, but this
problem is unlikely to be seen on real hardware. If such a system
completely stalls for more than half a second, then there are other more
urgent problems than the counter wrapping around.

Fixes: c163e40af9b2 ("timekeeping: Always check for negative motion")
Reported-by: Guenter Roeck <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/all/8734j5ul4x.ffs@tglx
Closes: https://lore.kernel.org/all/[email protected]

show more ...


Revision tags: v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3
# bafffd56 10-Oct-2024 Dr. David Alan Gilbert <[email protected]>

clocksource: Remove unused clocksource_change_rating

clocksource_change_rating() has been unused since 2017's commit
63ed4e0c67df ("Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource c

clocksource: Remove unused clocksource_change_rating

clocksource_change_rating() has been unused since 2017's commit
63ed4e0c67df ("Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code")

Remove it.

__clocksource_change_rating now only has one use which is ifdef'd.
Move it into the ifdef'd section.

Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


Revision tags: v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2
# 4ac1dd32 02-Aug-2024 Paul E. McKenney <[email protected]>

clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin

Right now, cs_watchdog_read() does clocksource sanity checks based
on WATCHDOG_MAX_SKEW, which sets a floor on any clocksource

clocksource: Set cs_watchdog_read() checks based on .uncertainty_margin

Right now, cs_watchdog_read() does clocksource sanity checks based
on WATCHDOG_MAX_SKEW, which sets a floor on any clocksource's
.uncertainty_margin. These sanity checks can therefore act
inappropriately for clocksources with large uncertainty margins.

One reason for a clocksource to have a large .uncertainty_margin is when
that clocksource has long read-out latency, given that it does not make
sense for the .uncertainty_margin to be smaller than the read-out latency.
With the current checks, cs_watchdog_read() could reject all normal
reads from a clocksource with long read-out latencies, such as those
from legacy clocksources that are no longer implemented in hardware.

Therefore, recast the cs_watchdog_read() checks in terms of the
.uncertainty_margin values of the clocksources involved in the timespan in
question. The first covers two watchdog reads and one cs read, so use
twice the watchdog .uncertainty_margin plus that of the cs. The second
covers only a pair of watchdog reads, so use twice the watchdog
.uncertainty_margin.

Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# f33a5d4b 02-Aug-2024 Paul E. McKenney <[email protected]>

clocksource: Fix comments on WATCHDOG_THRESHOLD & WATCHDOG_MAX_SKEW

The WATCHDOG_THRESHOLD macro is no longer used to supply a default value
for ->uncertainty_margin, but WATCHDOG_MAX_SKEW now is.

clocksource: Fix comments on WATCHDOG_THRESHOLD & WATCHDOG_MAX_SKEW

The WATCHDOG_THRESHOLD macro is no longer used to supply a default value
for ->uncertainty_margin, but WATCHDOG_MAX_SKEW now is.

Therefore, update the comments to reflect this change.

Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Borislav Petkov (AMD) <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# 17915131 02-Aug-2024 Borislav Petkov <[email protected]>

clocksource: Improve comments for watchdog skew bounds

Add more detail on the rationale for bounding the clocksource
->uncertainty_margin below at about 500ppm.

Signed-off-by: Borislav Petkov <bp@a

clocksource: Improve comments for watchdog skew bounds

Add more detail on the rationale for bounding the clocksource
->uncertainty_margin below at about 500ppm.

Signed-off-by: Borislav Petkov <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/[email protected]

show more ...


# f2655ac2 02-Aug-2024 Paul E. McKenney <[email protected]>

clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()

The current "nretries > 1 || nretries >= max_retries" check in
cs_watchdog_read() will always evaluate to true, and thus pr_warn(), if

clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()

The current "nretries > 1 || nretries >= max_retries" check in
cs_watchdog_read() will always evaluate to true, and thus pr_warn(), if
nretries is greater than 1. The intent is instead to never warn on the
first try, but otherwise warn if the successful retry was the last retry.

Therefore, change that "||" to "&&".

Fixes: db3a34e17433 ("clocksource: Retry clock read if long delays detected")
Reported-by: Borislav Petkov <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/all/[email protected]

show more ...


Revision tags: v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1
# 8f0acb7f 14-Mar-2024 Li Zhijian <[email protected]>

clocksource: Convert s[n]printf() to sysfs_emit()

Per filesystems/sysfs.rst, show() should only use sysfs_emit() or
sysfs_emit_at() when formatting the value to be returned to user space.

coccinell

clocksource: Convert s[n]printf() to sysfs_emit()

Per filesystems/sysfs.rst, show() should only use sysfs_emit() or
sysfs_emit_at() when formatting the value to be returned to user space.

coccinelle complains that there are still a couple of functions that use
snprintf(). Convert them to sysfs_emit().

Signed-off-by: Li Zhijian <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


# d0304569 25-Mar-2024 Adrian Hunter <[email protected]>

clocksource: Make watchdog and suspend-timing multiplication overflow safe

Kernel timekeeping is designed to keep the change in cycles (since the last
timer interrupt) below max_cycles, which preven

clocksource: Make watchdog and suspend-timing multiplication overflow safe

Kernel timekeeping is designed to keep the change in cycles (since the last
timer interrupt) below max_cycles, which prevents multiplication overflow
when converting cycles to nanoseconds. However, if timer interrupts stop,
the clocksource_cyc2ns() calculation will eventually overflow.

Add protection against that. Simplify by folding together
clocksource_delta() and clocksource_cyc2ns() into cycles_to_nsec_safe().
Check against max_cycles, falling back to a slower higher precision
calculation.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v6.8, v6.8-rc7, v6.8-rc6
# 2ed08e4b 21-Feb-2024 Feng Tang <[email protected]>

clocksource: Scale the watchdog read retries automatically

On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled
during boot time on about one out of 120 boot attempts:

cloc

clocksource: Scale the watchdog read retries automatically

On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled
during boot time on about one out of 120 boot attempts:

clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns,
wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable
tsc: Marking TSC unstable due to clocksource watchdog
TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152)
clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896.
clocksource: Switched to clocksource hpet

The reason is that for platform with a large number of CPUs, there are
sporadic big or huge read latencies while reading the watchog/clocksource
during boot or when system is under stress work load, and the frequency and
maximum value of the latency goes up with the number of online CPUs.

The cCurrent code already has logic to detect and filter such high latency
case by reading the watchdog twice and checking the two deltas. Due to the
randomness of the latency, there is a low probabilty that the first delta
(latency) is big, but the second delta is small and looks valid. The
watchdog code retries the readouts by default twice, which is not
necessarily sufficient for systems with a large number of CPUs.

There is a command line parameter 'max_cswd_read_retries' which allows to
increase the number of retries, but that's not user friendly as it needs to
be tweaked per system. As the number of required retries is proportional to
the number of online CPUs, this parameter can be calculated at runtime.

Scale and enlarge the number of retries according to the number of online
CPUs and remove the command line parameter completely.

[ tglx: Massaged change log and comments ]

Signed-off-by: Feng Tang <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Jin Wang <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Reviewed-by: Waiman Long <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v6.8-rc5, v6.8-rc4
# 2bc7fc24 04-Feb-2024 Ricardo B. Marliere <[email protected]>

clocksource: Make clocksource_subsys const

Now that the driver core can properly handle constant struct bus_type,
move the clocksource_subsys variable to be a constant structure as well,
placing it

clocksource: Make clocksource_subsys const

Now that the driver core can properly handle constant struct bus_type,
move the clocksource_subsys variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Suggested-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Ricardo B. Marliere <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Acked-by: John Stultz <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v6.8-rc3, v6.8-rc2
# 64464955 22-Jan-2024 Jiri Wiesner <[email protected]>

clocksource: Skip watchdog check for large watchdog intervals

There have been reports of the watchdog marking clocksources unstable on
machines with 8 NUMA nodes:

clocksource: timekeeping watchdo

clocksource: Skip watchdog check for large watchdog intervals

There have been reports of the watchdog marking clocksources unstable on
machines with 8 NUMA nodes:

clocksource: timekeeping watchdog on CPU373:
Marking clocksource 'tsc' as unstable because the skew is too large:
clocksource: 'hpet' wd_nsec: 14523447520
clocksource: 'tsc' cs_nsec: 14524115132

The measured clocksource skew - the absolute difference between cs_nsec
and wd_nsec - was 668 microseconds:

cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612

The kernel used 200 microseconds for the uncertainty_margin of both the
clocksource and watchdog, resulting in a threshold of 400 microseconds (the
md variable). Both the cs_nsec and the wd_nsec value indicate that the
readout interval was circa 14.5 seconds. The observed behaviour is that
watchdog checks failed for large readout intervals on 8 NUMA node
machines. This indicates that the size of the skew was directly proportinal
to the length of the readout interval on those machines. The measured
clocksource skew, 668 microseconds, was evaluated against a threshold (the
md variable) that is suited for readout intervals of roughly
WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second.

The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew
threshold") was to tighten the threshold for evaluating skew and set the
lower bound for the uncertainty_margin of clocksources to twice
WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen clocksource
watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to
125 microseconds to fit the limit of NTP, which is able to use a
clocksource that suffers from up to 500 microseconds of skew per second.
Both the TSC and the HPET use default uncertainty_margin. When the
readout interval gets stretched the default uncertainty_margin is no
longer a suitable lower bound for evaluating skew - it imposes a limit
that is far stricter than the skew with which NTP can deal.

The root causes of the skew being directly proportinal to the length of
the readout interval are:

* the inaccuracy of the shift/mult pairs of clocksources and the watchdog
* the conversion to nanoseconds is imprecise for large readout intervals

Prevent this by skipping the current watchdog check if the readout
interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout
interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin
(of the TSC and HPET) corresponds to a limit on clocksource skew of 250
ppm (microseconds of skew per second). To keep the limit imposed by NTP
(500 microseconds of skew per second) for all possible readout intervals,
the margins would have to be scaled so that the threshold value is
proportional to the length of the actual readout interval.

As for why the readout interval may get stretched: Since the watchdog is
executed in softirq context the expiration of the watchdog timer can get
severely delayed on account of a ksoftirqd thread not getting to run in a
timely manner. Surely, a system with such belated softirq execution is not
working well and the scheduling issue should be looked into but the
clocksource watchdog should be able to deal with it accordingly.

Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold")
Suggested-by: Feng Tang <[email protected]>
Signed-off-by: Jiri Wiesner <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Reviewed-by: Feng Tang <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/20240122172350.GA740@incl

show more ...


Revision tags: v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6
# e40806e9 07-Jun-2023 Paul E. McKenney <[email protected]>

clocksource: Handle negative skews in "skew is too large" messages

The nanosecond-to-millisecond skew computation uses unsigned arithmetic,
which produces user-unfriendly large positive numbers for

clocksource: Handle negative skews in "skew is too large" messages

The nanosecond-to-millisecond skew computation uses unsigned arithmetic,
which produces user-unfriendly large positive numbers for negative skews.
Therefore, use signed arithmetic for this computation in order to preserve
the negativity.

Reported-by: Chris Bainbridge <[email protected]>
Reported-by: Feng Tang <[email protected]>
Fixes: dd029269947a ("clocksource: Improve "skew is too large" messages")
Reviewed-by: Feng Tang <[email protected]>
Tested-by: Chris Bainbridge <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

show more ...


Revision tags: v6.4-rc5
# 76edc27e 30-May-2023 Azeem Shaikh <[email protected]>

clocksource: Replace all non-returning strlcpy with strscpy

strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to

clocksource: Replace all non-returning strlcpy with strscpy

strlcpy() reads the entire source buffer first.
This read may exceed the destination size limit.
This is both inefficient and can lead to linear read
overflows if a source string is not NUL-terminated [1].
In an effort to remove strlcpy() completely [2], replace
strlcpy() here with strscpy().
No return values were used, so direct replacement is safe.

[1] https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
[2] https://github.com/KSPP/linux/issues/89

Signed-off-by: Azeem Shaikh <[email protected]>
Acked-by: John Stultz <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1
# b7082cdf 20-Dec-2022 Feng Tang <[email protected]>

clocksource: Suspend the watchdog temporarily when high read latency detected

Bugs have been reported on 8 sockets x86 machines in which the TSC was
wrongly disabled when the system is under heavy w

clocksource: Suspend the watchdog temporarily when high read latency detected

Bugs have been reported on 8 sockets x86 machines in which the TSC was
wrongly disabled when the system is under heavy workload.

[ 818.380354] clocksource: timekeeping watchdog on CPU336: hpet wd-wd read-back delay of 1203520ns
[ 818.436160] clocksource: wd-tsc-wd read-back delay of 181880ns, clock-skew test skipped!
[ 819.402962] clocksource: timekeeping watchdog on CPU338: hpet wd-wd read-back delay of 324000ns
[ 819.448036] clocksource: wd-tsc-wd read-back delay of 337240ns, clock-skew test skipped!
[ 819.880863] clocksource: timekeeping watchdog on CPU339: hpet read-back delay of 150280ns, attempt 3, marking unstable
[ 819.936243] tsc: Marking TSC unstable due to clocksource watchdog
[ 820.068173] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[ 820.092382] sched_clock: Marking unstable (818769414384, 1195404998)
[ 820.643627] clocksource: Checking clocksource tsc synchronization from CPU 267 to CPUs 0,4,25,70,126,430,557,564.
[ 821.067990] clocksource: Switched to clocksource hpet

This can be reproduced by running memory intensive 'stream' tests,
or some of the stress-ng subcases such as 'ioport'.

The reason for these issues is the when system is under heavy load, the
read latency of the clocksources can be very high. Even lightweight TSC
reads can show high latencies, and latencies are much worse for external
clocksources such as HPET or the APIC PM timer. These latencies can
result in false-positive clocksource-unstable determinations.

These issues were initially reported by a customer running on a production
system, and this problem was reproduced on several generations of Xeon
servers, especially when running the stress-ng test. These Xeon servers
were not production systems, but they did have the latest steppings
and firmware.

Given that the clocksource watchdog is a continual diagnostic check with
frequency of twice a second, there is no need to rush it when the system
is under heavy load. Therefore, when high clocksource read latencies
are detected, suspend the watchdog timer for 5 minutes.

Signed-off-by: Feng Tang <[email protected]>
Acked-by: Waiman Long <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Feng Tang <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

show more ...


# dd029269 14-Dec-2022 Paul E. McKenney <[email protected]>

clocksource: Improve "skew is too large" messages

When clocksource_watchdog() detects excessive clocksource skew compared
to the watchdog clocksource, it marks the clocksource under test as
unstable

clocksource: Improve "skew is too large" messages

When clocksource_watchdog() detects excessive clocksource skew compared
to the watchdog clocksource, it marks the clocksource under test as
unstable and prints several lines worth of message. But that message
is unclear to anyone unfamiliar with the code:

clocksource: timekeeping watchdog on CPU2: Marking clocksource 'wdtest-ktime' as unstable because the skew is too large:
clocksource: 'kvm-clock' wd_nsec: 400744390 wd_now: 612625c2c wd_last: 5fa7f7c66 mask: ffffffffffffffff
clocksource: 'wdtest-ktime' cs_nsec: 600744034 cs_now: 173081397a292d4f cs_last: 17308139565a8ced mask: ffffffffffffffff
clocksource: 'kvm-clock' (not 'wdtest-ktime') is current clocksource.

Therefore, add the following line near the end of that message:

Clocksource 'wdtest-ktime' skewed 199999644 ns (199 ms) over watchdog 'kvm-clock' interval of 400744390 ns (400 ms)

This new line clearly indicates the amount of skew between the two
clocksources, along with the duration of the time interval over which
the skew occurred, both in nanoseconds and milliseconds.

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Feng Tang <[email protected]>

show more ...


# f092eb34 13-Dec-2022 Paul E. McKenney <[email protected]>

clocksource: Improve read-back-delay message

When cs_watchdog_read() is unable to get a qualifying clocksource read
within the limit set by max_cswd_read_retries, it prints a message
and marks the c

clocksource: Improve read-back-delay message

When cs_watchdog_read() is unable to get a qualifying clocksource read
within the limit set by max_cswd_read_retries, it prints a message
and marks the clocksource under test as unstable. But that message is
unclear to anyone unfamiliar with the code:

clocksource: timekeeping watchdog on CPU13: wd-tsc-wd read-back delay 1000614ns, attempt 3, marking unstable

Therefore, add some context so that the message appears as follows:

clocksource: timekeeping watchdog on CPU13: wd-tsc-wd excessive read-back delay of 1000614ns vs. limit of 125000ns, wd-wd read-back delay only 27ns, attempt 3, marking tsc unstable

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: John Stultz <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Feng Tang <[email protected]>

show more ...


Revision tags: v6.1
# c37e85c1 07-Dec-2022 Paul E. McKenney <[email protected]>

clocksource: Loosen clocksource watchdog constraints

Currently, MAX_SKEW_USEC is set to 100 microseconds, which has worked
reasonably well. However, NTP is willing to tolerate 500 microseconds
of s

clocksource: Loosen clocksource watchdog constraints

Currently, MAX_SKEW_USEC is set to 100 microseconds, which has worked
reasonably well. However, NTP is willing to tolerate 500 microseconds
of skew per second, and a clocksource that is good enough for NTP should
be good enough for the clocksource watchdog. The watchdog's skew is
controlled by MAX_SKEW_USEC and the CLOCKSOURCE_WATCHDOG_MAX_SKEW_US
Kconfig option. However, these values are doubled before being associated
with a clocksource's ->uncertainty_margin, and the ->uncertainty_margin
values of the pair of clocksource's being compared are summed before
checking against the skew.

Therefore, set both MAX_SKEW_USEC and the default for the
CLOCKSOURCE_WATCHDOG_MAX_SKEW_US Kconfig option to 125 microseconds of
skew per second, resulting in 500 microseconds of skew per second in
the clocksource watchdog's skew comparison.

Suggested-by Rik van Riel <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

show more ...


Revision tags: v6.1-rc8, v6.1-rc7, v6.1-rc6
# beaa1ffe 16-Nov-2022 Yunying Sun <[email protected]>

clocksource: Print clocksource name when clocksource is tested unstable

Some "TSC fall back to HPET" messages appear on systems having more than
2 NUMA nodes:

clocksource: timekeeping watchdog on C

clocksource: Print clocksource name when clocksource is tested unstable

Some "TSC fall back to HPET" messages appear on systems having more than
2 NUMA nodes:

clocksource: timekeeping watchdog on CPU168: hpet read-back delay of 4296200ns, attempt 4, marking unstable

The "hpet" here is misleading the clocksource watchdog is really
doing repeated reads of "hpet" in order to check for unrelated delays.
Therefore, print the name of the clocksource under test, prefixed by
"wd-" and suffixed by "-wd", for example, "wd-tsc-wd".

Signed-off-by: Yunying Sun <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

show more ...


Revision tags: v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1
# 8032bf12 10-Oct-2022 Jason A. Donenfeld <[email protected]>

treewide: use get_random_u32_below() instead of deprecated function

This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)

Reviewed-

treewide: use get_random_u32_below() instead of deprecated function

This is a simple mechanical transformation done by:

@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)

Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Darrick J. Wong <[email protected]> # for xfs
Reviewed-by: SeongJae Park <[email protected]> # for damon
Reviewed-by: Jason Gunthorpe <[email protected]> # for infiniband
Reviewed-by: Russell King (Oracle) <[email protected]> # for arm
Acked-by: Ulf Hansson <[email protected]> # for mmc
Signed-off-by: Jason A. Donenfeld <[email protected]>

show more ...


# 81895a65 05-Oct-2022 Jason A. Donenfeld <[email protected]>

treewide: use prandom_u32_max() when possible, part 1

Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
t

treewide: use prandom_u32_max() when possible, part 1

Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u64;
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((u64)(E) * get_random_u32() >> 32)
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

- RAND = get_random_u32();
... when != RAND
- RAND %= (E);
+ RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % (value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

- (FUNC()@p & (LITERAL))
+ prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

{
- T VAR;
- VAR = (E);
- return VAR;
+ return E;
}

@drop_var@
type T;
identifier VAR;
@@

{
- T VAR;
... when != VAR
}

Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Yury Norov <[email protected]>
Reviewed-by: KP Singh <[email protected]>
Reviewed-by: Jan Kara <[email protected]> # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder <[email protected]> # for drbd
Acked-by: Jakub Kicinski <[email protected]>
Acked-by: Heiko Carstens <[email protected]> # for s390
Acked-by: Ulf Hansson <[email protected]> # for mmc
Acked-by: Darrick J. Wong <[email protected]> # for xfs
Signed-off-by: Jason A. Donenfeld <[email protected]>

show more ...


Revision tags: v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7, v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2
# 95e3a973 23-Jan-2022 Yury Norov <[email protected]>

clocksource: replace cpumask_weight with cpumask_empty in clocksource.c

clocksource_verify_percpu() calls cpumask_weight() to check if any bit of
a given cpumask is set. We can do it more efficientl

clocksource: replace cpumask_weight with cpumask_empty in clocksource.c

clocksource_verify_percpu() calls cpumask_weight() to check if any bit of
a given cpumask is set. We can do it more efficiently with cpumask_empty()
because cpumask_empty() stops traversing the cpumask as soon as it finds
first set bit, while cpumask_weight() counts all bits unconditionally.

Signed-off-by: Yury Norov <[email protected]>

show more ...


12345678