History log of /linux-6.15/include/linux/nmi.h (Results 1 – 25 of 86)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5
# d6834d9c 21-Oct-2024 Li Huafei <[email protected]>

watchdog/hardlockup/perf: Fix perf_event memory leak

During stress-testing, we found a kmemleak report for perf_event:

unreferenced object 0xff110001410a33e0 (size 1328):
comm "kworker/4:11",

watchdog/hardlockup/perf: Fix perf_event memory leak

During stress-testing, we found a kmemleak report for perf_event:

unreferenced object 0xff110001410a33e0 (size 1328):
comm "kworker/4:11", pid 288, jiffies 4294916004
hex dump (first 32 bytes):
b8 be c2 3b 02 00 11 ff 22 01 00 00 00 00 ad de ...;....".......
f0 33 0a 41 01 00 11 ff f0 33 0a 41 01 00 11 ff .3.A.....3.A....
backtrace (crc 24eb7b3a):
[<00000000e211b653>] kmem_cache_alloc_node_noprof+0x269/0x2e0
[<000000009d0985fa>] perf_event_alloc+0x5f/0xcf0
[<00000000084ad4a2>] perf_event_create_kernel_counter+0x38/0x1b0
[<00000000fde96401>] hardlockup_detector_event_create+0x50/0xe0
[<0000000051183158>] watchdog_hardlockup_enable+0x17/0x70
[<00000000ac89727f>] softlockup_start_fn+0x15/0x40
...

Our stress test includes CPU online and offline cycles, and updating the
watchdog configuration.

After reading the code, I found that there may be a race between cleaning up
perf_event after updating watchdog and disabling event when the CPU goes offline:

CPU0 CPU1 CPU2
(update watchdog) (hotplug offline CPU1)

... _cpu_down(CPU1)
cpus_read_lock() // waiting for cpu lock
softlockup_start_all
smp_call_on_cpu(CPU1)
softlockup_start_fn
...
watchdog_hardlockup_enable(CPU1)
perf create E1
watchdog_ev[CPU1] = E1
cpus_read_unlock()
cpus_write_lock()
cpuhp_kick_ap_work(CPU1)
cpuhp_thread_fun
...
watchdog_hardlockup_disable(CPU1)
watchdog_ev[CPU1] = NULL
dead_event[CPU1] = E1
__lockup_detector_cleanup
for each dead_events_mask
release each dead_event
/*
* CPU1 has not been added to
* dead_events_mask, then E1
* will not be released
*/
CPU1 -> dead_events_mask
cpumask_clear(&dead_events_mask)
// dead_events_mask is cleared, E1 is leaked

In this case, the leaked perf_event E1 matches the perf_event leak
reported by kmemleak. Due to the low probability of problem recurrence
(only reported once), I added some hack delays in the code:

static void __lockup_detector_reconfigure(void)
{
...
watchdog_hardlockup_start();
cpus_read_unlock();
+ mdelay(100);
/*
* Must be called outside the cpus locked section to prevent
* recursive locking in the perf code.
...
}

void watchdog_hardlockup_disable(unsigned int cpu)
{
...
perf_event_disable(event);
this_cpu_write(watchdog_ev, NULL);
this_cpu_write(dead_event, event);
+ mdelay(100);
cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
atomic_dec(&watchdog_cpus);
...
}

void hardlockup_detector_perf_cleanup(void)
{
...
perf_event_release_kernel(event);
per_cpu(dead_event, cpu) = NULL;
}
+ mdelay(100);
cpumask_clear(&dead_events_mask);
}

Then, simultaneously performing CPU on/off and switching watchdog, it is
almost certain to reproduce this leak.

The problem here is that releasing perf_event is not within the CPU
hotplug read-write lock. Commit:

941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")

introduced deferred release to solve the deadlock caused by calling
get_online_cpus() when releasing perf_event. Later, commit:

efe951d3de91 ("perf/x86: Fix perf,x86,cpuhp deadlock")

removed the get_online_cpus() call on the perf_event release path to solve
another deadlock problem.

Therefore, it is now possible to move the release of perf_event back
into the CPU hotplug read-write lock, and release the event immediately
after disabling it.

Fixes: 941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")
Signed-off-by: Li Huafei <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1, v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7
# 393fb313 30-Apr-2024 Song Liu <[email protected]>

watchdog: allow nmi watchdog to use raw perf event

NMI watchdog permanently consumes one hardware counters per CPU on the
system. For systems that use many hardware counters, this causes more
aggre

watchdog: allow nmi watchdog to use raw perf event

NMI watchdog permanently consumes one hardware counters per CPU on the
system. For systems that use many hardware counters, this causes more
aggressive time multiplexing of perf events.

OTOH, some CPUs (mostly Intel) support "ref-cycles" event, which is rarely
used. Add kernel cmdline arg nmi_watchdog=rNNN to configure the watchdog
to use raw event. For example, on Intel CPUs, we can use "r300" to
configure the watchdog to use ref-cycles event.

If the raw event does not work, fall back to use "cycles".

[[email protected]: fix kerneldoc]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8
# 75060b6e 06-Mar-2024 Thomas Weißschuh <[email protected]>

watchdog/core: remove sysctl handlers from public header

The functions are only used in the file where they are defined. Remove
them from the header and make them static.

Also guard proc_soft_watc

watchdog/core: remove sysctl handlers from public header

The functions are only used in the file where they are defined. Remove
them from the header and make them static.

Also guard proc_soft_watchdog with a #define-guard as it is not used
otherwise.

Link: https://lkml.kernel.org/r/20240306-const-sysctl-prep-watchdog-v1-1-bd45da3a41cf@weissschuh.net
Signed-off-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5
# 8d539b84 04-Aug-2023 Douglas Anderson <[email protected]>

nmi_backtrace: allow excluding an arbitrary CPU

The APIs that allow backtracing across CPUs have always had a way to
exclude the current CPU. This convenience means callers didn't need to
find a pl

nmi_backtrace: allow excluding an arbitrary CPU

The APIs that allow backtracing across CPUs have always had a way to
exclude the current CPU. This convenience means callers didn't need to
find a place to allocate a CPU mask just to handle the common case.

Let's extend the API to take a CPU ID to exclude instead of just a
boolean. This isn't any more complex for the API to handle and allows the
hardlockup detector to exclude a different CPU (the one it already did a
trace for) without needing to find space for a CPU mask.

Arguably, this new API also encourages safer behavior. Specifically if
the caller wants to avoid tracing the current CPU (maybe because they
already traced the current CPU) this makes it more obvious to the caller
that they need to make sure that the current CPU ID can't change.

[[email protected]: fix trigger_allbutcpu_cpu_backtrace() stub]
Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7
# 7ca8fe94 16-Jun-2023 Petr Mladek <[email protected]>

watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH

The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_ARCH without this prefix.
It will be set

watchdog/hardlockup: define HARDLOCKUP_DETECTOR_ARCH

The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_ARCH without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.

The change allows to clean up dependencies of PPC_WATCHDOG
and HAVE_HARDLOCKUP_DETECTOR_PERF definitions for powerpc.

As a result HAVE_HARDLOCKUP_DETECTOR_PERF has the same dependencies
on arm, x86, powerpc architectures.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 47f4cb43 16-Jun-2023 Petr Mladek <[email protected]>

watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64

The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix.
It will be

watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64

The HAVE_ prefix means that the code could be enabled. Add another
variable for HAVE_HARDLOCKUP_DETECTOR_SPARC64 without this prefix.
It will be set when it should be built. It will make it compatible
with the other hardlockup detectors.

Before, it is far from obvious that the SPARC64 variant is actually used:

$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y

After, it is more clear:

$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# a5fcc236 16-Jun-2023 Petr Mladek <[email protected]>

watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific

There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.

C

watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific

There are several hardlockup detector implementations and several Kconfig
values which allow selection and build of the preferred one.

CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb
("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36.
It was a preparation step for introducing the new generic perf hardlockup
detector.

The existing arch-specific variants did not support the to-be-created
generic build configurations, sysctl interface, etc. This distinction
was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog:
Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR")
in v2.6.38.

CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105
("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced
the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used
by three architectures, namely blackfin, mn10300, and sparc.

The support for blackfin and mn10300 architectures has been completely
dropped some time ago. And sparc is the only architecture with the historic
NMI watchdog at the moment.

And the old sparc implementation is really special. It is always built on
sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b
("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added
in v4.10-rc1.

There are only few locations where the sparc64 NMI watchdog interacts
with the generic hardlockup detectors code:

+ implements arch_touch_nmi_watchdog() which is called from the generic
touch_nmi_watchdog()

+ implements watchdog_hardlockup_enable()/disable() to support
/proc/sys/kernel/nmi_watchdog

+ is always preferred over other generic watchdogs, see
CONFIG_HARDLOCKUP_DETECTOR

+ includes asm/nmi.h into linux/nmi.h because some sparc-specific
functions are needed in sparc-specific code which includes
only linux/nmi.h.

The situation became more complicated after the commit 05a4a95279311c3
("kernel/watchdog: split up config options") and commit 2104180a53698df5
("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1.
They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc
specific hardlockup detector. It was compatible with the perf one
regarding the general boot, sysctl, and programming interfaces.

HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of
HAVE_NMI_WATCHDOG. It made some sense because all arch-specific
detectors had some common requirements, namely:

+ implemented arch_touch_nmi_watchdog()
+ included asm/nmi.h into linux/nmi.h
+ defined the default value for /proc/sys/kernel/nmi_watchdog

But it actually has made things pretty complicated when the generic
buddy hardlockup detector was added. Before the generic perf detector
was newer supported together with an arch-specific one. But the buddy
detector could work on any SMP system. It means that an architecture
could support both the arch-specific and buddy detector.

As a result, there are few tricky dependencies. For example,
CONFIG_HARDLOCKUP_DETECTOR depends on:

((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH

The problem is that the very special sparc implementation is defined as:

HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH

Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear
without reading understanding the history.

Make the logic less tricky and more self-explanatory by making
HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to
HAVE_HARDLOCKUP_DETECTOR_SPARC64.

Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF,
and HARDLOCKUP_DETECTOR_BUDDY may conflict only with
HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR
and it is not longer enabled when HAVE_NMI_WATCHDOG is set.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 0c68bda6 16-Jun-2023 Petr Mladek <[email protected]>

watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h

arch_touch_nmi_watchdog() needs a different implementation for various
hardlockup detector implementations. And it does not

watchdog/hardlockup: declare arch_touch_nmi_watchdog() only in linux/nmi.h

arch_touch_nmi_watchdog() needs a different implementation for various
hardlockup detector implementations. And it does nothing when
any hardlockup detector is not built at all.

arch_touch_nmi_watchdog() is declared via linux/nmi.h. And it must be
defined as an empty function when there is no hardlockup detector.
It is done directly in this header file for the perf and buddy detectors.
And it is done in the included asm/linux.h for arch specific detectors.

The reason probably is that the arch specific variants build the code
using another conditions. For example, powerpc64/sparc64 builds the code
when CONFIG_PPC_WATCHDOG is enabled.

Another reason might be that these architectures define more functions
in asm/nmi.h anyway.

However the generic code actually knows when the function will be
implemented. It happens when some full featured or the sparc64-specific
hardlockup detector is built.

In particular, CONFIG_HARDLOCKUP_DETECTOR can be enabled only when
a generic or arch-specific full featured hardlockup detector is available.
The only exception is sparc64 which can be built even when the global
HARDLOCKUP_DETECTOR switch is disabled.

The information about sparc64 is a bit complicated. The hardlockup
detector is built there when CONFIG_HAVE_NMI_WATCHDOG is set and
CONFIG_HAVE_HARDLOCKUP_DETECTOR_ARCH is not set.

People might wonder whether this change really makes things easier.
The motivation is:

+ The current logic in linux/nmi.h is far from obvious.
For example, arch_touch_nmi_watchdog() is defined as {} when
neither CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER nor
CONFIG_HAVE_NMI_WATCHDOG is defined.

+ The change synchronizes the checks in lib/Kconfig.debug and
in the generic code.

+ It is a step that will help cleaning HAVE_NMI_WATCHDOG related
checks.

The change should not change the existing behavior.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.4-rc6, v6.4-rc5, v6.4-rc4
# d3b62ace 27-May-2023 Douglas Anderson <[email protected]>

watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called

In the patch ("watchdog/hardlockup: detect hard lockups using secondary
(buddy) CPUs"), we added a call from the common watchd

watchdog/buddy: cleanup how watchdog_buddy_check_hardlockup() is called

In the patch ("watchdog/hardlockup: detect hard lockups using secondary
(buddy) CPUs"), we added a call from the common watchdog.c file into the
buddy. That call could be done more cleanly. Specifically:

1. If we move the call into watchdog_hardlockup_kick() then it keeps
watchdog_timer_fn() simpler.
2. We don't need to pass an "unsigned long" to the buddy for the timer
count. In the patch ("watchdog/hardlockup: add a "cpu" param to
watchdog_hardlockup_check()") the count was changed to "atomic_t"
which is backed by an int, so we should match types.

Link: https://lkml.kernel.org/r/20230526184139.6.I006c7d958a1ea5c4e1e4dc44a25596d9bb5fd3ba@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 05e7b558 27-May-2023 Douglas Anderson <[email protected]>

watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()

In the patch ("watchdog/hardlockup: add comments to touch_nmi_watchdog()")
we adjusted some comments for touch_nmi_watchdog().

watchdog/hardlockup: remove softlockup comment in touch_nmi_watchdog()

In the patch ("watchdog/hardlockup: add comments to touch_nmi_watchdog()")
we adjusted some comments for touch_nmi_watchdog(). The comment about the
softlockup had a typo and were also felt to be too obvious. Remove it.

Link: https://lkml.kernel.org/r/20230526184139.5.Ia593afc9eb12082d55ea6681dc2c5a89677f20a8@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 9ec272c5 27-May-2023 Douglas Anderson <[email protected]>

watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails

Patch series "watchdog: Cleanup / fixes after buddy series v5 reviews".

This patch series attempts to finish resolving th

watchdog/hardlockup: keep kernel.nmi_watchdog sysctl as 0444 if probe fails

Patch series "watchdog: Cleanup / fixes after buddy series v5 reviews".

This patch series attempts to finish resolving the feedback received
from Petr Mladek on the v5 series I posted.

Probably the only thing that wasn't fully as clean as Petr requested was
the Kconfig stuff. I couldn't find a better way to express it without a
more major overhaul. In the very least, I renamed "NON_ARCH" to
"PERF_OR_BUDDY" in the hopes that will make it marginally better.

Nothing in this series is terribly critical and even the bugfixes are
small. However, it does cleanup a few things that were pointed out in
review.


This patch (of 10):

The permissions for the kernel.nmi_watchdog sysctl have always been set at
compile time despite the fact that a watchdog can fail to probe. Let's
fix this and set the permissions based on whether the hardlockup detector
actually probed.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/20230526184139.1.I0d75971cc52a7283f495aac0bd5c3041aadc734e@changeid
Fixes: a994a3147e4c ("watchdog/hardlockup/perf: Implement init time detection of perf")
Signed-off-by: Douglas Anderson <[email protected]>
Reported-by: Petr Mladek <[email protected]>
Closes: https://lore.kernel.org/r/ZHCn4hNxFpY5-9Ki@alley
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.4-rc3
# 930d8f8d 19-May-2023 Lecopzer Chen <[email protected]>

watchdog/perf: adapt the watchdog_perf interface for async model

When lockup_detector_init()->watchdog_hardlockup_probe(), PMU may be not
ready yet. E.g. on arm64, PMU is not ready until
device_in

watchdog/perf: adapt the watchdog_perf interface for async model

When lockup_detector_init()->watchdog_hardlockup_probe(), PMU may be not
ready yet. E.g. on arm64, PMU is not ready until
device_initcall(armv8_pmu_driver_init). And it is deeply integrated with
the driver model and cpuhp. Hence it is hard to push this initialization
before smp_init().

But it is easy to take an opposite approach and try to initialize the
watchdog once again later. The delayed probe is called using workqueues.
It need to allocate memory and must be proceed in a normal context. The
delayed probe is able to use if watchdog_hardlockup_probe() returns
non-zero which means the return code returned when PMU is not ready yet.

Provide an API - lockup_detector_retry_init() for anyone who needs to
delayed init lockup detector if they had ever failed at
lockup_detector_init().

The original assumption is: nobody should use delayed probe after
lockup_detector_check() which has __init attribute. That is, anyone uses
this API must call between lockup_detector_init() and
lockup_detector_check(), and the caller must have __init attribute

Link: https://lkml.kernel.org/r/20230519101840.v5.16.If4ad5dd5d09fb1309cebf8bcead4b6a5a7758ca7@changeid
Reviewed-by: Petr Mladek <[email protected]>
Co-developed-by: Pingfan Liu <[email protected]>
Signed-off-by: Pingfan Liu <[email protected]>
Signed-off-by: Lecopzer Chen <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# b17aa959 19-May-2023 Douglas Anderson <[email protected]>

watchdog/perf: add a weak function for an arch to detect if perf can use NMIs

On arm64, NMI support needs to be detected at runtime. Add a weak
function to the perf hardlockup detector so that an a

watchdog/perf: add a weak function for an arch to detect if perf can use NMIs

On arm64, NMI support needs to be detected at runtime. Add a weak
function to the perf hardlockup detector so that an architecture can
implement it to detect whether NMIs are available.

Link: https://lkml.kernel.org/r/20230519101840.v5.15.Ic55cb6f90ef5967d8aaa2b503a4e67c753f64d3a@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 1f423c90 19-May-2023 Douglas Anderson <[email protected]>

watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs

Implement a hardlockup detector that doesn't doesn't need any extra
arch-specific support code to detect lockups. Instead of us

watchdog/hardlockup: detect hard lockups using secondary (buddy) CPUs

Implement a hardlockup detector that doesn't doesn't need any extra
arch-specific support code to detect lockups. Instead of using something
arch-specific we will use the buddy system, where each CPU watches out for
another one. Specifically, each CPU will use its softlockup hrtimer to
check that the next CPU is processing hrtimer interrupts by verifying that
a counter is increasing.

NOTE: unlike the other hard lockup detectors, the buddy one can't easily
show what's happening on the CPU that locked up just by doing a simple
backtrace. It relies on some other mechanism in the system to get
information about the locked up CPUs. This could be support for NMI
backtraces like [1], it could be a mechanism for printing the PC of locked
CPUs at panic time like [2] / [3], or it could be something else. Even
though that means we still rely on arch-specific code, this arch-specific
code seems to often be implemented even on architectures that don't have a
hardlockup detector.

This style of hardlockup detector originated in some downstream Android
trees and has been rebased on / carried in ChromeOS trees for quite a long
time for use on arm and arm64 boards. Historically on these boards we've
leveraged mechanism [2] / [3] to get information about hung CPUs, but we
could move to [1].

Although the original motivation for the buddy system was for use on
systems without an arch-specific hardlockup detector, it can still be
useful to use even on systems that _do_ have an arch-specific hardlockup
detector. On x86, for instance, there is a 24-part patch series [4] in
progress switching the arch-specific hard lockup detector from a scarce
perf counter to a less-scarce hardware resource. Potentially the buddy
system could be a simpler alternative to free up the perf counter but
still get hard lockup detection.

Overall, pros (+) and cons (-) of the buddy system compared to an
arch-specific hardlockup detector (which might be implemented using
perf):
+ The buddy system is usable on systems that don't have an
arch-specific hardlockup detector, like arm32 and arm64 (though it's
being worked on for arm64 [5]).
+ The buddy system may free up scarce hardware resources.
+ If a CPU totally goes out to lunch (can't process NMIs) the buddy
system could still detect the problem (though it would be unlikely
to be able to get a stack trace).
+ The buddy system uses the same timer function to pet the hardlockup
detector on the running CPU as it uses to detect hardlockups on
other CPUs. Compared to other hardlockup detectors, this means it
generates fewer interrupts and thus is likely better able to let
CPUs stay idle longer.
- If all CPUs are hard locked up at the same time the buddy system
can't detect it.
- If we don't have SMP we can't use the buddy system.
- The buddy system needs an arch-specific mechanism (possibly NMI
backtrace) to get info about the locked up CPU.

[1] https://lore.kernel.org/r/[email protected]
[2] https://issuetracker.google.com/172213129
[3] https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
[4] https://lore.kernel.org/lkml/[email protected]/
[5] https://lore.kernel.org/linux-arm-kernel/[email protected]/

Link: https://lkml.kernel.org/r/20230519101840.v5.14.I6bf789d21d0c3d75d382e7e51a804a7a51315f2c@changeid
Signed-off-by: Colin Cross <[email protected]>
Signed-off-by: Matthias Kaehlcke <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Tzung-Bi Shih <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# d9b3629a 19-May-2023 Douglas Anderson <[email protected]>

watchdog/hardlockup: have the perf hardlockup use __weak functions more cleanly

The fact that there watchdog_hardlockup_enable(),
watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are
d

watchdog/hardlockup: have the perf hardlockup use __weak functions more cleanly

The fact that there watchdog_hardlockup_enable(),
watchdog_hardlockup_disable(), and watchdog_hardlockup_probe() are
declared __weak means that the configured hardlockup detector can define
non-weak versions of those functions if it needs to. Instead of doing
this, the perf hardlockup detector hooked itself into the default __weak
implementation, which was a bit awkward. Clean this up.

From comments, it looks as if the original design was done because the
__weak function were expected to implemented by the architecture and not
by the configured hardlockup detector. This got awkward when we tried to
add the buddy lockup detector which was not arch-specific but wanted to
hook into those same functions.

This is not expected to have any functional impact.

Link: https://lkml.kernel.org/r/20230519101840.v5.13.I847d9ec852449350997ba00401d2462a9cb4302b@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# df95d308 19-May-2023 Douglas Anderson <[email protected]>

watchdog/hardlockup: rename some "NMI watchdog" constants/function

Do a search and replace of:
- NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED
- SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENA

watchdog/hardlockup: rename some "NMI watchdog" constants/function

Do a search and replace of:
- NMI_WATCHDOG_ENABLED => WATCHDOG_HARDLOCKUP_ENABLED
- SOFT_WATCHDOG_ENABLED => WATCHDOG_SOFTOCKUP_ENABLED
- watchdog_nmi_ => watchdog_hardlockup_
- nmi_watchdog_available => watchdog_hardlockup_available
- nmi_watchdog_user_enabled => watchdog_hardlockup_user_enabled
- soft_watchdog_user_enabled => watchdog_softlockup_user_enabled
- NMI_WATCHDOG_DEFAULT => WATCHDOG_HARDLOCKUP_DEFAULT

Then update a few comments near where names were changed.

This is specifically to make it less confusing when we want to introduce
the buddy hardlockup detector, which isn't using NMIs. As part of this,
we sanitized a few names for consistency.

[[email protected]: make variables static]
Link: https://lkml.kernel.org/r/20230525162822.1.I0fb41d138d158c9230573eaa37dc56afa2fb14ee@changeid
Link: https://lkml.kernel.org/r/20230519101840.v5.12.I91f7277bab4bf8c0cb238732ed92e7ce7bbd71a6@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Tom Rix <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# ed92e1ef 19-May-2023 Douglas Anderson <[email protected]>

watchdog/hardlockup: move perf hardlockup watchdog petting to watchdog.c

In preparation for the buddy hardlockup detector, which wants the same
petting logic as the current perf hardlockup detector,

watchdog/hardlockup: move perf hardlockup watchdog petting to watchdog.c

In preparation for the buddy hardlockup detector, which wants the same
petting logic as the current perf hardlockup detector, move the code to
watchdog.c. While doing this, rename the global variable to match others
nearby. As part of this change we have to change the code to account for
the fact that the CPU we're running on might be different than the one
we're checking.

Currently the code in watchdog.c is guarded by
CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem silly.
However, a future patch will change this.

Link: https://lkml.kernel.org/r/20230519101840.v5.11.I00dfd6386ee00da25bf26d140559a41339b53e57@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 77c12fc9 19-May-2023 Douglas Anderson <[email protected]>

watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()

In preparation for the buddy hardlockup detector where the CPU checking
for lockup might not be the currently running CPU, add a

watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()

In preparation for the buddy hardlockup detector where the CPU checking
for lockup might not be the currently running CPU, add a "cpu" parameter
to watchdog_hardlockup_check().

As part of this change, make hrtimer_interrupts an atomic_t since now the
CPU incrementing the value and the CPU reading the value might be
different. Technially this could also be done with just READ_ONCE and
WRITE_ONCE, but atomic_t feels a little cleaner in this case.

While hrtimer_interrupts is made atomic_t, we change
hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
needed to match the data type backing atomic_t for hrtimer_interrupts.
Even if this changes us from 64-bits to 32-bits (which I don't think is
true for most compilers), it doesn't really matter. All we ever do is
increment it every few seconds and compare it to an old value so 32-bits
is fine (even 16-bits would be). The "signed" vs "unsigned" also doesn't
matter for simple equality comparisons.

hrtimer_interrupts_saved is _not_ switched to atomic_t nor even accessed
with READ_ONCE / WRITE_ONCE. The hrtimer_interrupts_saved is always
consistently accessed with the same CPU. NOTE: with the upcoming "buddy"
detector there is one special case. When a CPU goes offline/online then
we can change which CPU is the one to consistently access a given instance
of hrtimer_interrupts_saved. We still can't end up with a partially
updated hrtimer_interrupts_saved, however, because we end up petting all
affected CPUs to make sure the new and old CPU can't end up somehow
read/write hrtimer_interrupts_saved at the same time.

Link: https://lkml.kernel.org/r/20230519101840.v5.10.I3a7d4dd8c23ac30ee0b607d77feb6646b64825c0@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 81972551 19-May-2023 Douglas Anderson <[email protected]>

watchdog/hardlockup: move perf hardlockup checking/panic to common watchdog.c

The perf hardlockup detector works by looking at interrupt counts and
seeing if they change from run to run. The interr

watchdog/hardlockup: move perf hardlockup checking/panic to common watchdog.c

The perf hardlockup detector works by looking at interrupt counts and
seeing if they change from run to run. The interrupt counts are managed
by the common watchdog code via its watchdog_timer_fn().

Currently the API between the perf detector and the common code is a
function: is_hardlockup(). When the hard lockup detector sees that
function return true then it handles printing out debug info and inducing
a panic if necessary.

Let's change the API a little bit in preparation for the buddy hardlockup
detector. The buddy hardlockup detector wants to print nearly the same
debug info and have nearly the same panic behavior. That means we want to
move all that code to the common file. For now, the code in the common
file will only be there if the perf hardlockup detector is enabled, but
eventually it will be selected by a common config.

Right now, this _just_ moves the code from the perf detector file to the
common file and changes the names. It doesn't make the changes that the
buddy hardlockup detector will need and doesn't do any style cleanups. A
future patch will do cleanup to make it more obvious what changed.

With the above, we no longer have any callers of is_hardlockup() outside
of the "watchdog.c" file, so we can remove it from the header, make it
static, and move it to the same "#ifdef" block as our new
watchdog_hardlockup_check(). While doing this, it can be noted that even
if no hardlockup detectors were configured the existing code used to still
have the code for counting/checking "hrtimer_interrupts" even if the perf
hardlockup detector wasn't configured. We didn't need to do that, so move
all the "hrtimer_interrupts" counting to only be there if the perf
hardlockup detector is configured as well.

This change is expected to be a no-op.

Link: https://lkml.kernel.org/r/20230519101840.v5.8.Id4133d3183e798122dc3b6205e7852601f289071@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 8b5c59a9 19-May-2023 Douglas Anderson <[email protected]>

watchdog/hardlockup: add comments to touch_nmi_watchdog()

In preparation for the buddy hardlockup detector, add comments to
touch_nmi_watchdog() to make it obvious that it touches the configured
har

watchdog/hardlockup: add comments to touch_nmi_watchdog()

In preparation for the buddy hardlockup detector, add comments to
touch_nmi_watchdog() to make it obvious that it touches the configured
hardlockup detector regardless of whether it's backed by an NMI. Also
note that arch_touch_nmi_watchdog() may not be architecture-specific.

Ideally, we'd like to rename these functions but that is a fairly
disruptive change touching a lot of drivers. After discussion [1] the
plan is to defer this until a good time.

[1] https://lore.kernel.org/r/ZFy0TX1tfhlH8gxj@alley

[[email protected]: comment changes, per Petr]
Link: https://lkml.kernel.org/r/ZGyONWPXpE1DcxA5@alley
Link: https://lkml.kernel.org/r/20230519101840.v5.6.I4e47cbfa1bb2ebbcdb5ca16817aa2887f15dc82c@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 73021118 19-May-2023 Lecopzer Chen <[email protected]>

watchdog/hardlockup: change watchdog_nmi_enable() to void

Nobody cares about the return value of watchdog_nmi_enable(), changing its
prototype to void.

Link: https://lkml.kernel.org/r/2023051910184

watchdog/hardlockup: change watchdog_nmi_enable() to void

Nobody cares about the return value of watchdog_nmi_enable(), changing its
prototype to void.

Link: https://lkml.kernel.org/r/20230519101840.v5.4.Ic3a19b592eb1ac4c6f6eade44ffd943e8637b6e5@changeid
Signed-off-by: Pingfan Liu <[email protected]>
Signed-off-by: Lecopzer Chen <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Acked-by: Nicholas Piggin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Colin Cross <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 5e008df1 19-May-2023 Douglas Anderson <[email protected]>

watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config

Patch series "watchdog/hardlockup: Add the buddy hardlockup detector", v5.

This patch series adds the "buddy" hardl

watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config

Patch series "watchdog/hardlockup: Add the buddy hardlockup detector", v5.

This patch series adds the "buddy" hardlockup detector. In brief, the
buddy hardlockup detector can detect hardlockups without arch-level
support by having CPUs checkup on a "buddy" CPU periodically.

Given the new design of this patch series, testing all combinations is
fairly difficult. I've attempted to make sure that all combinations of
CONFIG_ options are good, but it wouldn't surprise me if I missed
something. I apologize in advance and I'll do my best to fix any
problems that are found.


This patch (of 18):

The real watchdog_update_hrtimer_threshold() is defined in
kernel/watchdog_hld.c. That file is included if
CONFIG_HARDLOCKUP_DETECTOR_PERF and the function is defined in that file
if CONFIG_HARDLOCKUP_CHECK_TIMESTAMP.

The dummy version of the function in "nmi.h" didn't get that quite right.
While this doesn't appear to be a huge deal, it's nice to make it
consistent.

It doesn't break builds because CHECK_TIMESTAMP is only defined by x86 so
others don't get a double definition, and x86 uses perf lockup detector,
so it gets the out of line version.

Link: https://lkml.kernel.org/r/20230519101840.v5.18.Ia44852044cdcb074f387e80df6b45e892965d4a1@changeid
Link: https://lkml.kernel.org/r/20230519101840.v5.1.I8cbb2f4fa740528fcfade4f5439b6cdcdd059251@changeid
Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen-Yu Tsai <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Daniel Thompson <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Guenter Roeck <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masayoshi Mizuma <[email protected]>
Cc: Matthias Kaehlcke <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: "Ravi V. Shankar" <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Tzung-Bi Shih <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Colin Cross <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3, v6.2-rc2, v6.2-rc1
# 344da544 17-Dec-2022 Paul E. McKenney <[email protected]>

x86/nmi: Print reasons why backtrace NMIs are ignored

Instrument nmi_trigger_cpumask_backtrace() to dump out diagnostics based
on evidence accumulated by exc_nmi(). These diagnostics are dumped for

x86/nmi: Print reasons why backtrace NMIs are ignored

Instrument nmi_trigger_cpumask_backtrace() to dump out diagnostics based
on evidence accumulated by exc_nmi(). These diagnostics are dumped for
CPUs that ignored an NMI backtrace request for more than 10 seconds.

[ paulmck: Apply Ingo Molnar feedback. ]

Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>

show more ...


Revision tags: v6.1, v6.1-rc8, v6.1-rc7, v6.1-rc6, v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4, v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7
# 7c56a873 13-Jul-2022 Laurent Dufour <[email protected]>

watchdog: export lockup_detector_reconfigure

In some circumstances it may be interesting to reconfigure the watchdog
from inside the kernel.

On PowerPC, this may helpful before and after a LPAR mig

watchdog: export lockup_detector_reconfigure

In some circumstances it may be interesting to reconfigure the watchdog
from inside the kernel.

On PowerPC, this may helpful before and after a LPAR migration (LPM) is
initiated, because it implies some latencies, watchdog, and especially NMI
watchdog is expected to be triggered during this operation. Reconfiguring
the watchdog with a factor, would prevent it to happen too frequently
during LPM.

Rename lockup_detector_reconfigure() as __lockup_detector_reconfigure() and
create a new function lockup_detector_reconfigure() calling
__lockup_detector_reconfigure() under the protection of watchdog_mutex.

Signed-off-by: Laurent Dufour <[email protected]>
[mpe: Squash in build fix from Laurent, reported by Sachin]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v5.19-rc6, v5.19-rc5, v5.19-rc4, v5.19-rc3, v5.19-rc2, v5.19-rc1, v5.18, v5.18-rc7, v5.18-rc6, v5.18-rc5, v5.18-rc4, v5.18-rc3, v5.18-rc2, v5.18-rc1, v5.17, v5.17-rc8, v5.17-rc7, v5.17-rc6, v5.17-rc5, v5.17-rc4, v5.17-rc3, v5.17-rc2, v5.17-rc1, v5.16, v5.16-rc8, v5.16-rc7, v5.16-rc6, v5.16-rc5, v5.16-rc4, v5.16-rc3, v5.16-rc2, v5.16-rc1, v5.15, v5.15-rc7, v5.15-rc6, v5.15-rc5, v5.15-rc4, v5.15-rc3, v5.15-rc2, v5.15-rc1, v5.14, v5.14-rc7, v5.14-rc6, v5.14-rc5, v5.14-rc4, v5.14-rc3, v5.14-rc2, v5.14-rc1, v5.13, v5.13-rc7, v5.13-rc6, v5.13-rc5, v5.13-rc4, v5.13-rc3, v5.13-rc2, v5.13-rc1, v5.12, v5.12-rc8, v5.12-rc7, v5.12-rc6, v5.12-rc5, v5.12-rc4, v5.12-rc3, v5.12-rc2, v5.12-rc1, v5.12-rc1-dontuse, v5.11, v5.11-rc7, v5.11-rc6, v5.11-rc5, v5.11-rc4, v5.11-rc3, v5.11-rc2, v5.11-rc1, v5.10, v5.10-rc7, v5.10-rc6, v5.10-rc5, v5.10-rc4, v5.10-rc3, v5.10-rc2, v5.10-rc1, v5.9, v5.9-rc8, v5.9-rc7, v5.9-rc6, v5.9-rc5, v5.9-rc4, v5.9-rc3, v5.9-rc2, v5.9-rc1, v5.8, v5.8-rc7, v5.8-rc6, v5.8-rc5, v5.8-rc4, v5.8-rc3, v5.8-rc2, v5.8-rc1, v5.7, v5.7-rc7, v5.7-rc6, v5.7-rc5, v5.7-rc4, v5.7-rc3
# 32927393 24-Apr-2020 Christoph Hellwig <[email protected]>

sysctl: pass kernel pointers to ->proc_handler

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from u

sysctl: pass kernel pointers to ->proc_handler

Instead of having all the sysctl handlers deal with user pointers, which
is rather hairy in terms of the BPF interaction, copy the input to and
from userspace in common code. This also means that the strings are
always NUL-terminated by the common code, making the API a little bit
safer.

As most handler just pass through the data to one of the common handlers
a lot of the changes are mechnical.

Signed-off-by: Christoph Hellwig <[email protected]>
Acked-by: Andrey Ignatov <[email protected]>
Signed-off-by: Al Viro <[email protected]>

show more ...


1234