Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5

d6834d9c | 21-Oct-2024 | Li Huafei <[email protected]>
watchdog/hardlockup/perf: Fix perf_event memory leak
During stress-testing, we found a kmemleak report for perf_event:
unreferenced object 0xff110001410a33e0 (size 1328):
  comm "kworker/4:11", pid 288, jiffies 4294916004
  hex dump (first 32 bytes):
    b8 be c2 3b 02 00 11 ff 22 01 00 00 00 00 ad de  ...;...."......
    f0 33 0a 41 01 00 11 ff f0 33 0a 41 01 00 11 ff  .3.A.....3.A....
  backtrace (crc 24eb7b3a):
    [<00000000e211b653>] kmem_cache_alloc_node_noprof+0x269/0x2e0
    [<000000009d0985fa>] perf_event_alloc+0x5f/0xcf0
    [<00000000084ad4a2>] perf_event_create_kernel_counter+0x38/0x1b0
    [<00000000fde96401>] hardlockup_detector_event_create+0x50/0xe0
    [<0000000051183158>] watchdog_hardlockup_enable+0x17/0x70
    [<00000000ac89727f>] softlockup_start_fn+0x15/0x40
    ...
Our stress test includes CPU online and offline cycles, and updating the watchdog configuration.
After reading the code, I found that there may be a race between cleaning up the perf_event after updating the watchdog and disabling the event when the CPU goes offline:
CPU0                        CPU1                        CPU2
(update watchdog)                                       (hotplug offline CPU1)

...                         _cpu_down(CPU1)
cpus_read_lock()                                        // waiting for cpu lock
  softlockup_start_all
    smp_call_on_cpu(CPU1)
                            softlockup_start_fn
                            ...
                              watchdog_hardlockup_enable(CPU1)
                                perf create E1
                                watchdog_ev[CPU1] = E1
cpus_read_unlock()
                                                        cpus_write_lock()
                                                        cpuhp_kick_ap_work(CPU1)
                            cpuhp_thread_fun
                            ...
                              watchdog_hardlockup_disable(CPU1)
                                watchdog_ev[CPU1] = NULL
                                dead_event[CPU1] = E1
__lockup_detector_cleanup
  for each dead_events_mask
    release each dead_event
    /*
     * CPU1 has not been added to
     * dead_events_mask, then E1
     * will not be released
     */
                            CPU1 -> dead_events_mask
  cpumask_clear(&dead_events_mask)
  // dead_events_mask is cleared, E1 is leaked
In this case, the leaked perf_event E1 matches the one reported by kmemleak. Because the problem reproduces only with very low probability (it was reported just once), I added some hack delays to the code:
static void __lockup_detector_reconfigure(void)
{
	...
	watchdog_hardlockup_start();
	cpus_read_unlock();
+	mdelay(100);
	/*
	 * Must be called outside the cpus locked section to prevent
	 * recursive locking in the perf code.
	...
}

void watchdog_hardlockup_disable(unsigned int cpu)
{
	...
	perf_event_disable(event);
	this_cpu_write(watchdog_ev, NULL);
	this_cpu_write(dead_event, event);
+	mdelay(100);
	cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
	atomic_dec(&watchdog_cpus);
	...
}

void hardlockup_detector_perf_cleanup(void)
{
	...
		perf_event_release_kernel(event);
		per_cpu(dead_event, cpu) = NULL;
	}
+	mdelay(100);
	cpumask_clear(&dead_events_mask);
}
With these delays in place, simultaneously toggling CPUs on/off while switching the watchdog reproduces this leak almost every time.
The problem here is that releasing perf_event is not within the CPU hotplug read-write lock. Commit:
941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")
introduced deferred release to solve the deadlock caused by calling get_online_cpus() when releasing perf_event. Later, commit:
efe951d3de91 ("perf/x86: Fix perf,x86,cpuhp deadlock")
removed the get_online_cpus() call on the perf_event release path to solve another deadlock problem.
Therefore, it is now possible to move the release of perf_event back into the CPU hotplug read-write lock, and release the event immediately after disabling it.
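A minimal sketch of the shape this fix implies (not the verbatim patch; identifiers follow the code quoted above):

/* Sketch of the described fix: release the event right where it is
 * disabled, while the CPU hotplug read lock is still held, instead of
 * parking it in dead_event for a deferred cleanup pass.
 */
void watchdog_hardlockup_disable(unsigned int cpu)
{
	struct perf_event *event = this_cpu_read(watchdog_ev);

	if (event) {
		perf_event_disable(event);
		this_cpu_write(watchdog_ev, NULL);
		/*
		 * Safe since efe951d3de91 ("perf/x86: Fix perf,x86,cpuhp
		 * deadlock"): the release path no longer takes
		 * get_online_cpus(), so it cannot deadlock here.
		 */
		perf_event_release_kernel(event);
		atomic_dec(&watchdog_cpus);
	}
}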
Fixes: 941154bd6937 ("watchdog/hardlockup/perf: Prevent CPU hotplug deadlock")
Signed-off-by: Li Huafei <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

d2254b06 | 05-Feb-2025 | Nam Cao <[email protected]>
watchdog: Switch to use hrtimer_setup()
hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely.
Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism.
The patch was created using Coccinelle.
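The shape of the conversion, sketched on the watchdog timer callback (illustrative; the real diff is mechanical):

/* Before: two-step init with an open-coded callback assignment. */
hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
hrtimer->function = watchdog_timer_fn;

/* After: one call initializes the timer and sets the callback. */
hrtimer_setup(hrtimer, watchdog_timer_fn, CLOCK_MONOTONIC, HRTIMER_MODE_REL);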
Signed-off-by: Nam Cao <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/all/a5c62f2b5e1ea1cf4d32f37bc2d21a8eeab2f875.1738746821.git.namcao@linutronix.de

1751f872 | 28-Jan-2025 | Joel Granados <[email protected]>
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/infiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers.
Created this by running an spatch followed by a sed command:

Spatch:

    virtual patch

    @
    depends on !(file in "net")
    disable optional_qualifier
    @
    identifier table_name != {
        watchdog_hardlockup_sysctl,
        iwcm_ctl_table,
        ucma_ctl_table,
        memory_allocation_profiling_sysctls,
        loadpin_sysctl_table
    };
    @@

    + const struct ctl_table table_name [] = { ... };

sed:

    sed --in-place \
        -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \
        kernel/utsname_sysctl.c
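For illustration, a constified array then looks like this (a sketch modeled on the watchdog sysctl table; fields abridged):

static const struct ctl_table watchdog_sysctls[] = {
	{
		.procname	= "watchdog",
		.data		= &watchdog_user_enabled,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_watchdog,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
	/* ... */
};

The array can now live in .rodata, so its proc_handler pointer cannot be overwritten at run time.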
Reviewed-by: Song Liu <[email protected]>
Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/
Reviewed-by: Martin K. Petersen <[email protected]> # SCSI
Reviewed-by: Darrick J. Wong <[email protected]> # xfs
Acked-by: Jani Nikula <[email protected]>
Acked-by: Corey Minyard <[email protected]>
Acked-by: Wei Liu <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Reviewed-by: Bill O'Donnell <[email protected]>
Acked-by: Baoquan He <[email protected]>
Acked-by: Ashutosh Dixit <[email protected]>
Acked-by: Anna Schumaker <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

3c16fc0c | 10-Dec-2024 | Yunhui Cui <[email protected]>
watchdog: output this_cpu when printing hard LOCKUP
When printing "Watchdog detected hard LOCKUP on cpu", also output the detecting CPU. It's more intuitive.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yunhui Cui <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Bitao Hu <[email protected]>
Cc: Joel Granados <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Liu Song <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Thomas Weißschuh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

e32c2601 | 05-Nov-2024 | Tejun Heo <[email protected]>
sched_ext: Enable the ops breather and eject BPF scheduler on softlockup
On 2 x Intel Sapphire Rapids machines with 224 logical CPUs, a poorly behaving BPF scheduler can live-lock the system by making multiple CPUs bang on the same DSQ to the point where soft-lockup detection triggers before SCX's own watchdog can take action. It also seems possible that the machine can be live-locked enough to prevent scx_ops_helper, which is an RT task, from running in a timely manner.
Implement scx_softlockup() which is called when three quarters of soft-lockup threshold has passed. The function immediately enables the ops breather and triggers an ops error to initiate ejection of the BPF scheduler.
The previous and this patch combined enable the kernel to reliably recover the system from live-lock conditions that can be triggered by a poorly behaving BPF scheduler on Intel dual socket systems.
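A sketch of how the hook might be wired into the softlockup check (placement and the exact expression are assumptions based on the description above):

/* Assumed wiring: once 3/4 of the softlockup threshold has elapsed,
 * let sched_ext start ejecting the BPF scheduler before the full
 * softlockup fires. */
if (time_after_eq(now, period_ts + get_softlockup_thresh() * 3 / 4))
	scx_softlockup(now - touch_ts);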
Signed-off-by: Tejun Heo <[email protected]>
Cc: Douglas Anderson <[email protected]>
Cc: Andrew Morton <[email protected]>

Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7

83801018 | 06-Sep-2024 | Tio Zhang <[email protected]>
kernel/watchdog: always restore watchdog_softlockup(,hardlockup)_user_enabled after proc show
Otherwise, when watchdog_enabled becomes 0, watchdog_softlockup(,hardlockup)_user_enabled will change to 0 after a proc show.
Steps to reproduce:
step 1:

# cat /proc/sys/kernel/*watchdog
1
1
1

| name                             | value
|----------------------------------|------
| watchdog_enabled                 | 1
| watchdog_hardlockup_user_enabled | 1
| watchdog_softlockup_user_enabled | 1
| watchdog_user_enabled            | 1

step 2:

# echo 0 > /proc/sys/kernel/watchdog

| name                             | value
|----------------------------------|------
| watchdog_enabled                 | 0
| watchdog_hardlockup_user_enabled | 1
| watchdog_softlockup_user_enabled | 1
| watchdog_user_enabled            | 0

step 3:

# cat /proc/sys/kernel/*watchdog
0
0
0

| name                             | value
|----------------------------------|------
| watchdog_enabled                 | 0
| watchdog_hardlockup_user_enabled | 0
| watchdog_softlockup_user_enabled | 0
| watchdog_user_enabled            | 0

step 4:

# echo 1 > /proc/sys/kernel/watchdog

| name                             | value
|----------------------------------|------
| watchdog_enabled                 | 0
| watchdog_hardlockup_user_enabled | 0
| watchdog_softlockup_user_enabled | 0
| watchdog_user_enabled            | 0

step 5:

# cat /proc/sys/kernel/*watchdog
0
0
0

If we don't do "step 3" and do "step 4" right after "step 2", the state will be

| name                             | value
|----------------------------------|------
| watchdog_enabled                 | 1
| watchdog_hardlockup_user_enabled | 1
| watchdog_softlockup_user_enabled | 1
| watchdog_user_enabled            | 1

then everything works correctly.

So this patch fixes "step 3"'s state to

| name                             | value
|----------------------------------|------
| watchdog_enabled                 | 0
| watchdog_hardlockup_user_enabled | 1
| watchdog_softlockup_user_enabled | 1
| watchdog_user_enabled            | 0
And it still prints 0 as before.
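A sketch of the fix's effect in the read path (a hypothetical simplification of proc_watchdog_common(); the real patch differs in detail):

/* Hypothetical simplification: on read, show the effective value
 * derived from watchdog_enabled, then restore the user-configured
 * value so a later "echo 1 > /proc/sys/kernel/watchdog" can
 * re-enable both detectors. */
if (!write) {
	int old = *param;	/* user-configured value */

	*param = (watchdog_enabled & which) != 0;
	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
	*param = old;
	return err;
}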
Link: https://lkml.kernel.org/r/20240906094700.GA30052@didi-ThinkCentre-M930t-N000
Signed-off-by: Tio Zhang <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Krister Johansen <[email protected]>
Cc: Li Zhe <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt (Google) <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Thomas Weißschuh <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Vincent Guittot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2

97cf8f5f | 02-Aug-2024 | Waiman Long <[email protected]>
watchdog: handle the ENODEV failure case of lockup_detector_delay_init() separately
When watchdog_hardlockup_probe() is called by lockup_detector_delay_init(), an error return of -ENODEV will happen for the arm64 arch when arch_perf_nmi_is_available() returns false. This means that NMI is not usable by the hard lockup detector and so it has to be disabled. This can be considered a deficiency in that particular arm64 chip, but there is nothing we can do about it. That also means the following error will always be reported when the kernel boots up.
watchdog: Delayed init of the lockup detector failed: -19
The word "failed" itself has a connotation that there is something wrong with the kernel which is not really the case here. Handle this special ENODEV case separately and explain the reason behind disabling hard lockup detector without causing anxiety for those users who read the above message and wonder about it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Waiman Long <[email protected]>
Cc: Douglas Anderson <[email protected]>
Cc: Joel Granados <[email protected]>
Cc: Li Zhe <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Thomas Weißschuh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Revision tags: v6.11-rc1

78eb4ea2 | 24-Jul-2024 | Joel Granados <[email protected]>
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata, which will ensure that proc_handler function pointers cannot be modified.
This patch has been generated by the following coccinelle script:
```
virtual patch

@r1@
identifier ctl, write, buffer, lenp, ppos;
identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
  ,int write, void *buffer, size_t *lenp, loff_t *ppos);

@r2@
identifier func, ctl, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
  ,int write, void *buffer, size_t *lenp, loff_t *ppos)
{ ... }

@r3@
identifier func;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
  ,int , void *, size_t *, loff_t *);

@r4@
identifier func, ctl;
@@

int func(
- struct ctl_table *ctl
+ const struct ctl_table *ctl
  ,int , void *, size_t *, loff_t *);

@r5@
identifier func, write, buffer, lenp, ppos;
@@

int func(
- struct ctl_table *
+ const struct ctl_table *
  ,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax were adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration.
Co-developed-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Thomas Weißschuh <[email protected]>
Co-developed-by: Joel Granados <[email protected]>
Signed-off-by: Joel Granados <[email protected]>

Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7

393fb313 | 30-Apr-2024 | Song Liu <[email protected]>
watchdog: allow nmi watchdog to use raw perf event
NMI watchdog permanently consumes one hardware counter per CPU on the system. For systems that use many hardware counters, this causes more aggressive time multiplexing of perf events.
OTOH, some CPUs (mostly Intel) support "ref-cycles" event, which is rarely used. Add kernel cmdline arg nmi_watchdog=rNNN to configure the watchdog to use raw event. For example, on Intel CPUs, we can use "r300" to configure the watchdog to use ref-cycles event.
If the raw event does not work, fall back to use "cycles".
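Sketched at the perf level, using a hypothetical watchdog_raw_event variable for the value parsed from the "rNNN" (hex) form (the helper name is also hypothetical):

/* Sketch: fill the perf attr, preferring the raw event when one was
 * given on the command line. */
static u64 watchdog_raw_event;	/* 0 = use the default "cycles" event */

static void watchdog_perf_fill_attr(struct perf_event_attr *attr)
{
	attr->type	= PERF_TYPE_HARDWARE;
	attr->config	= PERF_COUNT_HW_CPU_CYCLES;
	attr->size	= sizeof(*attr);
	attr->pinned	= 1;
	attr->disabled	= 1;

	if (watchdog_raw_event) {	/* e.g. 0x300: ref-cycles on Intel */
		attr->type   = PERF_TYPE_RAW;
		attr->config = watchdog_raw_event;
	}
	/* If creating the counter with the raw event fails, the caller
	 * falls back to the default "cycles" settings above. */
}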
[[email protected]: fix kerneldoc]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

602ba773 | 30-Apr-2024 | Song Liu <[email protected]>
watchdog: handle comma separated nmi_watchdog command line
Per the documentation, the kernel can accept a comma-separated command line like nmi_watchdog=nopanic,0. However, the code doesn't really handle it. Fix the kernel to handle it properly.
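A sketch of tokenized parsing with strsep(), close to what the description implies (not the verbatim patch):

/* Sketch: accept forms like "nmi_watchdog=nopanic,0" by handling
 * each comma-separated token on its own. */
static int __init hardlockup_panic_setup(char *str)
{
	char *tok;

	while ((tok = strsep(&str, ","))) {
		if (!strcmp(tok, "panic"))
			hardlockup_panic = 1;
		else if (!strcmp(tok, "nopanic"))
			hardlockup_panic = 0;
		else if (!strcmp(tok, "0"))
			watchdog_hardlockup_user_enabled = 0;
		else if (!strcmp(tok, "1"))
			watchdog_hardlockup_user_enabled = 1;
	}
	return 1;
}
__setup("nmi_watchdog=", hardlockup_panic_setup);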
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Song Liu <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Revision tags: v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1

11a92190 | 27-Jun-2023 | Joel Granados <[email protected]>
kernel misc: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels), which will reduce the overall build-time size of the kernel and run-time memory bloat by ~64 bytes per sentinel (further information: https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/).
Remove the sentinel from ctl_table arrays. Reduce by one the values used to compare the size of the adjusted arrays.
Signed-off-by: Joel Granados <[email protected]>

e9a9292e | 11-Apr-2024 | Bitao Hu <[email protected]>
watchdog/softlockup: Report the most frequent interrupts
When the watchdog determines that the current soft lockup is due to an interrupt storm based on CPU utilization, reporting the most frequent interrupts could be good enough for further troubleshooting.
Below is an example of an interrupt storm. The call tree does not provide useful information, but analyzing which interrupt caused the soft lockup by comparing the counts of interrupts during the lockup period makes it possible to identify the culprit.
[ 638.870231] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [swapper/9:0]
[ 638.870825] CPU#9 Utilization every 4s during lockup:
[ 638.871194]   #1:   0% system,   0% softirq, 100% hardirq,   0% idle
[ 638.871652]   #2:   0% system,   0% softirq, 100% hardirq,   0% idle
[ 638.872107]   #3:   0% system,   0% softirq, 100% hardirq,   0% idle
[ 638.872563]   #4:   0% system,   0% softirq, 100% hardirq,   0% idle
[ 638.873018]   #5:   0% system,   0% softirq, 100% hardirq,   0% idle
[ 638.873494] CPU#9 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
[ 638.873994]   #1: 330945  irq#7
[ 638.874236]   #2: 31      irq#82
[ 638.874493]   #3: 10      irq#10
[ 638.874744]   #4: 2       irq#89
[ 638.874992]   #5: 1       irq#102
...
[ 638.875313] Call trace:
[ 638.875315]  __do_softirq+0xa8/0x364
Signed-off-by: Bitao Hu <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Liu Song <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

d7037381 | 11-Apr-2024 | Bitao Hu <[email protected]>
watchdog/softlockup: Low-overhead detection of interrupt storm
The following softlockup is caused by an interrupt storm, but it cannot be identified from the call tree, because the call tree is just a snapshot and doesn't fully capture the behavior of the CPU during the soft lockup.

watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
...
Call trace:
  __do_softirq+0xa0/0x37c
  __irq_exit_rcu+0x108/0x140
  irq_exit+0x14/0x20
  __handle_domain_irq+0x84/0xe0
  gic_handle_irq+0x80/0x108
  el0_irq_naked+0x50/0x58
Therefore, it is necessary to report CPU utilization during the softlockup_threshold period (report once every sample_period, for a total of 5 reports), like this:

watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
CPU#28 Utilization every 4s during lockup:
  #1:   0% system,   0% softirq, 100% hardirq,   0% idle
  #2:   0% system,   0% softirq, 100% hardirq,   0% idle
  #3:   0% system,   0% softirq, 100% hardirq,   0% idle
  #4:   0% system,   0% softirq, 100% hardirq,   0% idle
  #5:   0% system,   0% softirq, 100% hardirq,   0% idle
...
This is helpful in determining whether an interrupt storm has occurred or in identifying the cause of the softlockup. The criteria for determination are as follows:
a. If the hardirq utilization is high, an interrupt storm should be considered, and the root cause cannot be determined from the call tree.

b. If the softirq utilization is high, the call tree might not necessarily point at the root cause.

c. If the system utilization is high, analyzing the root cause from the call tree is possible in most cases.
The mechanism requires a considerable amount of global storage space when configured for the maximum number of CPUs. Therefore, add a SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes" when the maximum number of CPUs is <= 128.
Signed-off-by: Bitao Hu <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Reviewed-by: Liu Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

75060b6e | 06-Mar-2024 | Thomas Weißschuh <[email protected]>
watchdog/core: remove sysctl handlers from public header
The functions are only used in the file where they are defined. Remove them from the header and make them static.
Also guard proc_soft_watchdog with a #define-guard as it is not used otherwise.
Link: https://lkml.kernel.org/r/20240306-const-sysctl-prep-watchdog-v1-1-bd45da3a41cf@weissschuh.net
Signed-off-by: Thomas Weißschuh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

55efe4ab | 20-Dec-2023 | Douglas Anderson <[email protected]>
watchdog: if panicking and we dumped everything, don't re-enable dumping
If, as part of handling a hardlockup or softlockup, we've already dumped all CPUs and we're just about to panic, don't re-enable dumping, which would just give some other CPU a chance to hop in and add confusing logs right as the panic is happening.
Link: https://lkml.kernel.org/r/20231220131534.4.Id3a9c7ec2d7d83e4080da6f8662ba2226b40543f@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Li Zhe <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

ee6bdb3f | 20-Dec-2023 | Douglas Anderson <[email protected]>
watchdog/hardlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
If two CPUs end up reporting a hardlockup at the same time then their logs could get interleaved which is hard to read.
The interleaving problem was especially bad with the "perf" hardlockup detector where the locked up CPU is always the same as the running CPU and we end up in show_regs(). show_regs() has no inherent serialization so we could mix together two crawls if two hardlockups happened at the same time (and if we didn't have `sysctl_hardlockup_all_cpu_backtrace` set). With this change we'll fully serialize hardlockups when using the "perf" hardlockup detector.
The interleaving problem was less bad with the "buddy" hardlockup detector. With "buddy" we always end up calling `trigger_single_cpu_backtrace(cpu)` on some CPU other than the running one. trigger_single_cpu_backtrace() always at least serializes the individual stack crawls because it eventually uses printk_cpu_sync_get_irqsave(). Unfortunately the fact that trigger_single_cpu_backtrace() eventually calls printk_cpu_sync_get_irqsave() (on a different CPU) means that we have to drop the "lock" before calling it and we can't fully serialize all printouts associated with a given hardlockup. However, we still do get the advantage of serializing the output of print_modules() and print_irqtrace_events().
Aside from serializing hardlockups from each other, this change also has the advantage of serializing hardlockups and softlockups from each other if they happen to happen at the same time since they are both using the same "lock".
Even though nobody is expected to hang while holding the lock associated with printk_cpu_sync_get_irqsave(), out of an abundance of caution, we don't call printk_cpu_sync_get_irqsave() until after we print out about the hardlockup. This makes extra sure that, even if printk_cpu_sync_get_irqsave() somehow never runs we at least print that we saw the hardlockup. This is different than the choice made for softlockup because hardlockup is really our last resort.
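The resulting ordering, sketched (the printk/NMI helper names are real; the surrounding code is abridged):

/* Sketch of the ordering: print the headline before taking the sync
 * "lock", so the hardlockup is reported even if the sync call stalls. */
pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);

printk_cpu_sync_get_irqsave(flags);
print_modules();
print_irqtrace_events(current);
if (regs)
	show_regs(regs);
else
	dump_stack();
printk_cpu_sync_put_irqrestore(flags);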
Link: https://lkml.kernel.org/r/20231220131534.3.I6ff691b3b40f0379bc860f80c6e729a0485b5247@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: John Ogness <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Li Zhe <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

896260a6 | 20-Dec-2023 | Douglas Anderson <[email protected]>
watchdog/softlockup: use printk_cpu_sync_get_irqsave() to serialize reporting
Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave() and printk_cpu_sync_put_irqrestore() to serialize softlockup reporting. Alone this doesn't have any real advantage over the spinlock, but this will allow us to use the same function in a future change to also serialize hardlockup crawls.
NOTE: for the most part this serialization is important because we often end up in the show_regs() path and that has no built-in serialization if there are multiple callers at once. However, even in the case where we end up in the dump_stack() path this still has some advantages because the stack will be guaranteed to be together in the logs with the lockup message with no interleaving.
NOTE: the fact that printk_cpu_sync_get_irqsave() is allowed to be called multiple times on the same CPU is important here. Specifically we hold the "lock" while calling dump_stack() which also gets the same "lock". This is explicitly documented to be OK and means we don't need to introduce a variant of dump_stack() that doesn't grab the lock.
Link: https://lkml.kernel.org/r/20231220131534.2.Ia5906525d440d8e8383cde31b7c61c2aadc8f907@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Reviewed-by: John Ogness <[email protected]>
Reviewed-by: Li Zhe <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

6dcde5d5 | 20-Dec-2023 | Douglas Anderson <[email protected]>
watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
Patch series "watchdog: Better handling of concurrent lockups".
When we get multiple lockups at roughly the same time, the output in the kernel logs can be very confusing since the reports about the lockups end up interleaved in the logs. There is some code in the kernel to try to handle this but it wasn't that complete.
Li Zhe recently made this a bit better for softlockups (specifically for the case where `kernel.softlockup_all_cpu_backtrace` is not set) in commit 9d02330abd3e ("softlockup: serialized softlockup's log"), but that only handled softlockup reports. Hardlockup reports still had similar issues.
This series also has a small fix to avoid dumping all stacks a second time in the case of a panic. This is a bit unrelated to the interleaving fixes but it does also improve the clarity of lockup reports.
This patch (of 4):
The hardlockup detector and softlockup detector both have the ability to dump the stack of all CPUs (`kernel.hardlockup_all_cpu_backtrace` and `kernel.softlockup_all_cpu_backtrace`). Both detectors also have some logic to attempt to avoid interleaving printouts if two CPUs were trying to do dumps of all CPUs at the same time. However:
- The hardlockup detector's logic still allowed interleaving some information. Specifically another CPU could print modules and dump the stack of the locked CPU at the same time we were dumping all CPUs.
- In the case where `kernel.hardlockup_panic` was set in addition to `kernel.hardlockup_all_cpu_backtrace`, when two CPUs both detected hardlockups at the same time the second CPU could call panic() while the first was still dumping stacks. This was especially bad if the locked up CPU wasn't responding to the request for a backtrace since the function nmi_trigger_cpumask_backtrace() can wait up to 10 seconds.
Let's resolve this by adopting the softlockup logic in the hardlockup handler.
NOTES:
- As part of this, one might think that we should make a helper function that both the hard and softlockup detectors call. This turns out not to be super trivial since it would have to be parameterized quite a bit since there are separate global variables controlling each lockup detector and they print log messages that are just different enough that it would be a pain. We probably don't want to change the messages that are printed without good reason to avoid throwing log parsers for a loop.
- One might also think that it would be a good idea to have the hardlockup and softlockup detector use the same global variable to prevent interleaving. This would make sure that softlockups and hardlockups can't interleave each other. That _almost_ works but has a dangerous flaw if `kernel.hardlockup_panic` is not the same as `kernel.softlockup_panic` because we might skip a call to panic() if one type of lockup was detected at the same time as another.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/20231220131534.1.I4f35a69fbb124b5f0c71f75c631e11fabbe188ff@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Li Zhe <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

9d02330a | 23-Nov-2023 | Li Zhe <[email protected]>
softlockup: serialized softlockup's log
If multiple CPUs trigger softlockup at the same time with 'softlockup_all_cpu_backtrace=0', the softlockup logs will appear interleaved in dmesg, which makes them hard for developers to read. Since the code path for outputting softlockup logs is not a kernel hotspot and the performance requirements for the code are not strict, use locks to serialize the softlockup log output and improve the readability of the logs.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Li Zhe <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Pingfan Liu <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: John Ogness <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

8b793bcd | 27-Oct-2023 | Krister Johansen <[email protected]>
watchdog: move softlockup_panic back to early_param
Setting softlockup_panic from do_sysctl_args() causes it to take effect later in boot. The lockup detector is enabled before SMP is brought online, but do_sysctl_args runs afterwards. If a user wants to set softlockup_panic on boot and have it trigger should a softlockup occur during onlining of the non-boot processors, they could do this prior to commit f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases"). However, after this commit the value of softlockup_panic is set too late to be of help for this type of problem. Restore the prior behavior.
Signed-off-by: Krister Johansen <[email protected]>
Cc: [email protected]
Fixes: f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases")
Signed-off-by: Luis Chamberlain <[email protected]>

1f38c86b | 04-Aug-2023 | Douglas Anderson <[email protected]>
watchdog/hardlockup: avoid large stack frames in watchdog_hardlockup_check()
After commit 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()") we started storing a `struct cpumask` on the stack in watchdog_hardlockup_check(). On systems with CONFIG_NR_CPUS set to 8192 this takes up 1K on the stack. That triggers warnings with `CONFIG_FRAME_WARN` set to 1024.
We'll use the new trigger_allbutcpu_cpu_backtrace() to avoid needing to use a CPU mask at all.
Link: https://lkml.kernel.org/r/20230804065935.v4.2.I501ab68cb926ee33a7c87e063d207abf09b9943c@changeid
Fixes: 77c12fc95980 ("watchdog/hardlockup: add a "cpu" param to watchdog_hardlockup_check()")
Signed-off-by: Douglas Anderson <[email protected]>
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Pingfan Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

8d539b84 | 04-Aug-2023 | Douglas Anderson <[email protected]>
nmi_backtrace: allow excluding an arbitrary CPU
The APIs that allow backtracing across CPUs have always had a way to exclude the current CPU. This convenience means callers didn't need to find a place to allocate a CPU mask just to handle the common case.
Let's extend the API to take a CPU ID to exclude instead of just a boolean. This isn't any more complex for the API to handle and allows the hardlockup detector to exclude a different CPU (the one it already did a trace for) without needing to find space for a CPU mask.
Arguably, this new API also encourages safer behavior. Specifically if the caller wants to avoid tracing the current CPU (maybe because they already traced the current CPU) this makes it more obvious to the caller that they need to make sure that the current CPU ID can't change.
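From the caller's side, the change looks roughly like this (call sites illustrative):

/* Old form: a bool meaning "exclude the current CPU". */
trigger_allbutself_cpu_backtrace();

/* New form: exclude an arbitrary CPU by ID, e.g. the one whose stack
 * was already dumped, without allocating a cpumask. */
trigger_allbutcpu_cpu_backtrace(cpu);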
[[email protected]: fix trigger_allbutcpu_cpu_backtrace() stub]
Link: https://lkml.kernel.org/r/20230804065935.v4.1.Ia35521b91fc781368945161d7b28538f9996c182@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: kernel test robot <[email protected]>
Cc: Lecopzer Chen <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Pingfan Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Revision tags: v6.4, v6.4-rc7

47f4cb43 | 16-Jun-2023 | Petr Mladek <[email protected]>
watchdog/sparc64: define HARDLOCKUP_DETECTOR_SPARC64
The HAVE_ prefix means that the code could be enabled. Add another variable, HARDLOCKUP_DETECTOR_SPARC64, without this prefix. It will be set when the detector should actually be built, which makes it consistent with the other hardlockup detectors.
Before, it is far from obvious that the SPARC64 variant is actually used:

$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y

After, it is clearer:

$> make ARCH=sparc64 defconfig
$> grep HARDLOCKUP_DETECTOR .config
CONFIG_HAVE_HARDLOCKUP_DETECTOR_BUDDY=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_SPARC64=y
CONFIG_HARDLOCKUP_DETECTOR_SPARC64=y
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

a5fcc236 | 16-Jun-2023 | Petr Mladek <[email protected]>
watchdog/hardlockup: make HAVE_NMI_WATCHDOG sparc64-specific
There are several hardlockup detector implementations and several Kconfig values which allow selection and build of the preferred one.
CONFIG_HARDLOCKUP_DETECTOR was introduced by the commit 23637d477c1f53acb ("lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR") in v2.6.36. It was a preparation step for introducing the new generic perf hardlockup detector.
The existing arch-specific variants did not support the to-be-created generic build configurations, sysctl interface, etc. This distinction was made explicit by the commit 4a7863cc2eb5f98 ("x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR") in v2.6.38.
CONFIG_HAVE_NMI_WATCHDOG was introduced by the commit d314d74c695f967e105 ("nmi watchdog: do not use cpp symbol in Kconfig") in v3.4-rc1. It replaced the above mentioned ARCH_HAS_NMI_WATCHDOG. At that time, it was still used by three architectures, namely blackfin, mn10300, and sparc.
The support for blackfin and mn10300 architectures has been completely dropped some time ago. And sparc is the only architecture with the historic NMI watchdog at the moment.
And the old sparc implementation is really special. It is always built on sparc64. It used to be always enabled until the commit 7a5c8b57cec93196b ("sparc: implement watchdog_nmi_enable and watchdog_nmi_disable") added in v4.10-rc1.
There are only few locations where the sparc64 NMI watchdog interacts with the generic hardlockup detectors code:
+ implements arch_touch_nmi_watchdog() which is called from the generic touch_nmi_watchdog()
+ implements watchdog_hardlockup_enable()/disable() to support /proc/sys/kernel/nmi_watchdog
+ is always preferred over other generic watchdogs, see CONFIG_HARDLOCKUP_DETECTOR
+ includes asm/nmi.h into linux/nmi.h because some sparc-specific functions are needed in sparc-specific code which includes only linux/nmi.h.
The situation became more complicated after the commit 05a4a95279311c3 ("kernel/watchdog: split up config options") and commit 2104180a53698df5 ("powerpc/64s: implement arch-specific hardlockup watchdog") in v4.13-rc1. They introduced HAVE_HARDLOCKUP_DETECTOR_ARCH. It was used for powerpc specific hardlockup detector. It was compatible with the perf one regarding the general boot, sysctl, and programming interfaces.
HAVE_HARDLOCKUP_DETECTOR_ARCH was defined as a superset of HAVE_NMI_WATCHDOG. It made some sense because all arch-specific detectors had some common requirements, namely:
+ implemented arch_touch_nmi_watchdog()

+ included asm/nmi.h into linux/nmi.h

+ defined the default value for /proc/sys/kernel/nmi_watchdog
But it actually has made things pretty complicated when the generic buddy hardlockup detector was added. Before, the generic perf detector was never supported together with an arch-specific one. But the buddy detector can work on any SMP system. It means that an architecture could support both the arch-specific and the buddy detector.
As a result, there are few tricky dependencies. For example, CONFIG_HARDLOCKUP_DETECTOR depends on:
((HAVE_HARDLOCKUP_DETECTOR_PERF || HAVE_HARDLOCKUP_DETECTOR_BUDDY) && !HAVE_NMI_WATCHDOG) || HAVE_HARDLOCKUP_DETECTOR_ARCH
The problem is that the very special sparc implementation is defined as:
HAVE_NMI_WATCHDOG && !HAVE_HARDLOCKUP_DETECTOR_ARCH
Another problem is that the meaning of HAVE_NMI_WATCHDOG is far from clear without understanding the history.
Make the logic less tricky and more self-explanatory by making HAVE_NMI_WATCHDOG specific for the sparc64 implementation. And rename it to HAVE_HARDLOCKUP_DETECTOR_SPARC64.
Note that HARDLOCKUP_DETECTOR_PREFER_BUDDY, HARDLOCKUP_DETECTOR_PERF, and HARDLOCKUP_DETECTOR_BUDDY may conflict only with HAVE_HARDLOCKUP_DETECTOR_ARCH. They depend on HARDLOCKUP_DETECTOR, and it is no longer enabled when HAVE_NMI_WATCHDOG is set.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Mladek <[email protected]>
Reviewed-by: Douglas Anderson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

Revision tags: v6.4-rc6, v6.4-rc5, v6.4-rc4

28168eca | 27-May-2023 | Douglas Anderson <[email protected]>
watchdog/hardlockup: move SMP barriers from common code to buddy code
It's been suggested that since the SMP barriers are only potentially useful for the buddy hardlockup detector, not the perf hardlockup detector, that the barriers belong in the buddy code. Let's move them and add clearer comments about why they're needed.
Link: https://lkml.kernel.org/r/20230526184139.9.I5ab0a0eeb0bd52fb23f901d298c72fa5c396e22b@changeid
Signed-off-by: Douglas Anderson <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>