|
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6 |
|
| #
caeb8ba5 |
| 07-Mar-2025 |
Yan Zhao <[email protected]> |
kexec_core: accept unaccepted kexec segments' destination addresses
The UEFI Specification version 2.9 introduces the concept of memory acceptance: some Virtual Machine platforms, such as Intel TDX
kexec_core: accept unaccepted kexec segments' destination addresses
The UEFI Specification version 2.9 introduces the concept of memory acceptance: some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, require memory to be accepted before it can be used by the guest.
Accepting memory is expensive. The memory must be allocated by the VMM and then brought to a known safe state: cache must be flushed, memory must be zeroed with the guest's encryption key, and associated metadata must be manipulated. These operations must be performed from a trusted environment (firmware or TDX module). Switching context to and from it also takes time.
This cost adds up. On large confidential VMs, memory acceptance alone can take minutes. It is better to delay memory acceptance until the memory is actually needed.
The kernel accepts memory when it is allocated from buddy allocator for the first time. This reduces boot time and decreases memory overhead as the VMM can allocate memory as needed.
It does not work when the guest attempts to kexec into a new kernel.
The kexec segments' destination addresses are not allocated by the buddy allocator. Instead, they are searched from normal system RAM (top-down or bottom-up) and exclude driver-managed memory, ACPI, persistent, and reserved memory. Unaccepted memory is normal system RAM from kernel point of view and kexec can place segments there.
Kexec bypasses the code path in buddy allocator where memory gets accepted and it leads to a crash when kexec accesses segments' memory.
Accept the destination addresses during the kexec load, immediately after they pass sanity checks. This ensures the code is located in a common place shared by both the kexec_load and kexec_file_load system calls.
This will not conflict with the accounting in try_to_accept_memory_one() since the accounting is set during kernel boot and decremented when pages are moved to the freelists. There is no harm in invoking accept_memory() on a page before making it available to the buddy allocator.
No need to worry about re-accepting memory since accept_memory() checks the unaccepted bitmap before accepting a memory page.
Although a user may perform kexec loading without ever triggering the jump, it doesn't impact much since kexec loading is not in a performance-critical path. Additionally, the destination addresses are always searched and found in the same location on a given system.
Changes to the destination address searching logic to locate only memory in either unaccepted or accepted status are unnecessary and complicated.
[[email protected]: update the commit message] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yan Zhao <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Dave Hansen <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Ashish Kalra <[email protected]> Cc: Baoquan He <[email protected]> Cc: Jianxiong Gao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc5 |
|
| #
63830aef |
| 26-Feb-2025 |
Marcos Paulo de Souza <[email protected]> |
printk: Rename resume_console to console_resume_all
The function resume_console has a misleading name, since it resumes all consoles, so rename it accordingly.
Signed-off-by: Marcos Paulo de Souza
printk: Rename resume_console to console_resume_all
The function resume_console has a misleading name, since it resumes all consoles, so rename it accordingly.
Signed-off-by: Marcos Paulo de Souza <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Reviewed-by: John Ogness <[email protected]> Link: https://lore.kernel.org/r/[email protected] [[email protected]: Fixed typo in the commit message.] Signed-off-by: Petr Mladek <[email protected]>
show more ...
|
| #
e9cec448 |
| 26-Feb-2025 |
Marcos Paulo de Souza <[email protected]> |
printk: Rename suspend_console to console_suspend_all
The function suspend_console has a misleading name, since it suspends all consoles, so rename it accordingly.
Signed-off-by: Marcos Paulo de So
printk: Rename suspend_console to console_suspend_all
The function suspend_console has a misleading name, since it suspends all consoles, so rename it accordingly.
Signed-off-by: Marcos Paulo de Souza <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Reviewed-by: John Ogness <[email protected]> Link: https://lore.kernel.org/r/[email protected] [[email protected]: Fixed typo in the commit message.] Signed-off-by: Petr Mladek <[email protected]>
show more ...
|
|
Revision tags: v6.14-rc4, v6.14-rc3, v6.14-rc2, v6.14-rc1 |
|
| #
1751f872 |
| 28-Jan-2025 |
Joel Granados <[email protected]> |
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysc
treewide: const qualify ctl_tables where applicable
Add the const qualifier to all the ctl_tables in the tree except for watchdog_hardlockup_sysctl, memory_allocation_profiling_sysctls, loadpin_sysctl_table and the ones calling register_net_sysctl (./net, drivers/inifiniband dirs). These are special cases as they use a registration function with a non-const qualified ctl_table argument or modify the arrays before passing them on to the registration function.
Constifying ctl_table structs will prevent the modification of proc_handler function pointers as the arrays would reside in .rodata. This is made possible after commit 78eb4ea25cd5 ("sysctl: treewide: constify the ctl_table argument of proc_handlers") constified all the proc_handlers.
Created this by running an spatch followed by a sed command: Spatch: virtual patch
@ depends on !(file in "net") disable optional_qualifier @
identifier table_name != { watchdog_hardlockup_sysctl, iwcm_ctl_table, ucma_ctl_table, memory_allocation_profiling_sysctls, loadpin_sysctl_table }; @@
+ const struct ctl_table table_name [] = { ... };
sed: sed --in-place \ -e "s/struct ctl_table .table = &uts_kern/const struct ctl_table *table = \&uts_kern/" \ kernel/utsname_sysctl.c
Reviewed-by: Song Liu <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> # for kernel/trace/ Reviewed-by: Martin K. Petersen <[email protected]> # SCSI Reviewed-by: Darrick J. Wong <[email protected]> # xfs Acked-by: Jani Nikula <[email protected]> Acked-by: Corey Minyard <[email protected]> Acked-by: Wei Liu <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Bill O'Donnell <[email protected]> Acked-by: Baoquan He <[email protected]> Acked-by: Ashutosh Dixit <[email protected]> Acked-by: Anna Schumaker <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
|
Revision tags: v6.13, v6.13-rc7 |
|
| #
dc6ffa6c |
| 09-Jan-2025 |
Rafael J. Wysocki <[email protected]> |
kexec_core: Add and update comments regarding the KEXEC_JUMP flow
The KEXEC_JUMP flow is analogous to hibernation flows occurring before and after creating an image and before and after jumping from
kexec_core: Add and update comments regarding the KEXEC_JUMP flow
The KEXEC_JUMP flow is analogous to hibernation flows occurring before and after creating an image and before and after jumping from the restore kernel to the image one, which is why it uses the same device callbacks as those hibernation flows.
Add comments explaining that to the code in question and update an existing comment in it which appears a bit out of context.
No functional changes.
Signed-off-by: Rafael J. Wysocki <[email protected]> Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
show more ...
|
|
Revision tags: v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1 |
|
| #
78eb4ea2 |
| 24-Jul-2024 |
Joel Granados <[email protected]> |
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ct
sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified.
This patch has been generated by the following coccinelle script:
``` virtual patch
@r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos);
@r2@ identifier func, ctl, write, buffer, lenp, ppos; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... }
@r3@ identifier func; @@
int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *);
@r4@ identifier func, ctl; @@
int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *);
@r5@ identifier func, write, buffer, lenp, ppos; @@
int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos);
```
* Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted.
* The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration.
Co-developed-by: Thomas Weißschuh <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]> Co-developed-by: Joel Granados <[email protected]> Signed-off-by: Joel Granados <[email protected]>
show more ...
|
|
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2, v6.8-rc1, v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6, v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1 |
|
| #
11a92190 |
| 27-Jun-2023 |
Joel Granados <[email protected]> |
kernel misc: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (
kernel misc: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%[email protected]/)
Remove the sentinel from ctl_table arrays. Reduce by one the values used to compare the size of the adjusted arrays.
Signed-off-by: Joel Granados <[email protected]>
show more ...
|
| #
4bb7be96 |
| 22-Feb-2024 |
yang.zhang <[email protected]> |
kexec: copy only happens before uchunk goes to zero
When loading segments, ubytes is <= mbytes. When ubytes is exhausted, there could be remaining mbytes. Then in the while loop, the buf pointer a
kexec: copy only happens before uchunk goes to zero
When loading segments, ubytes is <= mbytes. When ubytes is exhausted, there could be remaining mbytes. Then in the while loop, the buf pointer advancing with mchunk will causing meaningless reading even though it doesn't harm.
So let's change to make sure that all of the copying and the rest only happens before uchunk goes to zero.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: yang.zhang <[email protected]> Acked-by: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
02aff848 |
| 24-Jan-2024 |
Baoquan He <[email protected]> |
crash: split crash dumping code out from kexec_core.c
Currently, KEXEC_CORE select CRASH_CORE automatically because crash codes need be built in to avoid compiling error when building kexec code eve
crash: split crash dumping code out from kexec_core.c
Currently, KEXEC_CORE select CRASH_CORE automatically because crash codes need be built in to avoid compiling error when building kexec code even though the crash dumping functionality is not enabled. E.g -------------------- CONFIG_CRASH_CORE=y CONFIG_KEXEC_CORE=y CONFIG_KEXEC=y CONFIG_KEXEC_FILE=y ---------------------
After splitting out crashkernel reservation code and vmcoreinfo exporting code, there's only crash related code left in kernel/crash_core.c. Now move crash related codes from kexec_core.c to crash_core.c and only build it in when CONFIG_CRASH_DUMP=y.
And also wrap up crash codes inside CONFIG_CRASH_DUMP ifdeffery scope, or replace inappropriate CONFIG_KEXEC_CORE ifdef with CONFIG_CRASH_DUMP ifdef in generic kernel files.
With these changes, crash_core codes are abstracted from kexec codes and can be disabled at all if only kexec reboot feature is wanted.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Cc: Al Viro <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Hari Bathini <[email protected]> Cc: Pingfan Liu <[email protected]> Cc: Klara Modin <[email protected]> Cc: Michael Kelley <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Yang Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
7bb94380 |
| 13-Dec-2023 |
James Gowans <[email protected]> |
kexec: do syscore_shutdown() in kernel_kexec
syscore_shutdown() runs driver and module callbacks to get the system into a state where it can be correctly shut down. In commit 6f389a8f1dd2 ("PM / re
kexec: do syscore_shutdown() in kernel_kexec
syscore_shutdown() runs driver and module callbacks to get the system into a state where it can be correctly shut down. In commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") syscore_shutdown() was removed from kernel_restart_prepare() and hence got (incorrectly?) removed from the kexec flow. This was innocuous until commit 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown") changed the way that KVM registered its shutdown callbacks, switching from reboot notifiers to syscore_ops.shutdown. As syscore_shutdown() is missing from kexec, KVM's shutdown hook is not run and virtualisation is left enabled on the boot CPU which results in triple faults when switching to the new kernel on Intel x86 VT-x with VMXE enabled.
Fix this by adding syscore_shutdown() to the kexec sequence. In terms of where to add it, it is being added after migrating the kexec task to the boot CPU, but before APs are shut down. It is not totally clear if this is the best place: in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") it is stated that "syscore_ops operations should be carried with one CPU on-line and interrupts disabled." APs are only offlined later in machine_shutdown(), so this syscore_shutdown() is being run while APs are still online. This seems to be the correct place as it matches where syscore_shutdown() is run in the reboot and halt flows - they also run it before APs are shut down. The assumption is that the commit message in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") is no longer valid.
KVM has been discussed here as it is what broke loudly by not having syscore_shutdown() in kexec, but this change impacts more than just KVM; all drivers/modules which register a syscore_ops.shutdown callback will now be invoked in the kexec flow. Looking at some of them like x86 MCE it is probably more correct to also shut these down during kexec. Maintainers of all drivers which use syscore_ops.shutdown are added on CC for visibility. They are:
arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, virt/kvm/kvm_main.c .shutdown = kvm_shutdown,
This has been tested by doing a kexec on x86_64 and aarch64.
Link: https://lkml.kernel.org/r/[email protected] Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown") Signed-off-by: James Gowans <[email protected]> Cc: Baoquan He <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Tony Luck <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Chen-Yu Tsai <[email protected]> Cc: Jernej Skrabec <[email protected]> Cc: Samuel Holland <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Sebastian Reichel <[email protected]> Cc: Orson Zhai <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Jan H. Schoenherr <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
2861b377 |
| 21-Dec-2023 |
Yuntao Wang <[email protected]> |
kexec_core: fix the assignment to kimage->control_page
image->control_page represents the starting address for allocating the next control page, while hole_end represents the address of the last val
kexec_core: fix the assignment to kimage->control_page
image->control_page represents the starting address for allocating the next control page, while hole_end represents the address of the last valid byte of the currently allocated control page.
This bug actually does not affect the correctness of allocating control pages, because image->control_page is currently only used in kimage_alloc_crash_control_pages(), and this function, when allocating control pages, will first align image->control_page up to the nearest `(1 << order) << PAGE_SHIFT` boundary, then use this value as the starting address of the next control page. This ensures that the newly allocated control page will use the correct starting address and not overlap with previously allocated control pages.
Although it does not affect the correctness of the final result, it is better for us to set image->control_page to the correct value, in case it might be used elsewhere in the future, potentially causing errors.
Therefore, after successfully allocating a control page, image->control_page should be updated to `hole_end + 1`, rather than hole_end.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yuntao Wang <[email protected]> Cc: Baoquan He <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
816d334a |
| 17-Dec-2023 |
Yuntao Wang <[email protected]> |
kexec: modify the meaning of the end parameter in kimage_is_destination_range()
The end parameter received by kimage_is_destination_range() should be the last valid byte address of the target memory
kexec: modify the meaning of the end parameter in kimage_is_destination_range()
The end parameter received by kimage_is_destination_range() should be the last valid byte address of the target memory segment plus 1. However, in the locate_mem_hole_bottom_up() and locate_mem_hole_top_down() functions, the corresponding value passed to kimage_is_destination_range() is the last valid byte address of the target memory segment, which is 1 less.
There are two ways to fix this bug. We can either correct the logic of the locate_mem_hole_bottom_up() and locate_mem_hole_top_down() functions, or we can fix kimage_is_destination_range() by making the end parameter represent the last valid byte address of the target memory segment. Here, we choose the second approach.
Due to the modification to kimage_is_destination_range(), we also need to adjust its callers, such as kimage_alloc_normal_control_pages() and kimage_alloc_page().
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yuntao Wang <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
db6b6fb7 |
| 12-Dec-2023 |
Yuntao Wang <[email protected]> |
kexec: use ALIGN macro instead of open-coding it
Use ALIGN macro instead of open-coding it to improve code readability.
Link: https://lkml.kernel.org/r/[email protected] Sign
kexec: use ALIGN macro instead of open-coding it
Use ALIGN macro instead of open-coding it to improve code readability.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Yuntao Wang <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
cbc2fe9d |
| 13-Dec-2023 |
Baoquan He <[email protected]> |
kexec_file: add kexec_file flag to control debug printing
Patch series "kexec_file: print out debugging message if required", v4.
Currently, specifying '-d' on kexec command will print a lot of deb
kexec_file: add kexec_file flag to control debug printing
Patch series "kexec_file: print out debugging message if required", v4.
Currently, specifying '-d' on kexec command will print a lot of debugging informationabout kexec/kdump loading with kexec_load interface.
However, kexec_file_load prints nothing even though '-d' is specified. It's very inconvenient to debug or analyze the kexec/kdump loading when something wrong happened with kexec/kdump itself or develper want to check the kexec/kdump loading.
In this patchset, a kexec_file flag is KEXEC_FILE_DEBUG added and checked in code. If it's passed in, debugging message of kexec_file code will be printed out and can be seen from console and dmesg. Otherwise, the debugging message is printed like beofre when pr_debug() is taken.
Note: **** ===== 1) The code in kexec-tools utility also need be changed to support passing KEXEC_FILE_DEBUG to kernel when 'kexec -s -d' is specified. The patch link is here: ========= [PATCH] kexec_file: add kexec_file flag to support debug printing http://lists.infradead.org/pipermail/kexec/2023-November/028505.html
2) s390 also has kexec_file code, while I am not sure what debugging information is necessary. So leave it to s390 developer.
Test: **** ==== Testing was done in v1 on x86_64 and arm64. For v4, tested on x86_64 again. And on x86_64, the printed messages look like below: -------------------------------------------------------------- kexec measurement buffer for the loaded kernel at 0x207fffe000. Loaded purgatory at 0x207fff9000 Loaded boot_param, command line and misc at 0x207fff3000 bufsz=0x1180 memsz=0x1180 Loaded 64bit kernel at 0x207c000000 bufsz=0xc88200 memsz=0x3c4a000 Loaded initrd at 0x2079e79000 bufsz=0x2186280 memsz=0x2186280 Final command line is: root=/dev/mapper/fedora_intel--knightslanding--lb--02-root ro rd.lvm.lv=fedora_intel-knightslanding-lb-02/root console=ttyS0,115200N81 crashkernel=256M E820 memmap: 0000000000000000-000000000009a3ff (1) 000000000009a400-000000000009ffff (2) 00000000000e0000-00000000000fffff (2) 0000000000100000-000000006ff83fff (1) 000000006ff84000-000000007ac50fff (2) ...... 000000207fff6150-000000207fff615f (128) 000000207fff6160-000000207fff714f (1) 000000207fff7150-000000207fff715f (128) 000000207fff7160-000000207fff814f (1) 000000207fff8150-000000207fff815f (128) 000000207fff8160-000000207fffffff (1) nr_segments = 5 segment[0]: buf=0x000000004e5ece74 bufsz=0x211 mem=0x207fffe000 memsz=0x1000 segment[1]: buf=0x000000009e871498 bufsz=0x4000 mem=0x207fff9000 memsz=0x5000 segment[2]: buf=0x00000000d879f1fe bufsz=0x1180 mem=0x207fff3000 memsz=0x2000 segment[3]: buf=0x000000001101cd86 bufsz=0xc88200 mem=0x207c000000 memsz=0x3c4a000 segment[4]: buf=0x00000000c6e38ac7 bufsz=0x2186280 mem=0x2079e79000 memsz=0x2187000 kexec_file_load: type:0, start:0x207fff91a0 head:0x109e004002 flags:0x8 ---------------------------------------------------------------------------
This patch (of 7):
When specifying 'kexec -c -d', kexec_load interface will print loading information, e.g the regions where kernel/initrd/purgatory/cmdline are put, the memmap passed to 2nd kernel taken as system RAM ranges, and printing all contents of struct kexec_segment, etc. These are very helpful for analyzing or positioning what's happening when kexec/kdump itself failed. The debugging printing for kexec_load interface is made in user space utility kexec-tools.
Whereas, with kexec_file_load interface, 'kexec -s -d' print nothing. Because kexec_file code is mostly implemented in kernel space, and the debugging printing functionality is missed. It's not convenient when debugging kexec/kdump loading and jumping with kexec_file_load interface.
Now add KEXEC_FILE_DEBUG to kexec_file flag to control the debugging message printing. And add global variable kexec_file_dbg_print and macro kexec_dprintk() to facilitate the printing.
This is a preparation, later kexec_dprintk() will be used to replace the existing pr_debug(). Once 'kexec -s -d' is specified, it will print out kexec/kdump loading information. If '-d' is not specified, it regresses to pr_debug().
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Cc: Conor Dooley <[email protected]> Cc: Joe Perches <[email protected]> Cc: Nathan Chancellor <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
0311d827 |
| 14-Nov-2023 |
Uros Bizjak <[email protected]> |
kexec: use atomic_try_cmpxchg in crash_kexec
Use atomic_try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in crash_kexec(). x86 CMPXCHG instruction returns success in ZF flag, so this change s
kexec: use atomic_try_cmpxchg in crash_kexec
Use atomic_try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in crash_kexec(). x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg.
No functional change intended.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Uros Bizjak <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Eric Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
b631b95d |
| 14-Sep-2023 |
Baoquan He <[email protected]> |
crash_core: move crashk_*res definition into crash_core.c
Both crashk_res and crashk_low_res are used to mark the reserved crashkernel regions in iomem_resource tree. And later the generic crashker
crash_core: move crashk_*res definition into crash_core.c
Both crashk_res and crashk_low_res are used to mark the reserved crashkernel regions in iomem_resource tree. And later the generic crashkernel resrvation will be added into crash_core.c. So move crashk_res and crashk_low_res definition into crash_core.c to avoid compiling error if CONFIG_CRASH_CORE=on while CONFIG_KEXEC_CORE is unset.
Meanwhile include <asm/crash_core.h> in <linux/crash_core.h> if generic reservation is needed. In that case, <asm/crash_core.h> need be added by ARCH. In asm/crash_core.h, ARCH can provide its own macro definitions to override macros in <linux/crash_core.h> if needed. Wrap the including into CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION ifdeffery scope to avoid compiling error in other ARCH-es which don't take the generic reservation way yet.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Baoquan He <[email protected]> Reviewed-by: Zhen Lei <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chen Jiahao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
24726275 |
| 14-Aug-2023 |
Eric DeVolder <[email protected]> |
crash: add generic infrastructure for crash hotplug support
To support crash hotplug, a mechanism is needed to update the crash elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/ onlini
crash: add generic infrastructure for crash hotplug support
To support crash hotplug, a mechanism is needed to update the crash elfcorehdr upon CPU or memory changes (eg. hot un/plug or off/ onlining). The crash elfcorehdr describes the CPUs and memory to be written into the vmcore.
To track CPU changes, callbacks are registered with the cpuhp mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The crash hotplug elfcorehdr update has no explicit ordering requirement (relative to other cpuhp states), so meets the criteria for utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic state and avoids the need to introduce a new state for crash hotplug. Also, CPUHP_BP_PREPARE_DYN is the last state in the PREPARE group, just prior to the STARTING group, which is very close to the CPU starting up in a plug/online situation, or stopping in a unplug/ offline situation. This minimizes the window of time during an actual plug/online or unplug/offline situation in which the elfcorehdr would be inaccurate. Note that for a CPU being unplugged or offlined, the CPU will still be present in the list of CPUs generated by crash_prepare_elf64_headers(). However, there is no need to explicitly omit the CPU, see justification in 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()'.
To track memory changes, a notifier is registered to capture the memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
The CPU callbacks and memory notifiers invoke crash_handle_hotplug_event() which performs needed tasks and then dispatches the event to the architecture specific arch_crash_handle_hotplug_event() to update the elfcorehdr with the current state of CPUs and memory. During the process, the kexec_lock is held.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric DeVolder <[email protected]> Reviewed-by: Sourabh Jain <[email protected]> Acked-by: Hari Bathini <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Akhil Raj <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Young <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Mimi Zohar <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Weißschuh <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
6f991cc3 |
| 14-Aug-2023 |
Eric DeVolder <[email protected]> |
crash: move a few code bits to setup support of crash hotplug
Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28.
Once the kdump service is loaded, if changes to CPUs or memor
crash: move a few code bits to setup support of crash hotplug
Patch series "crash: Kernel handling of CPU and memory hot un/plug", v28.
Once the kdump service is loaded, if changes to CPUs or memory occur, either by hot un/plug or off/onlining, the crash elfcorehdr must also be updated.
The elfcorehdr describes to kdump the CPUs and memory in the system, and any inaccuracies can result in a vmcore with missing CPU context or memory regions.
The current solution utilizes udev to initiate an unload-then-reload of the kdump image (eg. kernel, initrd, boot_params, purgatory and elfcorehdr) by the userspace kexec utility. In the original post I outlined the significant performance problems related to offloading this activity to userspace.
This patchset introduces a generic crash handler that registers with the CPU and memory notifiers. Upon CPU or memory changes, from either hot un/plug or off/onlining, this generic handler is invoked and performs important housekeeping, for example obtaining the appropriate lock, and then invokes an architecture specific handler to do the appropriate elfcorehdr update.
Note the description in patch 'crash: change crash_prepare_elf64_headers() to for_each_possible_cpu()' and 'x86/crash: optimize CPU changes' that enables further optimizations related to CPU plug/unplug/online/offline performance of elfcorehdr updates.
In the case of x86_64, the arch specific handler generates a new elfcorehdr, and overwrites the old one in memory; thus no involvement with userspace needed.
To realize the benefits/test this patchset, one must make a couple of minor changes to userspace:
- Prevent udev from updating kdump crash kernel on hot un/plug changes. Add the following as the first lines to the RHEL udev rule file /usr/lib/udev/rules.d/98-kexec.rules:
# The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end"
With this changeset applied, the two rules evaluate to false for CPU and memory change events and thus skip the userspace unload-then-reload of kdump.
- Change to the kexec_file_load for loading the kdump kernel: Eg. on RHEL: in /usr/bin/kdumpctl, change to: standard_kexec_args="-p -d -s" which adds the -s to select kexec_file_load() syscall.
This kernel patchset also supports kexec_load() with a modified kexec userspace utility. A working changeset to the kexec userspace utility is posted to the kexec-tools mailing list here:
http://lists.infradead.org/pipermail/kexec/2023-May/027049.html
To use the kexec-tools patch, apply, build and install kexec-tools, then change the kdumpctl's standard_kexec_args to replace the -s with --hotplug. The removal of -s reverts to the kexec_load syscall and the addition of --hotplug invokes the changes put forth in the kexec-tools patch.
This patch (of 8):
The crash hotplug support leans on the work for the kexec_file_load() syscall. To also support the kexec_load() syscall, a few bits of code need to be move outside of CONFIG_KEXEC_FILE. As such, these bits are moved out of kexec_file.c and into a common location crash_core.c.
In addition, struct crash_mem and crash_notes were moved to new locales so that PROC_KCORE, which sets CRASH_CORE alone, builds correctly.
No functionality change intended.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Eric DeVolder <[email protected]> Reviewed-by: Sourabh Jain <[email protected]> Acked-by: Hari Bathini <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Akhil Raj <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Borislav Petkov (AMD) <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Young <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Mimi Zohar <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Thomas Weißschuh <[email protected]> Cc: Valentin Schneider <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4 |
|
| #
16c6006a |
| 27-May-2023 |
Zhen Lei <[email protected]> |
kexec: enable kexec_crash_size to support two crash kernel regions
The crashk_low_res should be considered by /sys/kernel/kexec_crash_size to support two crash kernel regions shrinking if existing.
kexec: enable kexec_crash_size to support two crash kernel regions
The crashk_low_res should be considered by /sys/kernel/kexec_crash_size to support two crash kernel regions shrinking if existing.
While doing it, crashk_low_res will only be shrunk when the entire crashk_res is empty; and if the crashk_res is empty and crahk_low_res is not, change crashk_low_res to be crashk_res.
[[email protected]: redo changelog] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Cong Wang <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Michael Holzheu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
5b7bfb32 |
| 27-May-2023 |
Zhen Lei <[email protected]> |
kexec: add helper __crash_shrink_memory()
No functional change, in preparation for the next patch so that it is easier to review.
[[email protected]: make __crash_shrink_memory() static]
kexec: add helper __crash_shrink_memory()
No functional change, in preparation for the next patch so that it is easier to review.
[[email protected]: make __crash_shrink_memory() static] Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Cong Wang <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Michael Holzheu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
8a7db779 |
| 27-May-2023 |
Zhen Lei <[email protected]> |
kexec: improve the readability of crash_shrink_memory()
The major adjustments are: 1. end = start + new_size. The 'end' here is not an accurate representation, because it is not the new end of
kexec: improve the readability of crash_shrink_memory()
The major adjustments are: 1. end = start + new_size. The 'end' here is not an accurate representation, because it is not the new end of crashk_res, but the start of ram_res, difference 1. So eliminate it and replace it with ram_res->start. 2. Use 'ram_res->start' and 'ram_res->end' as arguments to crash_free_reserved_phys_range() to indicate that the memory covered by 'ram_res' is released from the crashk. And keep it close to insert_resource(). 3. Replace 'if (start == end)' with 'if (!new_size)', clear indication that all crashk memory will be shrunken.
No functional change.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Cong Wang <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Michael Holzheu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
f7f567b9 |
| 27-May-2023 |
Zhen Lei <[email protected]> |
kexec: clear crashk_res if all its memory has been released
If the resource of crashk_res has been released, it is better to clear crashk_res.start and crashk_res.end. Because 'end = start - 1' is
kexec: clear crashk_res if all its memory has been released
If the resource of crashk_res has been released, it is better to clear crashk_res.start and crashk_res.end. Because 'end = start - 1' is not reasonable, and in some places the test is based on crashk_res.end, not resource_size(&crashk_res).
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Cong Wang <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Michael Holzheu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
6f22a744 |
| 27-May-2023 |
Zhen Lei <[email protected]> |
kexec: delete a useless check in crash_shrink_memory()
The check '(crashk_res.parent != NULL)' is added by commit e05bd3367bd3 ("kexec: fix Oops in crash_shrink_memory()"), but it's stale now. Beca
kexec: delete a useless check in crash_shrink_memory()
The check '(crashk_res.parent != NULL)' is added by commit e05bd3367bd3 ("kexec: fix Oops in crash_shrink_memory()"), but it's stale now. Because if 'crashk_res' is not reserved, it will be zero in size and will be intercepted by the above 'if (new_size >= old_size)'.
Ago: if (new_size >= end - start + 1)
Now: old_size = (end == 0) ? 0 : end - start + 1; if (new_size >= old_size)
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Cc: Baoquan He <[email protected]> Cc: Cong Wang <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Michael Holzheu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
| #
1cba6c43 |
| 27-May-2023 |
Zhen Lei <[email protected]> |
kexec: fix a memory leak in crash_shrink_memory()
Patch series "kexec: enable kexec_crash_size to support two crash kernel regions".
When crashkernel=X fails to reserve region under 4G, it will fal
kexec: fix a memory leak in crash_shrink_memory()
Patch series "kexec: enable kexec_crash_size to support two crash kernel regions".
When crashkernel=X fails to reserve region under 4G, it will fall back to reserve region above 4G and a region of the default size will also be reserved under 4G. Unfortunately, /sys/kernel/kexec_crash_size only supports one crash kernel region now, the user cannot sense the low memory reserved by reading /sys/kernel/kexec_crash_size. Also, low memory cannot be freed by writing this file.
For example: resource_size(crashk_res) = 512M resource_size(crashk_low_res) = 256M
The result of 'cat /sys/kernel/kexec_crash_size' is 512M, but it should be 768M. When we execute 'echo 0 > /sys/kernel/kexec_crash_size', the size of crashk_res becomes 0 and resource_size(crashk_low_res) is still 256 MB, which is incorrect.
Since crashk_res manages the memory with high address and crashk_low_res manages the memory with low address, crashk_low_res is shrunken only when all crashk_res is shrunken. And because when there is only one crash kernel region, crashk_res is always used. Therefore, if all crashk_res is shrunken and crashk_low_res still exists, swap them.
This patch (of 6):
If the value of parameter 'new_size' is in the semi-open and semi-closed interval (crashk_res.end - KEXEC_CRASH_MEM_ALIGN + 1, crashk_res.end], the calculation result of ram_res is:
ram_res->start = crashk_res.end + 1 ram_res->end = crashk_res.end
The operation of insert_resource() fails, and ram_res is not added to iomem_resource. As a result, the memory of the control block ram_res is leaked.
In fact, on all architectures, the start address and size of crashk_res are already aligned by KEXEC_CRASH_MEM_ALIGN. Therefore, we do not need to round up crashk_res.start again. Instead, we should round up 'new_size' in advance.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 6480e5a09237 ("kdump: add missing RAM resource in crash_shrink_memory()") Fixes: 06a7f711246b ("kexec: premit reduction of the reserved memory size") Signed-off-by: Zhen Lei <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Cong Wang <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Michael Holzheu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|
|
Revision tags: v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4, v6.2-rc3 |
|
| #
a42aaad2 |
| 04-Jan-2023 |
Ricardo Ribalda <[email protected]> |
kexec: introduce sysctl parameters kexec_load_limit_*
kexec allows replacing the current kernel with a different one. This is usually a source of concerns for sysadmins that want to harden a system
kexec: introduce sysctl parameters kexec_load_limit_*
kexec allows replacing the current kernel with a different one. This is usually a source of concerns for sysadmins that want to harden a system.
Linux already provides a way to disable loading new kexec kernel via kexec_load_disabled, but that control is very coard, it is all or nothing and does not make distinction between a panic kexec and a normal kexec.
This patch introduces new sysctl parameters, with finer tuning to specify how many times a kexec kernel can be loaded. The sysadmin can set different limits for kexec panic and kexec reboot kernels. The value can be modified at runtime via sysctl, but only with a stricter value.
With these new parameters on place, a system with loadpin and verity enabled, using the following kernel parameters: sysctl.kexec_load_limit_reboot=0 sysct.kexec_load_limit_panic=1 can have a good warranty that if initrd tries to load a panic kernel, a malitious user will have small chances to replace that kernel with a different one, even if they can trigger timeouts on the disk where the panic kernel lives.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ricardo Ribalda <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Bagas Sanjaya <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Guilherme G. Piccoli <[email protected]> # Steam Deck Cc: Joel Fernandes (Google) <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
show more ...
|