Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4

39ec9eaa | 19-Feb-2025 | Kees Cook <[email protected]>
coredump: Only sort VMAs when core_sort_vma sysctl is set
The sorting of VMAs by size in commit 7d442a33bfe8 ("binfmt_elf: Dump smaller VMAs first in ELF cores") breaks elfutils[1]. Instead, sort based on the setting of the new sysctl, core_sort_vma, which defaults to 0, no sorting.
Reported-by: Michael Stapelberg <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ [1] Fixes: 7d442a33bfe8 ("binfmt_elf: Dump smaller VMAs first in ELF cores") Signed-off-by: Kees Cook <[email protected]>
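The new knob is a plain integer sysctl, so enabling the old size-sorted behaviour is a single write. A minimal sketch, assuming the usual convention that a dotted name such as `kernel.core_sort_vma` maps to a file under `/proc/sys`; the helper names are hypothetical, and the `root` argument exists only so the helpers can run against a scratch directory:

```python
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl name such as "kernel.core_sort_vma" to its procfs file."""
    return Path(root).joinpath(*name.split("."))


def set_core_sort_vma(enabled: bool, root: str = "/proc/sys") -> None:
    """Write 1 (dump smaller VMAs first) or 0 (the default: no sorting).

    Writing under the real /proc/sys requires root.
    """
    sysctl_path("kernel.core_sort_vma", root).write_text("1\n" if enabled else "0\n")
```

On a live system this is equivalent to `sysctl -w kernel.core_sort_vma=1`; leaving the default of 0 keeps elfutils-compatible core files.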

Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13

e129fdc5 | 14-Jan-2025 | Phil Auld <[email protected]>
Documentation/sysctl: Add timer_migration to kernel.rst
There is no mention of timer_migration in the docs. Add a short description.
Signed-off-by: Phil Auld <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: [email protected] Signed-off-by: Jonathan Corbet <[email protected]> Link: https://lore.kernel.org/r/[email protected]

Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5

62bf7065 | 27-Oct-2024 | Lance Yang <[email protected]>
hung_task: add docs for hung_task_detect_count
This commit introduces documentation for hung_task_detect_count in kernel.rst.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mingzhe Yang <[email protected]> Signed-off-by: Lance Yang <[email protected]> Cc: Bang Li <[email protected]> Cc: Baolin Wang <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Huang Cun <[email protected]> Cc: Joel Granados <[email protected]> Cc: Joel Granados <[email protected]> Cc: John Siddle <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Ryan Roberts <[email protected]> Cc: Thomas Weißschuh <[email protected]> Cc: Yongliang Gao <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1

cbade823 | 21-Jul-2024 | Helge Deller <[email protected]>
parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN
Allow users to disable kernel warnings for unaligned memory accesses from kernel via the /proc/sys/kernel/ignore-unaligned-usertrap procfs entry. That way users can disable those warnings in case they happen too often.
Signed-off-by: Helge Deller <[email protected]>

Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6

19f0423f | 23-Feb-2024 | Huang Yiwei <[email protected]>
tracing: Support to dump instance traces by ftrace_dump_on_oops
Currently ftrace only dumps the global trace buffer on an oops. When debugging a production use case, instance traces are helpful for checking specific problems, since the global trace buffer may be used for other purposes.
This patch extends the ftrace_dump_on_oops parameter so that it can dump one or more specific trace instances:
- ftrace_dump_on_oops=0: as before -- don't dump
- ftrace_dump_on_oops[=1]: as before -- dump the global trace buffer on all CPUs
- ftrace_dump_on_oops=2 or =orig_cpu: as before -- dump the global trace buffer on the CPU that triggered the oops
- ftrace_dump_on_oops=<instance_name>: new behavior -- dump the tracing instance matching <instance_name>
- ftrace_dump_on_oops[=2/orig_cpu],<instance1_name>[=2/orig_cpu],<instance2_name>[=2/orig_cpu]: new behavior -- dump the global trace buffer and multiple instance buffers on all CPUs, or only on the CPU that triggered the oops if =2 or =orig_cpu is given
The sysctl node accepts the same input.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Ross Zwisler <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Signed-off-by: Huang Yiwei <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
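To make the value grammar above concrete, here is a toy parser for the comma-separated value. It is a sketch under stated assumptions, not the kernel's parser: each token is handled independently, an empty token stands for the bare flag (as in the boot form `ftrace_dump_on_oops,<instance>`), and `DumpPlan`, `ALL_CPUS` and `ORIG_CPU` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ALL_CPUS = "all_cpus"
ORIG_CPU = "orig_cpu"


@dataclass
class DumpPlan:
    global_mode: str | None = None  # None: do not dump the global buffer
    instances: dict[str, str] = field(default_factory=dict)  # name -> ALL_CPUS/ORIG_CPU


def parse_dump_on_oops(value: str) -> DumpPlan:
    """Parse the comma-separated value of ftrace_dump_on_oops (toy model)."""
    plan = DumpPlan()
    for token in value.split(","):
        if token in ("", "1"):
            # Bare flag (empty token) behaves like =1: global buffer, all CPUs.
            plan.global_mode = ALL_CPUS
        elif token == "0":
            plan.global_mode = None
        elif token in ("2", ORIG_CPU):
            plan.global_mode = ORIG_CPU
        else:
            name, sep, mode = token.partition("=")
            if not sep:
                plan.instances[name] = ALL_CPUS
            elif mode in ("2", ORIG_CPU):
                plan.instances[name] = ORIG_CPU
            else:
                raise ValueError(f"unsupported mode {mode!r} for instance {name!r}")
    return plan
```

For example, `2,foo=orig_cpu,bar` yields the global buffer and instance `foo` dumped only on the oopsing CPU, and instance `bar` dumped on all CPUs.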

Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3

2e3fc6ca | 02-Feb-2024 | Feng Tang <[email protected]>
panic: add option to dump blocked tasks in panic_print
For debugging kernel panics and other bugs, panic_print already offers an option to dump the call stacks of all tasks. On today's large servers running many containers there can be thousands of tasks or more, so this prints a huge number of call stacks and takes a long time (especially on a serial console, which is the main target use case of panic_print).
In many cases only the few blocked tasks are relevant to the panic, so add an option to dump only the call stacks of blocked tasks.
[[email protected]: clarify documentation a little] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Feng Tang <[email protected]> Tested-by: Guilherme G. Piccoli <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
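panic_print is a bitmask, so the new option is one more bit. The sketch below encodes the bit layout as I understand it from the panic_print entry in kernel.rst (blocked tasks on bit 7); treat the positions as an assumption to verify against your kernel's documentation, and the member names as illustrative.

```python
from __future__ import annotations

from enum import IntFlag


class PanicPrint(IntFlag):
    """Assumed panic_print bit layout (names are illustrative)."""
    TASK_INFO = 1 << 0       # all tasks
    MEM_INFO = 1 << 1        # system memory info
    TIMER_INFO = 1 << 2
    LOCK_INFO = 1 << 3       # only with CONFIG_LOCKDEP
    FTRACE_INFO = 1 << 4
    ALL_PRINTK_MSG = 1 << 5
    ALL_CPU_BT = 1 << 6      # all CPU backtraces, if the arch supports it
    BLOCKED_TASKS = 1 << 7   # only tasks in uninterruptible (blocked) state


def decode(value: int) -> list[str]:
    """List the names of the bits set in a panic_print value."""
    return [flag.name for flag in PanicPrint if value & flag]
```

Under this layout, a large container host would set 128 (blocked tasks only) instead of 1 (all tasks) to keep serial-console output short.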

Revision tags: v6.8-rc2, v6.8-rc1

9220066e | 15-Jan-2024 | Alexey Gladkov <[email protected]>
docs: add information about ipc sysctls limitations
After 25b21cb2f6d6 ("[PATCH] IPC namespace core") and 4e9823111bdc ("[PATCH] IPC namespace - shm") the shared memory page count stopped being global and started being counted per IPC namespace. The documentation and shmget(2) still say that shmall is a global option.
shmget(2):
SHMALL System-wide limit on the total amount of shared memory, measured in units of the system page size. On Linux, this limit can be read and modified via /proc/sys/kernel/shmall.
I think the changes made in 2006 should be documented.
Link: https://lkml.kernel.org/r/09e99911071766958af488beb4e8a728a4f12135.1705333426.git.legion@kernel.org Signed-off-by: Alexey Gladkov <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Link: https://lkml.kernel.org/r/ede20ddf7be48b93e8084c3be2e920841ee1a641.1663756794.git.legion@kernel.org Cc: Christian Brauner <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Joel Granados <[email protected]> Cc: Kees Cook <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
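Because shmall is measured in pages, turning it into a byte budget needs the page size. A minimal sketch (hypothetical helper name), keeping in mind that per this change the value applies to the reader's IPC namespace rather than the whole system:

```python
from __future__ import annotations

import os


def shmall_bytes(shmall_pages: int, page_size: int | None = None) -> int:
    """Convert kernel.shmall, which counts pages, into bytes.

    The value read from /proc/sys/kernel/shmall is the limit for the
    reader's IPC namespace, not a system-wide total.
    """
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    return shmall_pages * page_size
```

For instance, 2097152 pages of 4 KiB each allow 8 GiB of shared memory in that namespace.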

Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6

8f833c82 | 09-Oct-2023 | Shrikanth Hegde <[email protected]>
sched/topology: Change behaviour of the 'sched_energy_aware' sysctl, based on the platform
The 'sched_energy_aware' sysctl is available for the admin to disable/enable energy-aware scheduling (EAS). EAS is enabled only if the platform meets a few conditions: asymmetric CPU capacity, no SMT, the schedutil CPUfreq governor, frequency-invariant load tracking, etc. A platform may boot without EAS capability but gain it at runtime, for example by changing/registering the cpufreq governor to schedutil.
At present, even when the platform doesn't support EAS, this sysctl returns 1; writing 1 ends up calling build_perf_domains, and writing 0 is a NOP. That is confusing and unnecessary.
The desired behavior is for this sysctl to enable/disable EAS on supported platforms. On unsupported platforms, a write to the sysctl returns a "not supported" error and a read returns empty. So:
- sched_energy_aware returns empty: EAS is not possible at this moment. This includes EAS-capable platforms that have at least one EAS condition false during startup, e.g. not using the schedutil cpufreq governor.
- sched_energy_aware returns 0: EAS is supported but disabled by the admin.
- sched_energy_aware returns 1: EAS is supported and enabled.
Users can find out why EAS is not possible by checking info messages. sched_is_eas_possible returns true if the platform can do EAS at this moment.
Signed-off-by: Shrikanth Hegde <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Pierre Gondois <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lore.kernel.org/r/[email protected]
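The three read results described above map naturally onto a small enum. A sketch assuming the caller has already read the contents of `/proc/sys/kernel/sched_energy_aware`; `EasState` and the function name are hypothetical:

```python
from enum import Enum


class EasState(Enum):
    NOT_POSSIBLE = "empty"   # EAS is not possible at this moment
    DISABLED = "0"           # supported, but disabled by the admin
    ENABLED = "1"            # supported and enabled


def interpret_sched_energy_aware(contents: str) -> EasState:
    """Map the text read from the sched_energy_aware sysctl to a state."""
    value = contents.strip()
    if value == "":
        return EasState.NOT_POSSIBLE
    if value == "0":
        return EasState.DISABLED
    if value == "1":
        return EasState.ENABLED
    raise ValueError(f"unexpected sched_energy_aware contents: {contents!r}")
```

An empty read means it is worth checking the kernel's info messages for the missing EAS condition.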

Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4

94483490 | 13-Jan-2023 | Ard Biesheuvel <[email protected]>
Documentation: Drop or replace remaining mentions of IA64
Drop or update mentions of IA64, as appropriate.
Signed-off-by: Ard Biesheuvel <[email protected]>

76d3ccec | 21-Aug-2023 | Matteo Rizzo <[email protected]>
io_uring: add a sysctl to disable io_uring system-wide
Introduce a new sysctl (io_uring_disabled) which can be either 0, 1, or 2. When 0 (the default), all processes are allowed to create io_uring instances, which is the current behavior. When 1, io_uring creation is disabled (io_uring_setup() will fail with -EPERM) for unprivileged processes not in the kernel.io_uring_group group. When 2, calls to io_uring_setup() fail with -EPERM regardless of privilege.
Signed-off-by: Matteo Rizzo <[email protected]> [JEM: modified to add io_uring_group] Signed-off-by: Jeff Moyer <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
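The three io_uring_disabled levels reduce to a small decision table. A sketch in which `privileged` abstracts whatever privilege check the kernel applies, and a `False` result stands for io_uring_setup() failing with -EPERM; the function name is hypothetical:

```python
def io_uring_setup_allowed(disabled: int, privileged: bool, in_io_uring_group: bool) -> bool:
    """Model of kernel.io_uring_disabled; False means io_uring_setup() gets -EPERM."""
    if disabled == 0:
        return True  # default: every process may create io_uring instances
    if disabled == 1:
        # Unprivileged processes are allowed only if in kernel.io_uring_group.
        return privileged or in_io_uring_group
    if disabled == 2:
        return False  # refused regardless of privilege
    raise ValueError("io_uring_disabled must be 0, 1 or 2")
```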

57972127 | 02-Aug-2023 | Alexandre Ghiti <[email protected]>
Documentation: admin-guide: Add riscv sysctl_perf_user_access
riscv now uses this sysctl so document its usage for this architecture.
Signed-off-by: Alexandre Ghiti <[email protected]>

e4624435 | 12-Jun-2023 | Jonathan Corbet <[email protected]>
docs: arm64: Move arm64 documentation under Documentation/arch/
Architecture-specific documentation is being moved into Documentation/arch/ as a way of cleaning up the top-level documentation directory and making the docs hierarchy more closely match the source hierarchy. Move Documentation/arm64 into arch/ (along with the equivalent Chinese translations) and fix up documentation references.
Cc: Will Deacon <[email protected]> Cc: Alex Shi <[email protected]> Cc: Hu Haowen <[email protected]> Cc: Paolo Bonzini <[email protected]> Acked-by: Catalin Marinas <[email protected]> Reviewed-by: Yantengsi <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>

ff61f079 | 14-Mar-2023 | Jonathan Corbet <[email protected]>
docs: move x86 documentation into Documentation/arch/
Move the x86 documentation under Documentation/arch/ as a way of cleaning up the top-level directory and making the structure of our docs more closely match the structure of the source directories it describes.
All in-kernel references to the old paths have been updated.
Acked-by: Dave Hansen <[email protected]> Cc: [email protected] Cc: [email protected] Cc: Borislav Petkov <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Jonathan Corbet <[email protected]>

Revision tags: v6.2-rc3

a42aaad2 | 04-Jan-2023 | Ricardo Ribalda <[email protected]>
kexec: introduce sysctl parameters kexec_load_limit_*
kexec allows replacing the current kernel with a different one. This is usually a source of concern for sysadmins who want to harden a system.
Linux already provides a way to disable loading a new kexec kernel via kexec_load_disabled, but that control is very coarse: it is all or nothing and does not distinguish between a panic kexec and a normal kexec.
This patch introduces new sysctl parameters with finer tuning to specify how many times a kexec kernel can be loaded. The sysadmin can set different limits for kexec panic and kexec reboot kernels. The values can be modified at runtime via sysctl, but only to a stricter value.
With these new parameters in place, a system with loadpin and verity enabled, using the following kernel parameters: sysctl.kexec_load_limit_reboot=0 sysctl.kexec_load_limit_panic=1 has a good guarantee that, if the initrd loads a panic kernel, a malicious user will have little chance of replacing that kernel with a different one, even if they can trigger timeouts on the disk where the panic kernel lives.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ricardo Ribalda <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Bagas Sanjaya <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Guilherme G. Piccoli <[email protected]> # Steam Deck Cc: Joel Fernandes (Google) <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
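The "only stricter" rule and the per-load countdown can be modelled in a few lines. This is a toy model, not kernel code: it assumes -1 encodes "unlimited" (my understanding of kernel.rst, to be checked) and treats an equal value as acceptable; `KexecLoadLimit` is a hypothetical name.

```python
UNLIMITED = -1  # assumed encoding for "no limit"; check kernel.rst for your kernel


class KexecLoadLimit:
    """Toy model of one kexec_load_limit_* knob (panic or reboot)."""

    def __init__(self, value: int = UNLIMITED) -> None:
        self.value = value

    def set(self, new: int) -> None:
        """Accept only values that are not looser than the current one."""
        if new < UNLIMITED:
            raise ValueError("limit must be -1 (unlimited) or >= 0")
        if self.value != UNLIMITED and (new == UNLIMITED or new > self.value):
            raise PermissionError("the limit can only be made stricter")
        self.value = new

    def try_load(self) -> bool:
        """Consume one load if allowed; False models the syscall being refused."""
        if self.value == UNLIMITED:
            return True
        if self.value == 0:
            return False
        self.value -= 1
        return True
```

Mirroring the example above: a reboot limit of 0 refuses every load, and a panic limit of 1 lets the initrd load its panic kernel once, after which neither further loads nor raising the limit again succeed.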

06dcb013 | 04-Jan-2023 | Ricardo Ribalda <[email protected]>
Documentation: sysctl: correct kexec_load_disabled
Patch series "kexec: Add new parameter to limit the access to kexec", v6.
Add two parameters to specify how many times a kexec kernel can be loaded.
These parameters allow hardening the system.
While we are at it, fix a documentation issue and refactor some code.
This patch (of 3):
kexec_load_disabled affects both ``kexec_load`` and ``kexec_file_load`` syscalls. Make it explicit.
Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ricardo Ribalda <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]> Acked-by: Baoquan He <[email protected]> Cc: Bagas Sanjaya <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Guilherme G. Piccoli <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Philipp Rudo <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]>

Revision tags: v6.2-rc2, v6.2-rc1, v6.1

61a6fccc | 10-Dec-2022 | Huacai Chen <[email protected]>
LoongArch: Add unaligned access support
Loongson-2 series processors (Loongson-2K500, Loongson-2K1000) don't support unaligned access in hardware, while on Loongson-3 series processors (Loongson-3A5000, Loongson-3C5000) hardware support for unaligned access is configurable. This patch adds unaligned access emulation for those LoongArch processors without hardware support.
Signed-off-by: Huacai Chen <[email protected]>

Revision tags: v6.1-rc8, v6.1-rc7, v6.1-rc6

9fc9e278 | 17-Nov-2022 | Kees Cook <[email protected]>
panic: Introduce warn_limit
Like oops_limit, add warn_limit for limiting the number of warnings when panic_on_warn is not set.
Cc: Jonathan Corbet <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Baolin Wang <[email protected]> Cc: "Jason A. Donenfeld" <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Huang Ying <[email protected]> Cc: Petr Mladek <[email protected]> Cc: tangmeng <[email protected]> Cc: "Guilherme G. Piccoli" <[email protected]> Cc: Tiezhu Yang <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Reviewed-by: Luis Chamberlain <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]

de92f657 | 02-Dec-2022 | Kees Cook <[email protected]>
exit: Allow oops_limit to be disabled
In preparation for keeping oops_limit logic in sync with warn_limit, have oops_limit == 0 disable checking the Oops counter.
Cc: Jann Horn <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Baolin Wang <[email protected]> Cc: "Jason A. Donenfeld" <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Huang Ying <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: [email protected] Signed-off-by: Kees Cook <[email protected]>
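oops_limit and warn_limit share the same shape: count events, panic once the count reaches the limit, and treat a limit of 0 as "never". A toy model of that shape; the `>=` comparison is my assumption about where the threshold sits, and `EventLimit` is a hypothetical name:

```python
class EventLimit:
    """Toy model of a kernel.oops_limit / kernel.warn_limit style counter."""

    def __init__(self, limit: int) -> None:
        self.limit = limit  # 0 disables the check
        self.count = 0

    def record(self) -> bool:
        """Count one oops (or warning); return True if the system would panic."""
        self.count += 1
        return self.limit != 0 and self.count >= self.limit
```

With a limit of 3 the third event triggers the panic; with a limit of 0 events are still counted but never trigger it.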

d4ccd54d | 17-Nov-2022 | Jann Horn <[email protected]>
exit: Put an upper limit on how often we can oops
Many Linux systems are configured to not panic on oops; but allowing an attacker to oops the system **really** often can make even bugs that look completely unexploitable exploitable (like NULL dereferences and such) if each crash elevates a refcount by one or a lock is taken in read mode, and this causes a counter to eventually overflow.
The most interesting counters for this are 32 bits wide (like open-coded refcounts that don't use refcount_t). (The ldsem reader count on 32-bit platforms is just 16 bits, but probably nobody cares about 32-bit platforms that much nowadays.)
So let's panic the system if the kernel is constantly oopsing.
The speed of oopsing 2^32 times probably depends on several factors, like how long the stack trace is and which unwinder you're using; an empirically important one is whether your console is showing a graphical environment or a text console that oopses will be printed to. In a quick single-threaded benchmark, it looks like oopsing in a vfork() child with a very short stack trace only takes ~510 microseconds per run when a graphical console is active; but switching to a text console that oopses are printed to slows it down around 87x, to ~45 milliseconds per run. (Adding more threads makes this faster, but the actual oops printing happens under &die_lock on x86, so you can maybe speed this up by a factor of around 2 and then any further improvement gets eaten up by lock contention.)
It looks like it would take around 8-12 days to overflow a 32-bit counter with repeated oopsing on a multi-core X86 system running a graphical environment; both me (in an X86 VM) and Seth (with a distro kernel on normal hardware in a standard configuration) got numbers in that ballpark.
12 days aren't *that* short on a desktop system, and you'd likely need much longer on a typical server system (assuming that people don't run graphical desktop environments on their servers), and this is a *very* noisy and violent approach to exploiting the kernel; and it also seems to take orders of magnitude longer on some machines, probably because stuff like EFI pstore will slow it down a ton if that's active.
Signed-off-by: Jann Horn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Luis Chamberlain <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
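The overflow estimate above is simple arithmetic: 2^32 oopses times the per-oops cost. A back-of-the-envelope reproduction using only the timings quoted in the message (510 µs on a graphical console, about 45 ms on a text console, and a rough 2x speed-up from extra threads):

```python
SECONDS_PER_DAY = 86_400


def days_to_overflow(seconds_per_oops: float, bits: int = 32, speedup: float = 1.0) -> float:
    """Days needed to oops 2**bits times at a given per-oops cost."""
    return (2 ** bits) * seconds_per_oops / speedup / SECONDS_PER_DAY


graphical = days_to_overflow(510e-6)                # single thread, graphical console
graphical_mt = days_to_overflow(510e-6, speedup=2)  # ~2x from threads, bounded by die_lock
text_console = days_to_overflow(45e-3)              # oopses printed to a text console
```

This gives about 25.4 days single-threaded and about 12.7 days with the 2x speed-up, in line with the upper end of the measured 8-12 days, while a text console pushes it to roughly 2,237 days (over six years).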

Revision tags: v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4

8603b6f5 | 03-Sep-2022 | Oleksandr Natalenko <[email protected]>
core_pattern: add CPU specifier
Statistically, in a large deployment regular segfaults may indicate a CPU issue.
Currently, it is not possible to find out which CPU the segfault happened on. There are at least two attempts to improve segfault logging in this regard, but they do not help if the logs rotate.
Hence, let's make it possible to permanently record the CPU the task ran on using a new core_pattern specifier.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oleksandr Natalenko <[email protected]> Suggested-by: Renaud Métrich <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Alexander Viro <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Grzegorz Halat <[email protected]> Cc: "Guilherme G. Piccoli" <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Joel Savitz <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Kees Cook <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Rob Herring <[email protected]> Cc: Stephen Kitt <[email protected]> Cc: Will Deacon <[email protected]> Cc: Xiaoming Ni <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
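The message does not name the new specifier; kernel.rst lists it, as far as I know, as `%C` ("CPU the task ran on"), which this sketch assumes. The expander is a toy subset, not the kernel's code: `values` maps a specifier letter to its expansion, `%%` yields a literal `%`, and unknown specifiers or a trailing `%` produce nothing.

```python
from typing import Mapping


def expand_core_pattern(pattern: str, values: Mapping[str, str]) -> str:
    """Expand %-specifiers in a core_pattern string (toy subset)."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(pattern):  # lone trailing '%': dropped
            break
        spec = pattern[i + 1]
        if spec == "%":
            out.append("%")
        elif spec in values:
            out.append(values[spec])
        i += 2  # skip '%' and the specifier; unknown ones produce no output
    return "".join(out)
```

A pattern such as `core.%e.%p.cpu%C` then records the executable name, PID and CPU in the file name, which survives log rotation.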

72720937 | 24-Oct-2022 | Guilherme G. Piccoli <[email protected]>
x86/split_lock: Add sysctl to control the misery mode
Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") changed the way the split lock detector works when in "warn" mode; basically, it not only shows the warning message, but also intentionally introduces a slowdown on such tasks through a sleep plus serialization mechanism. Based on discussions in [0], it seems the warning alone wasn't enough motivation for userspace developers to fix their applications.
This slowdown is enough to totally break some proprietary (i.e. unfixable) userspace[1].
It happens that the original proposal in [0] was to add a new mode that would warn and slow down the "split locking" task, keeping the old warn mode untouched. In the end, that idea was discarded and the regular/default "warn" mode now slows down the applications. This is quite aggressive with regard to proprietary/legacy programs that basically cannot run properly on a kernel with this change. While it is understandable that a malicious application could cause a DoS by split locking, it seems unacceptable to regress old/proprietary userspace programs through a default configuration that previously worked. An example of such breakage was reported in [1].
Add a sysctl to allow controlling the "misery mode" behavior, as per Thomas's suggestion in [2]. This way, users running legacy and/or proprietary software can still run it with decent performance while still observing the warning messages in the kernel log.
[0] https://lore.kernel.org/lkml/[email protected]/ [1] https://github.com/doitsujin/dxvk/issues/2938 [2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/
[ dhansen: minor changelog tweaks, including clarifying the actual problem ]
Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Guilherme G. Piccoli <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Tony Luck <[email protected]> Tested-by: Andre Almeida <[email protected]> Link: https://lore.kernel.org/all/20221024200254.635256-1-gpiccoli%40igalia.com

aadc0cd5 | 30-Sep-2022 | Stephen Kitt <[email protected]>
docs: sysctl/fs: re-order, prettify
This brings the text markup in line with sysctl/abi and sysctl/kernel:
* the entries are ordered alphabetically
* the table of contents is automatically generated
* markup is used as appropriate for constants etc.
The content isn't fully up-to-date but the obsolete entries are gone, so remove the kernel version mention.
Signed-off-by: Stephen Kitt <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>

bfca3dd3 | 01-Sep-2022 | Petr Vorel <[email protected]>
kernel/utsname_sysctl.c: print kernel arch
Print the machine hardware name (UTS_MACHINE) in /proc/sys/kernel/arch.
This helps people who debug the kernel with an initramfs that has a minimal environment (i.e. without coreutils or even busybox), and allows high-level languages to read a procfs file instead of running 'uname -m'.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Petr Vorel <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Cc: David Sterba <[email protected]> Cc: "Eric W . Biederman" <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
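From a high-level language the new file replaces a subprocess call to `uname -m`. A minimal sketch (hypothetical helper name) that falls back to Python's `platform.machine()` on kernels without `/proc/sys/kernel/arch`; the `root` argument only exists so the helper can be pointed at a scratch directory:

```python
import platform
from pathlib import Path


def kernel_arch(root: str = "/proc/sys") -> str:
    """Return UTS_MACHINE from kernel.arch, or platform.machine() if the file is absent."""
    path = Path(root) / "kernel" / "arch"
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        # Older kernels (or non-Linux hosts) lack /proc/sys/kernel/arch.
        return platform.machine()
```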

Revision tags: v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7

c6833e10 | 13-Jul-2022 | Huang Ying <[email protected]>
memory tiering: rate limit NUMA migration throughput
In NUMA balancing memory tiering mode, if there are hot pages in a slow memory node and cold pages in a fast memory node, we need to promote/demote the hot/cold pages between the fast and slow memory nodes.
One choice is to promote/demote as fast as possible. But the CPU cycles and memory bandwidth consumed by high promotion/demotion throughput will hurt the latency of some workloads, because of inflated memory access latency and slow memory bandwidth contention.
One way to resolve this issue is to restrict the maximum promotion/demotion throughput. Promotion/demotion then takes longer to finish, but workload latency is better. This patch implements that as the page promotion rate limit mechanism.
The number of candidate pages to be promoted to the fast memory node via NUMA balancing is counted; if the count exceeds the limit specified by the user, NUMA balancing promotion is stopped until the next second.
A new sysctl knob kernel.numa_balancing_promote_rate_limit_MBps is added for the users to specify the limit.
Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Reviewed-by: Baolin Wang <[email protected]> Tested-by: Baolin Wang <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: osalvador <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Wei Xu <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zhong Jiang <[email protected]> Cc: Zi Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
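The mechanism described above is a per-second budget. A toy model of it, assuming "MB" means 2^20 bytes and a 4 KiB page by default; it illustrates the counting rule, not the kernel's implementation, and `PromoteRateLimiter` is a hypothetical name:

```python
class PromoteRateLimiter:
    """Toy per-second budget for NUMA promotion candidates.

    Candidate pages are counted; once the count exceeds the limit,
    promotion stops until the next one-second window.
    """

    def __init__(self, limit_mbps: int, page_size: int = 4096) -> None:
        # Assumption: "MB" is taken as 2**20 bytes.
        self.limit_pages = limit_mbps * (1 << 20) // page_size
        self.window = None
        self.candidates = 0

    def allow(self, now_s: float, nr_pages: int = 1) -> bool:
        second = int(now_s)
        if second != self.window:  # new one-second window: reset the count
            self.window = second
            self.candidates = 0
        self.candidates += nr_pages
        return self.candidates <= self.limit_pages
```

At 1 MB/s with 4 KiB pages the budget is 256 pages per second: a 257th candidate in the same second is refused, and the count resets when the next second begins.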

118b1366 | 13-Jul-2022 | Laurent Dufour <[email protected]>
powerpc/pseries/mobility: set NMI watchdog factor during an LPM
During an LPM, while the memory transfer is in progress on the arrival side, accessing pages that have not yet been transferred introduces latencies. Thus, the NMI watchdog may be triggered too frequently, which increases the risk of hitting an NMI interrupt in a bad place in the kernel, leading to a kernel panic.
Disabling the Hard Lockup Watchdog until the memory transfer completes could be too strong a workaround; some users would want this timeout to eventually trigger if the system hangs, even during an LPM.
Introduce a new sysctl variable nmi_watchdog_factor. It allows applying a factor to the NMI watchdog timeout during an LPM. Just before the CPUs are stopped for the switchover sequence, the NMI watchdog timer is set to watchdog_thresh + factor%.
A value of 0 has no effect. The default value is 200, meaning that the NMI watchdog is set to 30s during LPM (based on a 10s watchdog_thresh value). Once the memory transfer is complete, the factor is reset to 0.
Setting this value to a high number is like disabling the NMI watchdog during an LPM.
Signed-off-by: Laurent Dufour <[email protected]> Reviewed-by: Nicholas Piggin <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
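The "watchdog_thresh + factor%" rule is easy to check numerically. A one-function sketch (hypothetical name; rejecting negative factors is this sketch's own choice):

```python
def lpm_nmi_timeout(watchdog_thresh_s: float, factor_percent: int) -> float:
    """NMI watchdog timeout during an LPM: watchdog_thresh plus factor% of it."""
    if factor_percent < 0:
        raise ValueError("this sketch expects a non-negative percentage")
    return watchdog_thresh_s * (100 + factor_percent) / 100
```

With the default factor of 200 and a 10s watchdog_thresh this gives 10 + 20 = 30 seconds, and a factor of 0 leaves the timeout at 10 seconds.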