History log of /linux-6.15/Documentation/admin-guide/sysctl/kernel.rst (Results 1 – 25 of 81)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14, v6.14-rc7, v6.14-rc6, v6.14-rc5, v6.14-rc4
# 39ec9eaa 19-Feb-2025 Kees Cook <[email protected]>

coredump: Only sort VMAs when core_sort_vma sysctl is set

The sorting of VMAs by size in commit 7d442a33bfe8 ("binfmt_elf: Dump
smaller VMAs first in ELF cores") breaks elfutils[1]. Instead, sort
ba

coredump: Only sort VMAs when core_sort_vma sysctl is set

The sorting of VMAs by size in commit 7d442a33bfe8 ("binfmt_elf: Dump
smaller VMAs first in ELF cores") breaks elfutils[1]. Instead, sort
based on the setting of the new sysctl, core_sort_vma, which defaults
to 0, no sorting.

Reported-by: Michael Stapelberg <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]/ [1]
Fixes: 7d442a33bfe8 ("binfmt_elf: Dump smaller VMAs first in ELF cores")
Signed-off-by: Kees Cook <[email protected]>

show more ...


Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13
# e129fdc5 14-Jan-2025 Phil Auld <[email protected]>

Documentation/sysctl: Add timer_migration to kernel.rst

There is no mention of timer_migration in the docs. Add
a short description.

Signed-off-by: Phil Auld <[email protected]>
Cc: Jonathan Corbet

Documentation/sysctl: Add timer_migration to kernel.rst

There is no mention of timer_migration in the docs. Add
a short description.

Signed-off-by: Phil Auld <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: [email protected]
Signed-off-by: Jonathan Corbet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12, v6.12-rc7, v6.12-rc6, v6.12-rc5
# 62bf7065 27-Oct-2024 Lance Yang <[email protected]>

hung_task: add docs for hung_task_detect_count

This commit introduces documentation for hung_task_detect_count in
kernel.rst.

Link: https://lkml.kernel.org/r/20241027120747.42833-3-ioworker0@gmail.

hung_task: add docs for hung_task_detect_count

This commit introduces documentation for hung_task_detect_count in
kernel.rst.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mingzhe Yang <[email protected]>
Signed-off-by: Lance Yang <[email protected]>
Cc: Bang Li <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Huang Cun <[email protected]>
Cc: Joel Granados <[email protected]>
Cc: Joel Granados <[email protected]>
Cc: John Siddle <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Thomas Weißschuh <[email protected]>
Cc: Yongliang Gao <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1
# cbade823 21-Jul-2024 Helge Deller <[email protected]>

parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN

Allow users to disable kernel warnings for unaligned memory
accesses from kernel via the /proc/sys/kernel/ignore-unaligned-usertrap
procfs

parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN

Allow users to disable kernel warnings for unaligned memory
accesses from kernel via the /proc/sys/kernel/ignore-unaligned-usertrap
procfs entry.
That way users can disable those warnings in case they happen too
often.

Signed-off-by: Helge Deller <[email protected]>

show more ...


Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2, v6.10-rc1, v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1, v6.8, v6.8-rc7, v6.8-rc6
# 19f0423f 23-Feb-2024 Huang Yiwei <[email protected]>

tracing: Support to dump instance traces by ftrace_dump_on_oops

Currently ftrace only dumps the global trace buffer on an OOPs. For
debugging a production usecase, instance trace will be helpful to

tracing: Support to dump instance traces by ftrace_dump_on_oops

Currently ftrace only dumps the global trace buffer on an OOPs. For
debugging a production usecase, instance trace will be helpful to
check specific problems since global trace buffer may be used for
other purposes.

This patch extend the ftrace_dump_on_oops parameter to dump a specific
or multiple trace instances:

- ftrace_dump_on_oops=0: as before -- don't dump
- ftrace_dump_on_oops[=1]: as before -- dump the global trace buffer
on all CPUs
- ftrace_dump_on_oops=2 or =orig_cpu: as before -- dump the global
trace buffer on CPU that triggered the oops
- ftrace_dump_on_oops=<instance_name>: new behavior -- dump the
tracing instance matching <instance_name>
- ftrace_dump_on_oops[=2/orig_cpu],<instance1_name>[=2/orig_cpu],
<instrance2_name>[=2/orig_cpu]: new behavior -- dump the global trace
buffer and multiple instance buffer on all CPUs, or only dump on CPU
that triggered the oops if =2 or =orig_cpu is given

Also, the sysctl node can handle the input accordingly.

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]

Cc: Ross Zwisler <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Signed-off-by: Huang Yiwei <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>

show more ...


Revision tags: v6.8-rc5, v6.8-rc4, v6.8-rc3
# 2e3fc6ca 02-Feb-2024 Feng Tang <[email protected]>

panic: add option to dump blocked tasks in panic_print

For debugging kernel panics and other bugs, there is already an option of
panic_print to dump all tasks' call stacks. On today's large servers

panic: add option to dump blocked tasks in panic_print

For debugging kernel panics and other bugs, there is already an option of
panic_print to dump all tasks' call stacks. On today's large servers
running many containers, there could be thousands of tasks or more, and
this will print out huge amount of call stacks, taking a lot of time (for
serial console which is main target user case of panic_print).

And in many cases, only those several tasks being blocked are key for the
panic, so add an option to only dump blocked tasks' call stacks.

[[email protected]: clarify documentation a little]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Feng Tang <[email protected]>
Tested-by: Guilherme G. Piccoli <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Josh Poimboeuf <[email protected]>
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.8-rc2, v6.8-rc1
# 9220066e 15-Jan-2024 Alexey Gladkov <[email protected]>

docs: add information about ipc sysctls limitations

After 25b21cb2f6d6 ("[PATCH] IPC namespace core") and 4e9823111bdc
("[PATCH] IPC namespace - shm") the shared memory page count stopped being
glob

docs: add information about ipc sysctls limitations

After 25b21cb2f6d6 ("[PATCH] IPC namespace core") and 4e9823111bdc
("[PATCH] IPC namespace - shm") the shared memory page count stopped being
global and started counting per ipc namespace. The documentation and
shmget(2) still says that shmall is a global option.

shmget(2):

SHMALL System-wide limit on the total amount of shared memory, measured in
units of the system page size. On Linux, this limit can be read and
modified via /proc/sys/kernel/shmall.

I think the changes made in 2006 should be documented.

Link: https://lkml.kernel.org/r/09e99911071766958af488beb4e8a728a4f12135.1705333426.git.legion@kernel.org
Signed-off-by: Alexey Gladkov <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: "Eric W. Biederman" <[email protected]>
Link: https://lkml.kernel.org/r/ede20ddf7be48b93e8084c3be2e920841ee1a641.1663756794.git.legion@kernel.org
Cc: Christian Brauner <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Joel Granados <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Manfred Spraul <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.7, v6.7-rc8, v6.7-rc7, v6.7-rc6, v6.7-rc5, v6.7-rc4, v6.7-rc3, v6.7-rc2, v6.7-rc1, v6.6, v6.6-rc7, v6.6-rc6
# 8f833c82 09-Oct-2023 Shrikanth Hegde <[email protected]>

sched/topology: Change behaviour of the 'sched_energy_aware' sysctl, based on the platform

The 'sched_energy_aware' sysctl is available for the admin to disable/enable
energy aware scheduling(EAS).

sched/topology: Change behaviour of the 'sched_energy_aware' sysctl, based on the platform

The 'sched_energy_aware' sysctl is available for the admin to disable/enable
energy aware scheduling(EAS). EAS is enabled only if few conditions are
met by the platform. They are, asymmetric CPU capacity, no SMT,
schedutil CPUfreq governor, frequency invariant load tracking etc.
A platform may boot without EAS capability, but could gain such
capability at runtime. For example, changing/registering the cpufreq
governor to schedutil.

At present, though platform doesn't support EAS, this sysctl returns 1
and it ends up calling build_perf_domains on write to 1 and
NOP when writing to 0. That is confusing and un-necessary.

Desired behavior would be to have this sysctl to enable/disable the EAS
on supported platform. On non-supported platform write to the sysctl
would return not supported error and read of the sysctl would return
empty. So sched_energy_aware returns empty - EAS is not possible at this moment
This will include EAS capable platforms which have at least one EAS
condition false during startup, e.g. not using the schedutil cpufreq governor
sched_energy_aware returns 0 - EAS is supported but disabled by admin.
sched_energy_aware returns 1 - EAS is supported and enabled.

User can find out the reason why EAS is not possible by checking
info messages. sched_is_eas_possible returns true if the platform
can do EAS at this moment.

Signed-off-by: Shrikanth Hegde <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Pierre Gondois <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v6.6-rc5, v6.6-rc4, v6.6-rc3, v6.6-rc2, v6.6-rc1, v6.5, v6.5-rc7, v6.5-rc6, v6.5-rc5, v6.5-rc4, v6.5-rc3, v6.5-rc2, v6.5-rc1, v6.4, v6.4-rc7, v6.4-rc6, v6.4-rc5, v6.4-rc4, v6.4-rc3, v6.4-rc2, v6.4-rc1, v6.3, v6.3-rc7, v6.3-rc6, v6.3-rc5, v6.3-rc4, v6.3-rc3, v6.3-rc2, v6.3-rc1, v6.2, v6.2-rc8, v6.2-rc7, v6.2-rc6, v6.2-rc5, v6.2-rc4
# 94483490 13-Jan-2023 Ard Biesheuvel <[email protected]>

Documentation: Drop or replace remaining mentions of IA64

Drop or update mentions of IA64, as appropriate.

Signed-off-by: Ard Biesheuvel <[email protected]>


# 76d3ccec 21-Aug-2023 Matteo Rizzo <[email protected]>

io_uring: add a sysctl to disable io_uring system-wide

Introduce a new sysctl (io_uring_disabled) which can be either 0, 1, or
2. When 0 (the default), all processes are allowed to create io_uring
i

io_uring: add a sysctl to disable io_uring system-wide

Introduce a new sysctl (io_uring_disabled) which can be either 0, 1, or
2. When 0 (the default), all processes are allowed to create io_uring
instances, which is the current behavior. When 1, io_uring creation is
disabled (io_uring_setup() will fail with -EPERM) for unprivileged
processes not in the kernel.io_uring_group group. When 2, calls to
io_uring_setup() fail with -EPERM regardless of privilege.

Signed-off-by: Matteo Rizzo <[email protected]>
[JEM: modified to add io_uring_group]
Signed-off-by: Jeff Moyer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

show more ...


# 57972127 02-Aug-2023 Alexandre Ghiti <[email protected]>

Documentation: admin-guide: Add riscv sysctl_perf_user_access

riscv now uses this sysctl so document its usage for this architecture.

Signed-off-by: Alexandre Ghiti <[email protected]>


# e4624435 12-Jun-2023 Jonathan Corbet <[email protected]>

docs: arm64: Move arm64 documentation under Documentation/arch/

Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation direct

docs: arm64: Move arm64 documentation under Documentation/arch/

Architecture-specific documentation is being moved into Documentation/arch/
as a way of cleaning up the top-level documentation directory and making
the docs hierarchy more closely match the source hierarchy. Move
Documentation/arm64 into arch/ (along with the Chinese equvalent
translations) and fix up documentation references.

Cc: Will Deacon <[email protected]>
Cc: Alex Shi <[email protected]>
Cc: Hu Haowen <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Acked-by: Catalin Marinas <[email protected]>
Reviewed-by: Yantengsi <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>

show more ...


# ff61f079 14-Mar-2023 Jonathan Corbet <[email protected]>

docs: move x86 documentation into Documentation/arch/

Move the x86 documentation under Documentation/arch/ as a way of cleaning
up the top-level directory and making the structure of our docs more
c

docs: move x86 documentation into Documentation/arch/

Move the x86 documentation under Documentation/arch/ as a way of cleaning
up the top-level directory and making the structure of our docs more
closely match the structure of the source directories it describes.

All in-kernel references to the old paths have been updated.

Acked-by: Dave Hansen <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Borislav Petkov <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Jonathan Corbet <[email protected]>

show more ...


Revision tags: v6.2-rc3
# a42aaad2 04-Jan-2023 Ricardo Ribalda <[email protected]>

kexec: introduce sysctl parameters kexec_load_limit_*

kexec allows replacing the current kernel with a different one. This is
usually a source of concerns for sysadmins that want to harden a system

kexec: introduce sysctl parameters kexec_load_limit_*

kexec allows replacing the current kernel with a different one. This is
usually a source of concerns for sysadmins that want to harden a system.

Linux already provides a way to disable loading new kexec kernel via
kexec_load_disabled, but that control is very coard, it is all or nothing
and does not make distinction between a panic kexec and a normal kexec.

This patch introduces new sysctl parameters, with finer tuning to specify
how many times a kexec kernel can be loaded. The sysadmin can set
different limits for kexec panic and kexec reboot kernels. The value can
be modified at runtime via sysctl, but only with a stricter value.

With these new parameters on place, a system with loadpin and verity
enabled, using the following kernel parameters:
sysctl.kexec_load_limit_reboot=0 sysct.kexec_load_limit_panic=1 can have a
good warranty that if initrd tries to load a panic kernel, a malitious
user will have small chances to replace that kernel with a different one,
even if they can trigger timeouts on the disk where the panic kernel
lives.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ricardo Ribalda <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Guilherme G. Piccoli <[email protected]> # Steam Deck
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Philipp Rudo <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 06dcb013 04-Jan-2023 Ricardo Ribalda <[email protected]>

Documentation: sysctl: correct kexec_load_disabled

Patch series "kexec: Add new parameter to limit the access to kexec", v6.

Add two parameter to specify how many times a kexec kernel can be loaded

Documentation: sysctl: correct kexec_load_disabled

Patch series "kexec: Add new parameter to limit the access to kexec", v6.

Add two parameter to specify how many times a kexec kernel can be loaded.

These parameter allow hardening the system.

While we are at it, fix a documentation issue and refactor some code.


This patch (of 3):

kexec_load_disabled affects both ``kexec_load`` and ``kexec_file_load``
syscalls. Make it explicit.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ricardo Ribalda <[email protected]>
Reviewed-by: Steven Rostedt (Google) <[email protected]>
Acked-by: Baoquan He <[email protected]>
Cc: Bagas Sanjaya <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Guilherme G. Piccoli <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Philipp Rudo <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.2-rc2, v6.2-rc1, v6.1
# 61a6fccc 10-Dec-2022 Huacai Chen <[email protected]>

LoongArch: Add unaligned access support

Loongson-2 series (Loongson-2K500, Loongson-2K1000) don't support
unaligned access in hardware, while Loongson-3 series (Loongson-3A5000,
Loongson-3C5000) are

LoongArch: Add unaligned access support

Loongson-2 series (Loongson-2K500, Loongson-2K1000) don't support
unaligned access in hardware, while Loongson-3 series (Loongson-3A5000,
Loongson-3C5000) are configurable whether support unaligned access in
hardware. This patch add unaligned access emulation for those LoongArch
processors without hardware support.

Signed-off-by: Huacai Chen <[email protected]>

show more ...


Revision tags: v6.1-rc8, v6.1-rc7, v6.1-rc6
# 9fc9e278 17-Nov-2022 Kees Cook <[email protected]>

panic: Introduce warn_limit

Like oops_limit, add warn_limit for limiting the number of warnings when
panic_on_warn is not set.

Cc: Jonathan Corbet <[email protected]>
Cc: Andrew Morton <akpm@linux-fou

panic: Introduce warn_limit

Like oops_limit, add warn_limit for limiting the number of warnings when
panic_on_warn is not set.

Cc: Jonathan Corbet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: tangmeng <[email protected]>
Cc: "Guilherme G. Piccoli" <[email protected]>
Cc: Tiezhu Yang <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
Reviewed-by: Luis Chamberlain <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


# de92f657 02-Dec-2022 Kees Cook <[email protected]>

exit: Allow oops_limit to be disabled

In preparation for keeping oops_limit logic in sync with warn_limit,
have oops_limit == 0 disable checking the Oops counter.

Cc: Jann Horn <[email protected]>
C

exit: Allow oops_limit to be disabled

In preparation for keeping oops_limit logic in sync with warn_limit,
have oops_limit == 0 disable checking the Oops counter.

Cc: Jann Horn <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Baolin Wang <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Eric Biggers <[email protected]>
Cc: Huang Ying <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>

show more ...


# d4ccd54d 17-Nov-2022 Jann Horn <[email protected]>

exit: Put an upper limit on how often we can oops

Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
co

exit: Put an upper limit on how often we can oops

Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
completely unexploitable exploitable (like NULL dereferences and such) if
each crash elevates a refcount by one or a lock is taken in read mode, and
this causes a counter to eventually overflow.

The most interesting counters for this are 32 bits wide (like open-coded
refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
platforms is just 16 bits, but probably nobody cares about 32-bit platforms
that much nowadays.)

So let's panic the system if the kernel is constantly oopsing.

The speed of oopsing 2^32 times probably depends on several factors, like
how long the stack trace is and which unwinder you're using; an empirically
important one is whether your console is showing a graphical environment or
a text console that oopses will be printed to.
In a quick single-threaded benchmark, it looks like oopsing in a vfork()
child with a very short stack trace only takes ~510 microseconds per run
when a graphical console is active; but switching to a text console that
oopses are printed to slows it down around 87x, to ~45 milliseconds per
run.
(Adding more threads makes this faster, but the actual oops printing
happens under &die_lock on x86, so you can maybe speed this up by a factor
of around 2 and then any further improvement gets eaten up by lock
contention.)

It looks like it would take around 8-12 days to overflow a 32-bit counter
with repeated oopsing on a multi-core X86 system running a graphical
environment; both me (in an X86 VM) and Seth (with a distro kernel on
normal hardware in a standard configuration) got numbers in that ballpark.

12 days aren't *that* short on a desktop system, and you'd likely need much
longer on a typical server system (assuming that people don't run graphical
desktop environments on their servers), and this is a *very* noisy and
violent approach to exploiting the kernel; and it also seems to take orders
of magnitude longer on some machines, probably because stuff like EFI
pstore will slow it down a ton if that's active.

Signed-off-by: Jann Horn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Luis Chamberlain <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


Revision tags: v6.1-rc5, v6.1-rc4, v6.1-rc3, v6.1-rc2, v6.1-rc1, v6.0, v6.0-rc7, v6.0-rc6, v6.0-rc5, v6.0-rc4
# 8603b6f5 03-Sep-2022 Oleksandr Natalenko <[email protected]>

core_pattern: add CPU specifier

Statistically, in a large deployment regular segfaults may indicate a CPU
issue.

Currently, it is not possible to find out what CPU the segfault happened
on. There

core_pattern: add CPU specifier

Statistically, in a large deployment regular segfaults may indicate a CPU
issue.

Currently, it is not possible to find out what CPU the segfault happened
on. There are at least two attempts to improve segfault logging with this
regard, but they do not help in case the logs rotate.

Hence, lets make sure it is possible to permanently record a CPU the task
ran on using a new core_pattern specifier.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleksandr Natalenko <[email protected]>
Suggested-by: Renaud Métrich <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Grzegorz Halat <[email protected]>
Cc: "Guilherme G. Piccoli" <[email protected]>
Cc: "Huang, Ying" <[email protected]>
Cc: Jason A. Donenfeld <[email protected]>
Cc: Joel Savitz <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Stephen Kitt <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Xiaoming Ni <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 72720937 24-Oct-2022 Guilherme G. Piccoli <[email protected]>

x86/split_lock: Add sysctl to control the misery mode

Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
changed the way the split lock detector works when in "warn" mode;

x86/split_lock: Add sysctl to control the misery mode

Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
changed the way the split lock detector works when in "warn" mode;
basically, it not only shows the warn message, but also intentionally
introduces a slowdown through sleeping plus serialization mechanism
on such task. Based on discussions in [0], seems the warning alone
wasn't enough motivation for userspace developers to fix their
applications.

This slowdown is enough to totally break some proprietary (aka.
unfixable) userspace[1].

Happens that originally the proposal in [0] was to add a new mode
which would warns + slowdown the "split locking" task, keeping the
old warn mode untouched. In the end, that idea was discarded and
the regular/default "warn" mode now slows down the applications. This
is quite aggressive with regards proprietary/legacy programs that
basically are unable to properly run in kernel with this change.
While it is understandable that a malicious application could DoS
by split locking, it seems unacceptable to regress old/proprietary
userspace programs through a default configuration that previously
worked. An example of such breakage was reported in [1].

Add a sysctl to allow controlling the "misery mode" behavior, as per
Thomas suggestion on [2]. This way, users running legacy and/or
proprietary software are allowed to still execute them with a decent
performance while still observing the warning messages on kernel log.

[0] https://lore.kernel.org/lkml/[email protected]/
[1] https://github.com/doitsujin/dxvk/issues/2938
[2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/

[ dhansen: minor changelog tweaks, including clarifying the actual
problem ]

Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers")
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
Tested-by: Andre Almeida <[email protected]>
Link: https://lore.kernel.org/all/20221024200254.635256-1-gpiccoli%40igalia.com

show more ...


# aadc0cd5 30-Sep-2022 Stephen Kitt <[email protected]>

docs: sysctl/fs: re-order, prettify

This brings the text markup in line with sysctl/abi and
sysctl/kernel:

* the entries are ordered alphabetically
* the table of contents is automatically generate

docs: sysctl/fs: re-order, prettify

This brings the text markup in line with sysctl/abi and
sysctl/kernel:

* the entries are ordered alphabetically
* the table of contents is automatically generated
* markup is used as appropriate for constants etc.

The content isn't fully up-to-date but the obsolete entries are gone,
so remove the kernel version mention.

Signed-off-by: Stephen Kitt <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Corbet <[email protected]>

show more ...


# bfca3dd3 01-Sep-2022 Petr Vorel <[email protected]>

kernel/utsname_sysctl.c: print kernel arch

Print the machine hardware name (UTS_MACHINE) in /proc/sys/kernel/arch.

This helps people who debug kernel with initramfs with minimal environment
(i.e.

kernel/utsname_sysctl.c: print kernel arch

Print the machine hardware name (UTS_MACHINE) in /proc/sys/kernel/arch.

This helps people who debug kernel with initramfs with minimal environment
(i.e. without coreutils or even busybox) or allow to open sysfs file
instead of run 'uname -m' in high level languages.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Petr Vorel <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Cc: David Sterba <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


Revision tags: v6.0-rc3, v6.0-rc2, v6.0-rc1, v5.19, v5.19-rc8, v5.19-rc7
# c6833e10 13-Jul-2022 Huang Ying <[email protected]>

memory tiering: rate limit NUMA migration throughput

In NUMA balancing memory tiering mode, if there are hot pages in slow
memory node and cold pages in fast memory node, we need to promote/demote
h

memory tiering: rate limit NUMA migration throughput

In NUMA balancing memory tiering mode, if there are hot pages in slow
memory node and cold pages in fast memory node, we need to promote/demote
hot/cold pages between the fast and cold memory nodes.

A choice is to promote/demote as fast as possible. But the CPU cycles and
memory bandwidth consumed by the high promoting/demoting throughput will
hurt the latency of some workload because of accessing inflating and slow
memory bandwidth contention.

A way to resolve this issue is to restrict the max promoting/demoting
throughput. It will take longer to finish the promoting/demoting. But
the workload latency will be better. This is implemented in this patch as
the page promotion rate limit mechanism.

The number of the candidate pages to be promoted to the fast memory node
via NUMA balancing is counted, if the count exceeds the limit specified by
the users, the NUMA balancing promotion will be stopped until the next
second.

A new sysctl knob kernel.numa_balancing_promote_rate_limit_MBps is added
for the users to specify the limit.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: "Huang, Ying" <[email protected]>
Reviewed-by: Baolin Wang <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: osalvador <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Wei Xu <[email protected]>
Cc: Yang Shi <[email protected]>
Cc: Zhong Jiang <[email protected]>
Cc: Zi Yan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>

show more ...


# 118b1366 13-Jul-2022 Laurent Dufour <[email protected]>

powerpc/pseries/mobility: set NMI watchdog factor during an LPM

During an LPM, while the memory transfer is in progress on the arrival
side, some latencies are generated when accessing not yet trans

powerpc/pseries/mobility: set NMI watchdog factor during an LPM

During an LPM, while the memory transfer is in progress on the arrival
side, some latencies are generated when accessing not yet transferred
pages on the arrival side. Thus, the NMI watchdog may be triggered too
frequently, which increases the risk to hit an NMI interrupt in a bad
place in the kernel, leading to a kernel panic.

Disabling the Hard Lockup Watchdog until the memory transfer could be a
too strong work around, some users would want this timeout to be
eventually triggered if the system is hanging even during an LPM.

Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply
a factor to the NMI watchdog timeout during an LPM. Just before the CPUs
are stopped for the switchover sequence, the NMI watchdog timer is set
to watchdog_thresh + factor%

A value of 0 has no effect. The default value is 200, meaning that the
NMI watchdog is set to 30s during LPM (based on a 10s watchdog_thresh
value). Once the memory transfer is achieved, the factor is reset to 0.

Setting this value to a high number is like disabling the NMI watchdog
during an LPM.

Signed-off-by: Laurent Dufour <[email protected]>
Reviewed-by: Nicholas Piggin <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]

show more ...


1234