===================================
Documentation for /proc/sys/kernel/
===================================

.. See scripts/check-sysctl-docs to keep this up to date


Copyright (c) 1998, 1999, Rik van Riel <[email protected]>

Copyright (c) 2009, Shen Feng<[email protected]>

For general info and legal blurb, please look in
Documentation/admin-guide/sysctl/index.rst.

------------------------------------------------------------------------------

This file contains documentation for the sysctl files in
``/proc/sys/kernel/``.

The files in this directory can be used to tune and monitor
miscellaneous and general things in the operation of the Linux
kernel. Since some of the files *can* be used to screw up your
system, it is advisable to read both documentation and source
before actually making adjustments.

Currently, these files might (depending on your configuration)
show up in ``/proc/sys/kernel``:

.. contents:: :local:


acct
====

::

    highwater lowwater frequency

If BSD-style process accounting is enabled these values control
its behaviour. If free space on the filesystem where the log lives
goes below ``lowwater``\ % accounting suspends. If free space gets
above ``highwater``\ % accounting resumes. ``frequency`` determines
how often the amount of free space is checked (value is in
seconds). Default:

::

    4 2 30

That is, suspend accounting if free space drops below 2%; resume it
if it increases to at least 4%; consider information about the amount
of free space valid for 30 seconds.


acpi_video_flags
================

See Documentation/power/video.rst.
This allows the video resume mode to be set,
in a similar fashion to the ``acpi_sleep`` kernel parameter, by
combining the following values:

= =======
1 s3_bios
2 s3_mode
4 s3_beep
= =======

arch
====

The machine hardware name, the same output as ``uname -m``
(e.g. ``x86_64`` or ``aarch64``).

auto_msgmni
===========

This variable has no effect and may be removed in future kernel
releases. Reading it always returns 0.
Up to Linux 3.17, it enabled/disabled automatic recomputing of
`msgmni`_
upon memory add/remove or upon IPC namespace creation/removal.
Echoing "1" into this file enabled msgmni automatic recomputing.
Echoing "0" turned it off. The default value was 1.


bootloader_type (x86 only)
==========================

This gives the bootloader type number as indicated by the bootloader,
shifted left by 4, and OR'd with the low four bits of the bootloader
version. The reason for this encoding is that this used to match the
``type_of_loader`` field in the kernel header; the encoding is kept for
backwards compatibility.
That is, if the full bootloader type number
is 0x15 and the full version number is 0x234, this file will contain
the value 340 = 0x154.

See the ``type_of_loader`` and ``ext_loader_type`` fields in
Documentation/arch/x86/boot.rst for additional information.


bootloader_version (x86 only)
=============================

The complete bootloader version number. In the example above, this
file will contain the value 564 = 0x234.

See the ``type_of_loader`` and ``ext_loader_ver`` fields in
Documentation/arch/x86/boot.rst for additional information.


bpf_stats_enabled
=================

Controls whether the kernel should collect statistics on BPF programs
(total time spent running, number of times run...). Enabling
statistics causes a slight reduction in performance on each program
run. The statistics can be seen using ``bpftool``.

= ===================================
0 Don't collect statistics (default).
1 Collect statistics.
= ===================================


cad_pid
=======

This is the pid which will be signalled on reboot (notably, by
Ctrl-Alt-Delete). Writing a value to this file which doesn't
correspond to a running process will result in ``-ESRCH``.

See also `ctrl-alt-del`_.


cap_last_cap
============

Highest valid capability of the running kernel. Exports
``CAP_LAST_CAP`` from the kernel.


.. _core_pattern:

core_pattern
============

``core_pattern`` is used to specify a core dumpfile pattern name.

* max length 127 characters; default value is "core"
* ``core_pattern`` is used as a pattern template for the output
  filename; certain string patterns (beginning with '%') are
  substituted with their actual values.
* backward compatibility with ``core_uses_pid``:

        If ``core_pattern`` does not include "%p" (default does not)
        and ``core_uses_pid`` is set, then .PID will be appended to
        the filename.

* corename format specifiers

        ========        ==========================================
        %<NUL>          '%' is dropped
        %%              output one '%'
        %p              pid
        %P              global pid (init PID namespace)
        %i              tid
        %I              global tid (init PID namespace)
        %u              uid (in initial user namespace)
        %g              gid (in initial user namespace)
        %d              dump mode, matches ``PR_SET_DUMPABLE`` and
                        ``/proc/sys/fs/suid_dumpable``
        %s              signal number
        %t              UNIX time of dump
        %h              hostname
        %e              executable filename (may be shortened, could be changed by prctl etc)
        %f              executable filename
        %E              executable path
        %c              maximum size of core file by resource limit RLIMIT_CORE
        %C              CPU the task ran on
        %<OTHER>        both are dropped
        ========        ==========================================

* If the first character of the pattern is a '|', the kernel will treat
  the rest of the pattern as a command to run. The core dump will be
  written to the standard input of that program instead of to a file.
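
The substitution rules above can be illustrated with a small user-space
sketch. This is not the kernel's implementation; ``expand_core_pattern``
and its ``values`` argument are hypothetical names chosen for the
example, and the pipe ('|') case is deliberately not modelled.

```python
def expand_core_pattern(pattern, values, core_uses_pid=False):
    """Expand '%' specifiers following the documented table.

    `values` maps a specifier letter (e.g. "p", "e") to its replacement.
    Hypothetical helper for illustration only; it models the documented
    rules, not the kernel code.
    """
    out = []
    saw_pid = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(pattern):
            # '%' at the very end of the pattern (%<NUL>): '%' is dropped.
            break
        spec = pattern[i + 1]
        if spec == "%":
            out.append("%")
        elif spec in values:
            out.append(str(values[spec]))
            if spec == "p":
                saw_pid = True
        # Any other specifier (%<OTHER>): both characters are dropped.
        i += 2
    name = "".join(out)
    if core_uses_pid and not saw_pid:
        # Backward compatibility with core_uses_pid: append ".PID".
        name += "." + str(values["p"])
    return name
```

For instance, ``expand_core_pattern("core", {"p": 1234}, core_uses_pid=True)``
yields ``core.1234``, matching the backward-compatibility rule.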


core_pipe_limit
===============

This sysctl is only applicable when `core_pattern`_ is configured to
pipe core files to a user space helper (when the first character of
``core_pattern`` is a '|', see above).
When collecting cores via a pipe to an application, it is occasionally
useful for the collecting application to gather data about the
crashing process from its ``/proc/pid`` directory.
In order to do this safely, the kernel must wait for the collecting
process to exit, so as not to remove the crashing process's proc files
prematurely.
This in turn creates the possibility that a misbehaving userspace
collecting process can block the reaping of a crashed process simply
by never exiting.
This sysctl defends against that.
It defines how many concurrent crashing processes may be piped to user
space applications in parallel.
If this value is exceeded, then those crashing processes above that
value are noted via the kernel log and their cores are skipped.
0 is a special value, indicating that unlimited processes may be
captured in parallel, but that no waiting will take place (i.e. the
collecting process is not guaranteed access to ``/proc/<crashing
pid>/``).
This value defaults to 0.
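
The admission rule described above can be sketched as a toy model. The
class ``PipeDumpLimiter`` and its method names are assumptions made for
this example; the sketch only mirrors the documented behaviour and is
not derived from the kernel source.

```python
class PipeDumpLimiter:
    """Toy model of the documented core_pipe_limit rule (not kernel code)."""

    def __init__(self, core_pipe_limit=0):
        self.limit = core_pipe_limit
        self.active = 0  # dumps currently being piped to a helper

    def start_dump(self):
        """Return (accepted, wait_for_helper) for a newly crashing process."""
        if self.limit and self.active >= self.limit:
            # Over the limit: the event is logged and the core is skipped.
            return (False, False)
        self.active += 1
        # With a limit of 0 there is no waiting for the helper to exit, so
        # the helper is not guaranteed access to /proc/<crashing pid>/.
        return (True, self.limit != 0)

    def finish_dump(self):
        """Call when the handling of one accepted dump has ended."""
        self.active -= 1
```

With ``PipeDumpLimiter(2)``, a third concurrent crash is rejected until
one of the two running helpers finishes.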


core_sort_vma
=============

The default coredump writes VMAs in address order. By setting
``core_sort_vma`` to 1, VMAs will be written from smallest size
to largest size. This is known to break at least elfutils, but
can be handy when dealing with very large (and truncated)
coredumps where the more useful debugging details are included
in the smaller VMAs.


core_uses_pid
=============

The default coredump filename is "core". By setting
``core_uses_pid`` to 1, the coredump filename becomes core.PID.
If `core_pattern`_ does not include "%p" (default does not)
and ``core_uses_pid`` is set, then .PID will be appended to
the filename.


ctrl-alt-del
============

When the value in this file is 0, ctrl-alt-del is trapped and
sent to the ``init(1)`` program to handle a graceful restart.
When, however, the value is > 0, Linux's reaction to a Vulcan
Nerve Pinch (tm) will be an immediate reboot, without even
syncing its dirty buffers.

Note:
  when a program (like dosemu) has the keyboard in 'raw'
  mode, the ctrl-alt-del is intercepted by the program before it
  ever reaches the kernel tty layer, and it's up to the program
  to decide what to do with it.


dmesg_restrict
==============

This toggle indicates whether unprivileged users are prevented
from using ``dmesg(8)`` to view messages from the kernel's log
buffer.
When ``dmesg_restrict`` is set to 0 there are no restrictions.
When ``dmesg_restrict`` is set to 1, users must have
``CAP_SYSLOG`` to use ``dmesg(8)``.

The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the
default value of ``dmesg_restrict``.
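
As a minimal sketch, a script could read the current setting and apply
the rule stated above before attempting to read the log. Both function
names are hypothetical, and whether the caller holds ``CAP_SYSLOG`` is
passed in as a plain boolean rather than detected.

```python
from pathlib import Path


def read_dmesg_restrict(path="/proc/sys/kernel/dmesg_restrict"):
    """Return the current dmesg_restrict value as an integer."""
    return int(Path(path).read_text().strip())


def may_use_dmesg(dmesg_restrict, has_cap_syslog):
    """Apply the documented rule: 0 means unrestricted, 1 needs CAP_SYSLOG."""
    return dmesg_restrict == 0 or has_cap_syslog
```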


domainname & hostname
=====================

These files can be used to set the NIS/YP domainname and the
hostname of your box in exactly the same way as the commands
domainname and hostname, i.e.::

  # echo "darkstar" > /proc/sys/kernel/hostname
  # echo "mydomain" > /proc/sys/kernel/domainname

has the same effect as::

  # hostname "darkstar"
  # domainname "mydomain"

Note, however, that the classic darkstar.frop.org has the
hostname "darkstar" and DNS (Internet Domain Name Server)
domainname "frop.org", not to be confused with the NIS (Network
Information Service) or YP (Yellow Pages) domainname. These two
domain names are in general different. For a detailed discussion
see the ``hostname(1)`` man page.


firmware_config
===============

See Documentation/driver-api/firmware/fallback-mechanisms.rst.

The entries in this directory allow the firmware loader helper
fallback to be controlled:

* ``force_sysfs_fallback``, when set to 1, forces the use of the
  fallback;
* ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.


ftrace_dump_on_oops
===================

Determines whether ``ftrace_dump()`` should be called on an oops (or
kernel panic). This will output the contents of the ftrace buffers to
the console.  This is very useful for capturing traces that lead to
crashes and outputting them to a serial console.

======================= ===========================================
0                       Disabled (default).
1                       Dump buffers of all CPUs.
2(orig_cpu)             Dump the buffer of the CPU that triggered the
                        oops.
<instance>              Dump the specific instance buffer on all CPUs.
<instance>=2(orig_cpu)  Dump the specific instance buffer on the CPU
                        that triggered the oops.
======================= ===========================================

Dumping multiple instances is also supported; instances are separated
by commas. If the global buffer also needs to be dumped, specify
the dump mode (1/2/orig_cpu) first for the global buffer.

For example, to dump the "foo" and "bar" instance buffers on all CPUs,
the user can run::

  echo "foo,bar" > /proc/sys/kernel/ftrace_dump_on_oops

To dump the global buffer and the "foo" instance buffer on all
CPUs along with the "bar" instance buffer on the CPU that triggered the
oops, the user can run::

  echo "1,foo,bar=2" > /proc/sys/kernel/ftrace_dump_on_oops

ftrace_enabled, stack_tracer_enabled
====================================

See Documentation/trace/ftrace.rst.


hardlockup_all_cpu_backtrace
============================

This value controls the hard lockup detector behavior when a hard
lockup condition is detected as to whether or not to gather further
debug information. If enabled, arch-specific all-CPU stack dumping
will be initiated.

= ============================================
0 Do nothing. This is the default behavior.
1 On detection capture more debug information.
= ============================================


hardlockup_panic
================

This parameter can be used to control whether the kernel panics
when a hard lockup is detected.

= ===========================
0 Don't panic on hard lockup.
1 Panic on hard lockup.
= ===========================

See Documentation/admin-guide/lockup-watchdogs.rst for more information.
This can also be set using the nmi_watchdog kernel parameter.


hotplug
=======

Path for the hotplug policy agent.
Default value is ``CONFIG_UEVENT_HELPER_PATH``, which in turn defaults
to the empty string.

This file only exists when ``CONFIG_UEVENT_HELPER`` is enabled. Most
modern systems rely exclusively on the netlink-based uevent source and
don't need this.


hung_task_all_cpu_backtrace
===========================

If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when a hung task is detected. This file shows up if
``CONFIG_DETECT_HUNG_TASK`` and ``CONFIG_SMP`` are enabled.

0: Won't show all CPUs' backtraces when a hung task is detected.
This is the default behavior.

1: Will non-maskably interrupt all CPUs and dump their backtraces when
a hung task is detected.


hung_task_panic
===============

Controls the kernel's behavior when a hung task is detected.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

= =================================================
0 Continue operation. This is the default behavior.
1 Panic immediately.
= =================================================


hung_task_check_count
=====================

The upper bound on the number of tasks that are checked.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.


hung_task_detect_count
======================

Indicates the total number of tasks that have been detected as hung since
the system boot.

This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.


hung_task_timeout_secs
======================

When a task in D state has not been scheduled
for more than this many seconds, a warning is reported.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 means infinite timeout, no checking is done.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.


hung_task_check_interval_secs
=============================

Hung task check interval. If hung task checking is enabled
(see `hung_task_timeout_secs`_), the check is done every
``hung_task_check_interval_secs`` seconds.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 (default) means use ``hung_task_timeout_secs`` as checking
interval.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.


hung_task_warnings
==================

The maximum number of warnings to report. During a check interval
if a hung task is detected, this value is decreased by 1.
When this value reaches 0, no more warnings will be reported.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

-1: report an infinite number of warnings.


hyperv_record_panic_msg
=======================

Controls whether the panic kmsg data should be reported to Hyper-V.

= =========================================================
0 Do not report panic kmsg data.
1 Report the panic kmsg data. This is the default behavior.
= =========================================================


ignore-unaligned-usertrap
=========================

On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
currently, ``arc``, ``parisc`` and ``loongarch``), controls whether all
unaligned traps are logged.

= =============================================================
0 Log all unaligned accesses.
1 Only warn the first time a process traps. This is the default
  setting.
= =============================================================

See also `unaligned-trap`_.

io_uring_disabled
=================

Prevents all processes from creating new io_uring instances.
Enabling this
shrinks the kernel's attack surface.

= ======================================================================
0 All processes can create io_uring instances as normal. This is the
  default setting.
1 io_uring creation is disabled (io_uring_setup() will fail with
  -EPERM) for unprivileged processes not in the io_uring_group group.
  Existing io_uring instances can still be used.  See the
  documentation for io_uring_group for more information.
2 io_uring creation is disabled for all processes. io_uring_setup()
  always fails with -EPERM. Existing io_uring instances can still be
  used.
= ======================================================================


io_uring_group
==============

When io_uring_disabled is set to 1, a process must either be
privileged (CAP_SYS_ADMIN) or be in the io_uring_group group in order
to create an io_uring instance.  If io_uring_group is set to -1 (the
default), only processes with the CAP_SYS_ADMIN capability may create
io_uring instances.


kexec_load_disabled
===================

A toggle indicating if the syscalls ``kexec_load`` and
``kexec_file_load`` have been disabled.
This value defaults to 0 (false: ``kexec_*load`` enabled), but can be
set to 1 (true: ``kexec_*load`` disabled).
Once true, kexec can no longer be used, and the toggle cannot be set
back to false.
This allows a kexec image to be loaded before disabling the syscall,
allowing a system to set up (and later use) an image without it being
altered.
Generally used together with the `modules_disabled`_ sysctl.

kexec_load_limit_panic
======================

This parameter specifies a limit to the number of times the syscalls
``kexec_load`` and ``kexec_file_load`` can be called with a crash
image. It can only be set with a more restrictive value than the
current one.

== ======================================================
-1 Unlimited calls to kexec. This is the default setting.
N  Number of calls left.
== ======================================================

kexec_load_limit_reboot
=======================

Similar functionality as ``kexec_load_limit_panic``, but for a normal
image.
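
The "only more restrictive" rule and the countdown of remaining calls
can be sketched as follows. The class ``KexecLoadLimit`` is a
hypothetical model written for this example. It assumes that a write is
rejected when it is negative or not strictly lower than a finite current
limit, which is one reading of "more restrictive than the current one".

```python
class KexecLoadLimit:
    """Toy model of a kexec_load_limit_* counter (not kernel code)."""

    def __init__(self):
        self.limit = -1  # -1: unlimited calls, the documented default

    def write(self, value):
        """Try to set a new limit; return True if it was accepted."""
        if value < 0:
            # Assumption: going back to "unlimited" is never allowed.
            return False
        if self.limit != -1 and value >= self.limit:
            # Not more restrictive than the current finite limit.
            return False
        self.limit = value
        return True

    def load_permitted(self):
        """Account for one kexec load attempt; return True if allowed."""
        if self.limit == -1:
            return True
        if self.limit == 0:
            return False
        self.limit -= 1
        return True
```

Starting from the default, ``write(2)`` is accepted, two loads succeed,
and the third is refused because no calls are left.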

kptr_restrict
=============

This toggle indicates whether restrictions are placed on
exposing kernel addresses via ``/proc`` and other interfaces.

When ``kptr_restrict`` is set to 0 (the default) the address is hashed
before printing.
(This is equivalent to %p.)

When ``kptr_restrict`` is set to 1, kernel pointers printed using the
%pK format specifier will be replaced with 0s unless the user has
``CAP_SYSLOG`` and effective user and group ids are equal to the real
ids.
This is because %pK checks are done at read() time rather than open()
time, so if permissions are elevated between the open() and the read()
(e.g. via a setuid binary) then %pK will not leak kernel pointers to
unprivileged users.
Note, this is a temporary solution only.
The correct long-term solution is to do the permission checks at
open() time.
Consider removing world read permissions from files that use %pK, and
using `dmesg_restrict`_ to protect against uses of %pK in ``dmesg(8)``
if leaking kernel pointer values to unprivileged users is a concern.

When ``kptr_restrict`` is set to 2, kernel pointers printed using
%pK will be replaced with 0s regardless of privileges.
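
The three modes can be summarised as a small decision function. This is
a sketch of the rules as documented here, with a hypothetical function
name; it assumes that a reader who passes the mode 1 checks sees the
actual address, which the text above implies but does not state
outright.

```python
def pK_output(kptr_restrict, has_cap_syslog=False, ids_match=True):
    """Return how a %pK pointer is shown: "hashed", "zeros" or "real".

    ids_match is True when the effective user and group ids equal the
    real ids. Hypothetical helper that mirrors the documented rules only.
    """
    if kptr_restrict == 0:
        return "hashed"  # same as plain %p
    if kptr_restrict == 1:
        if has_cap_syslog and ids_match:
            return "real"  # assumption: the unmodified address is printed
        return "zeros"
    return "zeros"  # mode 2: hidden regardless of privileges
```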


modprobe
========

The full path to the usermode helper for autoloading kernel modules,
by default ``CONFIG_MODPROBE_PATH``, which in turn defaults to
"/sbin/modprobe". This binary is executed when the kernel requests a
module. For example, if userspace passes an unknown filesystem type
to mount(), then the kernel will automatically request the
corresponding filesystem module by executing this usermode helper.
This usermode helper should insert the needed module into the kernel.

This sysctl only affects module autoloading. It has no effect on the
ability to explicitly insert modules.

This sysctl can be used to debug module loading requests::

    echo '#! /bin/sh' > /tmp/modprobe
    echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe
    echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe
    chmod a+x /tmp/modprobe
    echo /tmp/modprobe > /proc/sys/kernel/modprobe

Alternatively, if this sysctl is set to the empty string, then module
autoloading is completely disabled. The kernel will not try to
execute a usermode helper at all, nor will it call the
kernel_module_request LSM hook.

If CONFIG_STATIC_USERMODEHELPER=y is set in the kernel configuration,
then the configured static usermode helper overrides this sysctl,
except that the empty string is still accepted to completely disable
module autoloading as described above.

modules_disabled
================

A toggle value indicating if modules are allowed to be loaded
in an otherwise modular kernel. This toggle defaults to off
(0), but can be set true (1). Once true, modules can be
neither loaded nor unloaded, and the toggle cannot be set back
to false. Generally used with the `kexec_load_disabled`_ toggle.


.. _msgmni:

msgmax, msgmnb, and msgmni
==========================

``msgmax`` is the maximum size of an IPC message, in bytes. 8192 by
default (``MSGMAX``).

``msgmnb`` is the maximum size of an IPC queue, in bytes. 16384 by
default (``MSGMNB``).

``msgmni`` is the maximum number of IPC queues. 32000 by default
(``MSGMNI``).

All of these parameters are set per ipc namespace. The maximum number of bytes
in POSIX message queues is limited by ``RLIMIT_MSGQUEUE``. This limit is
respected hierarchically in each user namespace.
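
As a rough worked example of how ``msgmnb`` and ``msgmax`` relate, the
following sketch computes how many maximum-sized messages fit in one
queue. The function name is hypothetical, and the calculation is plain
integer division that ignores any per-message bookkeeping::

    # full_messages_per_queue MSGMNB MSGMAX
    # Prints how many messages of MSGMAX bytes fit into MSGMNB bytes.
    full_messages_per_queue() {
        echo $(( $1 / $2 ))
    }

With the defaults quoted above, ``full_messages_per_queue 16384 8192``
prints ``2``.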

msg_next_id, sem_next_id, and shm_next_id (System V IPC)
========================================================

These three toggles allow specifying the desired id for the next allocated
IPC object: message, semaphore or shared memory respectively.

By default they are equal to -1, which means generic allocation logic.
Possible values to set are in range {0:``INT_MAX``}.

Notes:
  1) The kernel doesn't guarantee that the new object will have the desired
     id. So it's up to userspace how to handle an object with a "wrong" id.
  2) A toggle with a non-default value will be set back to -1 by the kernel
     after a successful IPC object allocation. If an IPC object allocation
     syscall fails, it is undefined if the value remains unmodified or is
     reset to -1.


ngroups_max
===========

Maximum number of supplementary groups, *i.e.* the maximum size which
``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.



nmi_watchdog
============

This parameter can be used to control the NMI watchdog
(i.e. the hard lockup detector) on x86 systems.

= =================================
0 Disable the hard lockup detector.
1 Enable the hard lockup detector.
= =================================

The hard lockup detector monitors each CPU for its ability to respond to
timer interrupts. The mechanism utilizes CPU performance counter registers
that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
while a CPU is busy. Hence, the alternative name 'NMI watchdog'.

The NMI watchdog is disabled by default if the kernel is running as a guest
in a KVM virtual machine. This default can be overridden by adding::

   nmi_watchdog=1

to the guest kernel command line (see
Documentation/admin-guide/kernel-parameters.rst).


nmi_wd_lpm_factor (PPC only)
============================

Factor to apply to the NMI watchdog timeout (only when ``nmi_watchdog`` is
set to 1). This factor represents the percentage added to
``watchdog_thresh`` when calculating the NMI watchdog timeout during an
LPM. The soft lockup timeout is not impacted.

A value of 0 means no change. The default value is 200 meaning the NMI
watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10).


numa_balancing
==============

Enables/disables and configures automatic page fault based NUMA memory
balancing. Memory is moved automatically to nodes that access it often.
The value to set can be the result of ORing the following:

= =================================
0 NUMA_BALANCING_DISABLED
1 NUMA_BALANCING_NORMAL
2 NUMA_BALANCING_MEMORY_TIERING
= =================================

Or NUMA_BALANCING_NORMAL to optimize page placement among different
NUMA nodes to reduce remote accessing. On NUMA machines, there is a
performance penalty if remote memory is accessed by a CPU. When this
feature is enabled the kernel samples what task thread is accessing
memory by periodically unmapping pages and later trapping a page
fault. At the time of the page fault, it is determined if the data
being accessed should be migrated to a local memory node.

The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
feature should be disabled.

Or NUMA_BALANCING_MEMORY_TIERING to optimize page placement among
different types of memory (represented as different NUMA nodes) to
place the hot pages in the fast memory. This is implemented based on
unmapping and page fault too.

numa_balancing_promote_rate_limit_MBps
======================================

Too high promotion/demotion throughput between different memory types
may hurt application latency. This can be used to rate limit the
promotion throughput. The per-node max promotion throughput in MB/s
will be limited to be no more than the set value.

A rule of thumb is to set this to less than 1/10 of the PMEM node
write bandwidth.

oops_all_cpu_backtrace
======================

If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when an oops event occurs. It should be used as a last
resort in case a panic cannot be triggered (to protect VMs running, for
example) or kdump can't be collected. This file shows up if CONFIG_SMP
is enabled.

0: Won't show all CPUs backtraces when an oops is detected.
This is the default behavior.

1: Will non-maskably interrupt all CPUs and dump their backtraces when
an oops event is detected.


oops_limit
==========

Number of kernel oopses after which the kernel should panic when
``panic_on_oops`` is not set. Setting this to 0 disables checking
the count. Setting this to 1 has the same effect as setting
``panic_on_oops=1``. The default value is 10000.


osrelease, ostype & version
===========================

::

  # cat osrelease
  2.1.88
  # cat ostype
  Linux
  # cat version
  #5 Wed Feb 25 21:49:24 MET 1998

The files ``osrelease`` and ``ostype`` should be clear enough.
``version``
needs a little more clarification however. The '#5' means that
this is the fifth kernel built from this source base and the
date behind it indicates the time the kernel was built.
The only way to tune these values is to rebuild the kernel :-)


overflowgid & overflowuid
=========================

If your architecture did not always support 32-bit UIDs (i.e. arm,
i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
applications that use the old 16-bit UID/GID system calls, if the
actual UID or GID would exceed 65535.

These sysctls allow you to change the value of the fixed UID and GID.
The default is 65534.


panic
=====

The value in this file determines the behaviour of the kernel on a
panic:

* if zero, the kernel will loop forever;
* if negative, the kernel will reboot immediately;
* if positive, the kernel will reboot after the corresponding number
  of seconds.

When you use the software watchdog, the recommended setting is 60.


panic_on_io_nmi
===============

Controls the kernel's behavior when a CPU receives an NMI caused by
an IO error.

= ==================================================================
0 Try to continue operation (default).
1 Panic immediately. The IO error triggered an NMI. This indicates a
  serious system condition which could result in IO data corruption.
  Rather than continuing, panicking might be a better choice. Some
  servers issue this sort of NMI when the dump button is pushed,
  and you can use this option to take a crash dump.
= ==================================================================


panic_on_oops
=============

Controls the kernel's behaviour when an oops or BUG is encountered.

= ===================================================================
0 Try to continue operation.
1 Panic immediately. If the `panic` sysctl is also non-zero then the
  machine will be rebooted.
= ===================================================================


panic_on_stackoverflow
======================

Controls the kernel's behavior when detecting the overflows of
kernel, IRQ and exception stacks except a user stack.
This file shows up if ``CONFIG_DEBUG_STACKOVERFLOW`` is enabled.

= ==========================
0 Try to continue operation.
1 Panic immediately.
= ==========================


panic_on_unrecovered_nmi
========================

The default Linux behaviour on an NMI of either memory or unknown is
to continue operation. For many environments such as scientific
computing it is preferable that the box is taken out and the error
dealt with than an uncorrected parity/ECC error get propagated.

A small number of systems do generate NMIs for bizarre random reasons
such as power management so the default is off. That sysctl works like
the existing panic controls already in that directory.


panic_on_warn
=============

Calls panic() in the WARN() path when set to 1. This is useful to avoid
a kernel rebuild when attempting to kdump at the location of a WARN().

= ================================================
0 Only WARN(), default behaviour.
1 Call panic() after printing out WARN() location.
= ================================================


panic_print
===========

Bitmask for printing system info when panic happens. The user can choose
a combination of the following bits:

=====  ============================================
bit 0  print all tasks info
bit 1  print system memory info
bit 2  print timer info
bit 3  print locks info if ``CONFIG_LOCKDEP`` is on
bit 4  print ftrace buffer
bit 5  print all printk messages in buffer
bit 6  print all CPUs backtrace (if available in the arch)
bit 7  print only tasks in uninterruptible (blocked) state
=====  ============================================

So for example to print tasks and memory info on panic, user can::

  echo 3 > /proc/sys/kernel/panic_print


panic_on_rcu_stall
==================

When set to 1, calls panic() after RCU stall detection messages. This
is useful to define the root cause of RCU stalls using a vmcore.

= ============================================================
0 Do not panic() when RCU stall takes place, default behavior.
1 panic() after printing RCU stall messages.
= ============================================================

max_rcu_stall_to_panic
======================

When ``panic_on_rcu_stall`` is set to 1, this value determines the
number of times that RCU can stall before panic() is called.

When ``panic_on_rcu_stall`` is set to 0, this value has no effect.

perf_cpu_time_max_percent
=========================

Hints to the kernel how much CPU time it should be allowed to
use to handle perf sampling events. If the perf subsystem
is informed that its samples are exceeding this limit, it
will drop its sampling frequency to attempt to reduce its CPU
usage.

Some perf sampling happens in NMIs. If these samples
unexpectedly take too long to execute, the NMIs can become
stacked up next to each other so much that nothing else is
allowed to execute.

===== ========================================================
0     Disable the mechanism. Do not monitor or correct perf's
      sampling rate no matter how much CPU time it takes.

1-100 Attempt to throttle perf's sample rate to this
      percentage of CPU. Note: the kernel calculates an
      "expected" length of each sample event. 100 here means
      100% of that expected length. Even if this is set to
      100, you may still see sample throttling if this
      length is exceeded. Set to 0 if you truly do not care
      how much CPU is consumed.
===== ========================================================


perf_event_paranoid
===================

Controls use of the performance events system by unprivileged
users (without CAP_PERFMON). The default value is 2.

For backward compatibility reasons access to system performance
monitoring and observability remains open for CAP_SYS_ADMIN
privileged processes but CAP_SYS_ADMIN usage for secure system
performance monitoring and observability operations is discouraged
with respect to CAP_PERFMON use cases.

===  ==================================================================
 -1  Allow use of (almost) all events by all users.

     Ignore mlock limit after perf_event_mlock_kb without
     ``CAP_IPC_LOCK``.

>=0  Disallow ftrace function tracepoint by users without
     ``CAP_PERFMON``.

     Disallow raw tracepoint access by users without ``CAP_PERFMON``.

>=1  Disallow CPU event access by users without ``CAP_PERFMON``.

>=2  Disallow kernel profiling by users without ``CAP_PERFMON``.
===  ==================================================================


perf_event_max_stack
====================

Controls maximum number of stack frames to copy for (``attr.sample_type &
PERF_SAMPLE_CALLCHAIN``) configured events, for instance, when using
'``perf record -g``' or '``perf trace --call-graph fp``'.

This can only be done when no events are in use that have callchains
enabled, otherwise writing to this file will return ``-EBUSY``.

The default value is 127.


perf_event_mlock_kb
===================

Control size of per-cpu ring buffer not counted against mlock limit.

The default value is 512 + 1 page


perf_event_max_contexts_per_stack
=================================

Controls maximum number of stack frame context entries for
(``attr.sample_type & PERF_SAMPLE_CALLCHAIN``) configured events, for
instance, when using '``perf record -g``' or '``perf trace --call-graph fp``'.

This can only be done when no events are in use that have callchains
enabled, otherwise writing to this file will return ``-EBUSY``.

The default value is 8.


perf_user_access (arm64 and riscv only)
=======================================

Controls user space access for reading perf event counters.

arm64
=====

The default value is 0 (access disabled).

When set to 1, user space can read performance monitor counter registers
directly.

See Documentation/arch/arm64/perf.rst for more information.

riscv
=====

When set to 0, user space access is disabled.

The default value is 1: user space can read performance monitor counter
registers through perf, and any direct access without perf intervention will
trigger an illegal instruction.

When set to 2, legacy mode is enabled (user space has direct access to cycle
and instret CSRs only). Note that this legacy value is deprecated and will be
removed once all user space applications are fixed.

Note that the time CSR is always directly accessible to all modes.

pid_max
=======

PID allocation wrap value. When the kernel's next PID value
reaches this value, it wraps back to a minimum PID value.
PIDs of value ``pid_max`` or larger are not allocated.


ns_last_pid
===========

The last pid allocated in the current (the one task using this sysctl
lives in) pid namespace. When selecting a pid for the next task on fork,
the kernel tries to allocate a number starting from this one.


powersave-nap (PPC only)
========================

If set, Linux-PPC will use the 'nap' mode of powersaving,
otherwise the 'doze' mode will be used.


==============================================================

printk
======

The four values in printk denote: ``console_loglevel``,
``default_message_loglevel``, ``minimum_console_loglevel`` and
``default_console_loglevel`` respectively.

These values influence printk() behavior when printing or
logging error messages. See '``man 2 syslog``' for more info on
the different loglevels.

======================== =====================================
console_loglevel         messages with a higher priority than
                         this will be printed to the console
default_message_loglevel messages without an explicit priority
                         will be printed with this priority
minimum_console_loglevel minimum (highest) value to which
                         console_loglevel can be set
default_console_loglevel default value for console_loglevel
======================== =====================================


printk_delay
============

Delay each printk message by ``printk_delay`` milliseconds.

Values from 0 to 10000 are allowed.
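
The four ``printk`` fields described above can be labelled with a short
shell sketch. The function names are hypothetical, and the sample
values ``4 4 1 7`` used below are an assumption for illustration
rather than documented defaults. A higher priority corresponds to a
numerically lower loglevel::

    # printk_fields "C D M K"
    # Labels the four whitespace-separated values in the documented order.
    printk_fields() {
        set -- $1    # split the single argument into four fields
        echo "console_loglevel=$1"
        echo "default_message_loglevel=$2"
        echo "minimum_console_loglevel=$3"
        echo "default_console_loglevel=$4"
    }

    # reaches_console LEVEL CONSOLE_LOGLEVEL
    # Succeeds if a message at LEVEL has a higher priority (a numerically
    # lower level) than CONSOLE_LOGLEVEL.
    reaches_console() {
        [ "$1" -lt "$2" ]
    }

For example, ``printk_fields "$(cat /proc/sys/kernel/printk)"`` labels
the values of the running kernel, and ``reaches_console 3 4`` succeeds
while ``reaches_console 6 4`` fails.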


printk_ratelimit
================

Some warning messages are rate limited. ``printk_ratelimit`` specifies
the minimum length of time between these messages (in seconds).
The default value is 5 seconds.

A value of 0 will disable rate limiting.


printk_ratelimit_burst
======================

While long term we enforce one message per `printk_ratelimit`_
seconds, we do allow a burst of messages to pass through.
``printk_ratelimit_burst`` specifies the number of messages we can
send before ratelimiting kicks in.

The default value is 10 messages.
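
The interaction between the interval and the burst can be illustrated
with a simplified model. This is a sketch under stated assumptions, not
the kernel's implementation: the function name is hypothetical, time is
counted in whole seconds, and each window of ``INTERVAL`` seconds lets
at most ``BURST`` messages through::

    # count_allowed INTERVAL BURST T1 T2 ...
    # Prints how many of the messages sent at times T1, T2, ... (seconds,
    # in non-decreasing order) pass the simplified limiter.
    count_allowed() {
        interval=$1 burst=$2
        shift 2
        window_start= printed=0 allowed=0
        for t in "$@"; do
            # Open a new window for the first message, or once the
            # interval has elapsed since the current window began.
            if [ -z "$window_start" ] || [ $((t - window_start)) -ge "$interval" ]; then
                window_start=$t
                printed=0
            fi
            # An interval of 0 disables limiting; otherwise honour the burst.
            if [ "$interval" -eq 0 ] || [ "$printed" -lt "$burst" ]; then
                printed=$((printed + 1))
                allowed=$((allowed + 1))
            fi
        done
        echo "$allowed"
    }

With the default interval of 5 and burst of 10, twelve messages sent at
time 0 followed by three at time 5 would, in this model, let 10 + 3
messages through.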


printk_devkmsg
==============

Control the logging to ``/dev/kmsg`` from userspace:

========= =============================================
ratelimit default, ratelimited
on        unlimited logging to /dev/kmsg from userspace
off       logging to /dev/kmsg disabled
========= =============================================

The kernel command line parameter ``printk.devkmsg=`` overrides this and is
a one-time setting until next reboot: once set, it cannot be changed by
this sysctl interface anymore.

==============================================================


pty
===

See Documentation/filesystems/devpts.rst.


random
======

This is a directory, with the following entries:

* ``boot_id``: a UUID generated the first time this is retrieved, and
  unvarying after that;

* ``uuid``: a UUID generated every time this is retrieved (this can
  thus be used to generate UUIDs at will);

* ``entropy_avail``: the pool's entropy count, in bits;

* ``poolsize``: the entropy pool size, in bits;

* ``urandom_min_reseed_secs``: obsolete (used to determine the minimum
  number of seconds between urandom pool reseeding). This file is
  writable for compatibility purposes, but writing to it has no effect
  on any RNG behavior;

* ``write_wakeup_threshold``: when the entropy count drops below this
  (as a number of bits), processes waiting to write to ``/dev/random``
  are woken up. This file is writable for compatibility purposes, but
  writing to it has no effect on any RNG behavior.


randomize_va_space
==================

This option can be used to select the type of process address
space randomization that is used in the system, for architectures
that support this feature.

==  ===========================================================================
0   Turn the process address space randomization off. This is the
    default for architectures that do not support this feature anyways,
    and kernels that are booted with the "norandmaps" parameter.

1   Make the addresses of mmap base, stack and VDSO page randomized.
    This, among other things, implies that shared libraries will be
    loaded to random addresses. Also for PIE-linked binaries, the
    location of code start is randomized. This is the default if the
    ``CONFIG_COMPAT_BRK`` option is enabled.

2   Additionally enable heap randomization. This is the default if
    ``CONFIG_COMPAT_BRK`` is disabled.

    There are a few legacy applications out there (such as some ancient
    versions of libc.so.5 from 1996) that assume that brk area starts
    just after the end of the code+bss. These applications break when
    start of the brk area is randomized. There are however no known
    non-legacy applications that would be broken this way, so for most
    systems it is safe to choose full randomization.

    Systems with ancient and/or broken binaries should be configured
    with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
    address space randomization.
==  ===========================================================================


real-root-dev
=============

See Documentation/admin-guide/initrd.rst.


reboot-cmd (SPARC only)
=======================

??? This seems to be a way to give an argument to the Sparc
ROM/Flash boot loader. Maybe to tell it what to do after
rebooting. ???


sched_energy_aware
==================

Enables/disables Energy Aware Scheduling (EAS). EAS starts
automatically on platforms where it can run (that is,
platforms with asymmetric CPU topologies and having an Energy
Model available). If your platform happens to meet the
requirements for EAS but you do not want to use it, change
this value to 0. On non-EAS platforms, write operations fail and
reads do not return anything.

task_delayacct
==============

Enables/disables task delay accounting (see
Documentation/accounting/delay-accounting.rst). Enabling this feature incurs
a small amount of overhead in the scheduler but is useful for debugging
and performance tuning. It is required by some tools such as iotop.

sched_schedstats
================

Enables/disables scheduler statistics. Enabling this feature
incurs a small amount of overhead in the scheduler but is
useful for debugging and performance tuning.

sched_util_clamp_min
====================

Max allowed *minimum* utilization.

Default value is 1024, which is the maximum possible value.

It means that any requested uclamp.min value cannot be greater than
sched_util_clamp_min, i.e., it is restricted to the range
[0:sched_util_clamp_min].

sched_util_clamp_max
====================

Max allowed *maximum* utilization.

Default value is 1024, which is the maximum possible value.

It means that any requested uclamp.max value cannot be greater than
sched_util_clamp_max, i.e., it is restricted to the range
[0:sched_util_clamp_max].

sched_util_clamp_min_rt_default
===============================

By default, Linux is tuned for performance, which means that RT tasks always
run at the highest frequency and on the most capable (highest capacity) CPU
(in heterogeneous systems).

Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
1024 by default, which effectively boosts the tasks to run at the highest
frequency and biases them to run on the biggest CPU.

This knob allows admins to change the default behavior when uclamp is being
used. In battery powered devices particularly, running at the maximum
capacity and frequency will increase energy consumption and shorten the battery
life.

This knob is only effective for RT tasks whose requested uclamp.min value has
not been modified by the user via the sched_setattr() syscall.

This knob will not escape the range constraint imposed by sched_util_clamp_min
defined above.

For example, if::

    sched_util_clamp_min_rt_default = 800
    sched_util_clamp_min = 600

then the boost will be clamped to 600 because 800 is outside of the permissible
range of [0:600]. This could happen, for instance, if a powersave mode
restricts all boosts temporarily by modifying sched_util_clamp_min. As soon as
this restriction is lifted, the requested sched_util_clamp_min_rt_default
will take effect.

seccomp
=======

See Documentation/userspace-api/seccomp_filter.rst.


sg-big-buff
===========

This file shows the size of the generic SCSI (sg) buffer.
You can't tune it just yet, but you could change it at
compile time by editing ``include/scsi/sg.h`` and changing
the value of ``SG_BIG_BUFF``.

There shouldn't be any reason to change this value. If
you can come up with one, you probably know what you
are doing anyway :)


shmall
======

This parameter sets the total amount of shared memory pages that can be used
inside an ipc namespace. The shared memory pages are counted for each ipc
namespace separately and the count is not inherited. Hence, ``shmall`` should
always be at least ``ceil(shmmax/PAGE_SIZE)``.

If you are not sure what the default ``PAGE_SIZE`` is on your Linux
system, you can run the following command::

    # getconf PAGE_SIZE

To reduce or disable the ability to allocate shared memory, you must create a
new ipc namespace, set this parameter to the required value and prohibit the
creation of a new ipc namespace in the current user namespace. Alternatively,
cgroups can be used.
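
The lower bound ``ceil(shmmax/PAGE_SIZE)`` mentioned above can be computed
with integer arithmetic. The sketch below is only an illustration; the
function name ``min_shmall`` is hypothetical, and the 1 GiB segment size and
4096-byte page size are assumed example inputs rather than values taken from
any particular system.

```python
def min_shmall(shmmax_bytes: int, page_size: int) -> int:
    """Return ceil(shmmax / PAGE_SIZE), the smallest sensible shmall.

    Integer arithmetic avoids floating point rounding for large values.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if shmmax_bytes < 0:
        raise ValueError("shmmax_bytes must not be negative")
    return (shmmax_bytes + page_size - 1) // page_size


# Assumed example: a 1 GiB maximum segment size with 4096-byte pages.
pages_needed = min_shmall(1 << 30, 4096)
```

On a real system, the page size reported by ``getconf PAGE_SIZE`` would be
passed as ``page_size``.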


shmmax
======

This value can be used to query and set the run time limit
on the maximum shared memory segment size that can be created.
Shared memory segments up to 1Gb are now supported in the
kernel. This value defaults to ``SHMMAX``.


shmmni
======

This value determines the maximum number of shared memory segments.
4096 by default (``SHMMNI``).


shm_rmid_forced
===============

Linux lets you set resource limits, including how much memory one
process can consume, via ``setrlimit(2)``. Unfortunately, shared memory
segments are allowed to exist without association with any process, and
thus might not be counted against any resource limits. If enabled,
shared memory segments are automatically destroyed when their attach
count becomes zero after a detach or a process termination. It will
also destroy segments that were created, but never attached to, on exit
from the process. The only use left for ``IPC_RMID`` is to immediately
destroy an unattached segment. Of course, this breaks the way things are
defined, so some applications might stop working. Note that this
feature will do you no good unless you also configure your resource
limits (in particular, ``RLIMIT_AS`` and ``RLIMIT_NPROC``). Most systems don't
need this.

Note that if you change this from 0 to 1, already created segments
without users and with a dead originative process will be destroyed.


sysctl_writes_strict
====================

Control how file position affects the behavior of updating sysctl values
via the ``/proc/sys`` interface:

  ==   ======================================================================
  -1   Legacy per-write sysctl value handling, with no printk warnings.
       Each write syscall must fully contain the sysctl value to be
       written, and multiple writes on the same sysctl file descriptor
       will rewrite the sysctl value, regardless of file position.
   0   Same behavior as above, but warn about processes that perform writes
       to a sysctl file descriptor when the file position is not 0.
   1   (default) Respect file position when writing sysctl strings. Multiple
       writes will append to the sysctl value buffer. Anything past the max
       length of the sysctl value buffer will be ignored. Writes to numeric
       sysctl entries must always be at file position 0 and the value must
       be fully contained in the buffer sent in the write syscall.
  ==   ======================================================================


softlockup_all_cpu_backtrace
============================

This value controls the soft lockup detector thread's behavior
when a soft lockup condition is detected as to whether or not
to gather further debug information. If enabled, each cpu will
be issued an NMI and instructed to capture a stack trace.

This feature is only applicable for architectures which support
NMI.

= ============================================
0 Do nothing. This is the default behavior.
1 On detection capture more debug information.
= ============================================


softlockup_panic
================

This parameter can be used to control whether the kernel panics
when a soft lockup is detected.

= ============================================
0 Don't panic on soft lockup.
1 Panic on soft lockup.
= ============================================

This can also be set using the softlockup_panic kernel parameter.


soft_watchdog
=============

This parameter can be used to control the soft lockup detector.

= =================================
0 Disable the soft lockup detector.
1 Enable the soft lockup detector.
= =================================

The soft lockup detector monitors CPUs for threads that are hogging the CPUs
without rescheduling voluntarily, and thus prevent the 'migration/N' threads
from running, causing the watchdog work to fail to execute. The mechanism
depends on the CPU's ability to respond to timer interrupts, which are needed
for the watchdog work to be queued by the watchdog timer function; otherwise,
the NMI watchdog, if enabled, can detect a hard lockup condition.


split_lock_mitigate (x86 only)
==============================

On x86, each "split lock" imposes a system-wide performance penalty. On larger
systems, large numbers of split locks from unprivileged users can result in
denials of service to well-behaved and potentially more important users.

The kernel mitigates these bad users by detecting split locks and imposing
penalties: forcing them to wait and only allowing one core to execute split
locks at a time.

These mitigations can make those bad applications unbearably slow. Setting
split_lock_mitigate=0 may restore some application performance, but will also
increase system exposure to denial of service attacks from split lock users.

= ===================================================================
0 Disable the mitigation mode - only logs a warning about the split
  lock in the kernel log and exposes the system to denials of service
  from the split lockers.
1 Enable the mitigation mode (this is the default) - penalizes the split
  lockers with intentional performance degradation.
= ===================================================================


stack_erasing
=============

This parameter can be used to control kernel stack erasing at the end
of syscalls for kernels built with ``CONFIG_GCC_PLUGIN_STACKLEAK``.

That erasing reduces the information which kernel stack leak bugs
can reveal and blocks some uninitialized stack variable attacks.
The tradeoff is the performance impact: on a single CPU system kernel
compilation sees a 1% slowdown, other systems and workloads may vary.

= ====================================================================
0 Kernel stack erasing is disabled, STACKLEAK_METRICS are not updated.
1 Kernel stack erasing is enabled (default), it is performed before
  returning to the userspace at the end of syscalls.
= ====================================================================


stop-a (SPARC only)
===================

Controls Stop-A:

= ====================================
0 Stop-A has no effect.
1 Stop-A breaks to the PROM (default).
= ====================================

Stop-A is always enabled on a panic, so that the user can return to
the boot PROM.


sysrq
=====

See Documentation/admin-guide/sysrq.rst.


tainted
=======

Non-zero if the kernel has been tainted. Numeric values, which can be
ORed together. The letters are seen in the "Tainted" line of Oops reports.

======  =====  ==============================================================
     1  `(P)`  proprietary module was loaded
     2  `(F)`  module was force loaded
     4  `(S)`  kernel running on an out of specification system
     8  `(R)`  module was force unloaded
    16  `(M)`  processor reported a Machine Check Exception (MCE)
    32  `(B)`  bad page referenced or some unexpected page flags
    64  `(U)`  taint requested by userspace application
   128  `(D)`  kernel died recently, i.e. there was an OOPS or BUG
   256  `(A)`  an ACPI table was overridden by user
   512  `(W)`  kernel issued warning
  1024  `(C)`  staging driver was loaded
  2048  `(I)`  workaround for bug in platform firmware applied
  4096  `(O)`  externally-built ("out-of-tree") module was loaded
  8192  `(E)`  unsigned module was loaded
 16384  `(L)`  soft lockup occurred
 32768  `(K)`  kernel has been live patched
 65536  `(X)`  Auxiliary taint, defined for and used by distros
131072  `(T)`  The kernel was built with the struct randomization plugin
======  =====  ==============================================================

See Documentation/admin-guide/tainted-kernels.rst for more information.
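
Because the values in the table above are ORed together, a reading of
``tainted`` can be split back into its individual flags. The following is a
minimal sketch that covers only the bits listed in that table; the names
``TAINT_FLAGS`` and ``decode_tainted`` are hypothetical, and the sample value
4097 is an assumed input.

```python
from typing import List

# Bit value -> letter, copied from the table above.
TAINT_FLAGS = {
    1: "P", 2: "F", 4: "S", 8: "R", 16: "M", 32: "B",
    64: "U", 128: "D", 256: "A", 512: "W", 1024: "C", 2048: "I",
    4096: "O", 8192: "E", 16384: "L", 32768: "K", 65536: "X",
    131072: "T",
}


def decode_tainted(value: int) -> List[str]:
    """Return the letters of all taint bits set in `value`, lowest bit first."""
    return [letter for bit, letter in TAINT_FLAGS.items() if value & bit]


# Assumed sample reading: 4097 = 4096 (O) + 1 (P).
sample_flags = decode_tainted(4097)
```

A value of 0 yields an empty list, which corresponds to an untainted kernel.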

Note:
  writes to this sysctl interface will fail with ``EINVAL`` if the kernel is
  booted with the command line option ``panic_on_taint=<bitmask>,nousertaint``
  and any of the ORed together values being written to ``tainted`` match with
  the bitmask declared on panic_on_taint.
  See Documentation/admin-guide/kernel-parameters.rst for more details on
  that particular kernel command line option and its optional
  ``nousertaint`` switch.

threads-max
===========

This value controls the maximum number of threads that can be created
using ``fork()``.

During initialization the kernel sets this value such that even if the
maximum number of threads is created, the thread structures occupy only
a part (1/8th) of the available RAM pages.

The minimum value that can be written to ``threads-max`` is 1.

The maximum value that can be written to ``threads-max`` is given by the
constant ``FUTEX_TID_MASK`` (0x3fffffff).

If a value outside of this range is written to ``threads-max`` an
``EINVAL`` error occurs.
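
The accepted range for ``threads-max`` can be checked before writing, which
avoids the ``EINVAL`` error described above. This is a small sketch using only
the bounds quoted in the text; the function name ``is_valid_threads_max`` is
hypothetical.

```python
FUTEX_TID_MASK = 0x3FFFFFFF  # upper bound quoted in the text above
MIN_THREADS_MAX = 1          # lower bound quoted in the text above


def is_valid_threads_max(value: int) -> bool:
    """Return True if `value` lies in the range accepted for threads-max."""
    return MIN_THREADS_MAX <= value <= FUTEX_TID_MASK
```

A caller could use this check to report a clearer message than the bare
``EINVAL`` returned by a failed write.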

timer_migration
===============

When set to a non-zero value, attempt to migrate timers away from idle cpus to
allow them to remain in low power states longer.

Default is set (1).

traceoff_on_warning
===================

When set, disables tracing (see Documentation/trace/ftrace.rst) when a
``WARN()`` is hit.


tracepoint_printk
=================

When tracepoints are sent to printk() (enabled by the ``tp_printk``
boot parameter), this entry provides runtime control::

    echo 0 > /proc/sys/kernel/tracepoint_printk

will stop tracepoints from being sent to printk(), and::

    echo 1 > /proc/sys/kernel/tracepoint_printk

will send them to printk() again.

This only works if the kernel was booted with ``tp_printk`` enabled.

See Documentation/admin-guide/kernel-parameters.rst and
Documentation/trace/boottime-trace.rst.


unaligned-trap
==============

On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
``arc``, ``parisc`` and ``loongarch``), controls whether unaligned traps
are caught and emulated (instead of failing).

= ========================================================
0 Do not emulate unaligned accesses.
1 Emulate unaligned accesses. This is the default setting.
= ========================================================

See also `ignore-unaligned-usertrap`_.


unknown_nmi_panic
=================

The value in this file affects how NMIs are handled. When the
value is non-zero, an unknown NMI is trapped and a panic occurs. At
that time, kernel debugging information is displayed on the console.

For example, the NMI switch that most IA32 servers have triggers an
unknown NMI. If a system hangs, try pressing the NMI switch.


unprivileged_bpf_disabled
=========================

Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` or ``CAP_BPF``
will return ``-EPERM``. Once set to 1, this can't be cleared from the
running kernel anymore.

Writing 2 to this entry will also disable unprivileged calls to ``bpf()``,
however, an admin can still change this setting later on, if needed, by
writing 0 or 1 to this entry.

If ``BPF_UNPRIV_DEFAULT_OFF`` is enabled in the kernel config, then this
entry will default to 2 instead of 0.

= =============================================================
0 Unprivileged calls to ``bpf()`` are enabled
1 Unprivileged calls to ``bpf()`` are disabled without recovery
2 Unprivileged calls to ``bpf()`` are disabled
= =============================================================


warn_limit
==========

Number of kernel warnings after which the kernel should panic when
``panic_on_warn`` is not set. Setting this to 0 disables checking
the warning count. Setting this to 1 has the same effect as setting
``panic_on_warn=1``. The default value is 0.
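
The three cases described for ``warn_limit`` can be summarised with a small
shell sketch. The function name ``describe_warn_limit`` is hypothetical and
exists only for illustration; it interprets a value that has already been
read and does not change any kernel setting.

```shell
# Hypothetical helper: print what a given warn_limit value ($1) means,
# following the description in the text above.
describe_warn_limit() {
    case "$1" in
        0) echo "warning count is not checked" ;;
        1) echo "same effect as panic_on_warn=1" ;;
        *) echo "panic after $1 warnings" ;;
    esac
}

# Usage sketch:
#   describe_warn_limit "$(cat /proc/sys/kernel/warn_limit)"
```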


watchdog
========

This parameter can be used to disable or enable the soft lockup detector
*and* the NMI watchdog (i.e. the hard lockup detector) at the same time.

= ==============================
0 Disable both lockup detectors.
1 Enable both lockup detectors.
= ==============================

The soft lockup detector and the NMI watchdog can also be disabled or
enabled individually, using the ``soft_watchdog`` and ``nmi_watchdog``
parameters.
If the ``watchdog`` parameter is read, for example by executing::

        cat /proc/sys/kernel/watchdog

the output of this command (0 or 1) shows the logical OR of
``soft_watchdog`` and ``nmi_watchdog``.


watchdog_cpumask
================

This value can be used to control on which cpus the watchdog may run.
The default cpumask is all possible cores, but if ``NO_HZ_FULL`` is
enabled in the kernel config, and cores are specified with the
``nohz_full=`` boot argument, those cores are excluded by default.
Offline cores can be included in this mask, and if the core is later
brought online, the watchdog will be started based on the mask value.

Typically this value would only be touched in the ``nohz_full`` case
to re-enable cores that by default were not running the watchdog,
if a kernel lockup was suspected on those cores.

The argument value is the standard cpulist format for cpumasks,
so for example to enable the watchdog on cores 0, 2, 3, and 4 you
might say::

        echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask


watchdog_thresh
===============

This value can be used to control the frequency of hrtimer and NMI
events and the soft and hard lockup thresholds. The default threshold
is 10 seconds.

The softlockup threshold is (``2 * watchdog_thresh``). Setting this
tunable to zero will disable lockup detection altogether.
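
The relationship between ``watchdog_thresh`` and the softlockup threshold
can be worked through with a short shell sketch. The function name
``softlockup_threshold`` is hypothetical; it only applies the
``2 * watchdog_thresh`` rule stated above to a value passed as an argument,
and reports ``disabled`` for zero.

```shell
# Hypothetical helper: derive the softlockup threshold in seconds from a
# watchdog_thresh value ($1), using the 2 * watchdog_thresh rule.
softlockup_threshold() {
    if [ "$1" -eq 0 ]; then
        # Zero disables lockup detection altogether.
        echo "disabled"
    else
        echo $(( 2 * $1 ))
    fi
}

# Usage sketch:
#   softlockup_threshold "$(cat /proc/sys/kernel/watchdog_thresh)"
```

With the default ``watchdog_thresh`` of 10 seconds, this gives a softlockup
threshold of 20 seconds.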