===================================
Documentation for /proc/sys/kernel/
===================================

.. See scripts/check-sysctl-docs to keep this up to date


Copyright (c) 1998, 1999,  Rik van Riel <[email protected]>

Copyright (c) 2009,        Shen Feng <[email protected]>

For general info and legal blurb, please look in
Documentation/admin-guide/sysctl/index.rst.

------------------------------------------------------------------------------

This file contains documentation for the sysctl files in
``/proc/sys/kernel/``.

The files in this directory can be used to tune and monitor
miscellaneous and general things in the operation of the Linux
kernel. Since some of the files *can* be used to screw up your
system, it is advisable to read both documentation and source
before actually making adjustments.

Currently, these files might (depending on your configuration)
show up in ``/proc/sys/kernel``:

.. contents:: :local:


acct
====

::

    highwater lowwater frequency

If BSD-style process accounting is enabled, these values control
its behaviour. If free space on the filesystem where the log lives
drops below ``lowwater``\ %, accounting is suspended. If free space
rises above ``highwater``\ %, accounting resumes. ``frequency``
determines how often the amount of free space is checked (the value
is in seconds). Default:

::

    4 2 30

That is, suspend accounting if free space drops below 2%; resume it
if it increases to at least 4%; consider the information about the
amount of free space valid for 30 seconds.
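
The hysteresis described above can be modelled with a small POSIX ``sh``
function. This is a sketch, not the kernel's implementation:
``acct_next_state`` is a hypothetical helper, and treating "at or below
``lowwater``" as the suspend condition and "at or above ``highwater``" as
the resume condition is an assumption about the boundary cases.

.. code-block:: sh

    # Hypothetical model of the acct hysteresis (not kernel code).
    # Usage: acct_next_state STATE FREE_PCT HIGHWATER LOWWATER
    # STATE is "active" or "suspended"; prints the state after one check.
    acct_next_state() {
        state=$1 free=$2 high=$3 low=$4
        if [ "$state" = active ] && [ "$free" -le "$low" ]; then
            echo suspended
        elif [ "$state" = suspended ] && [ "$free" -ge "$high" ]; then
            echo active
        else
            echo "$state"
        fi
    }

    # Show the live settings, if BSD process accounting is configured.
    if [ -r /proc/sys/kernel/acct ]; then
        read -r hw lw freq < /proc/sys/kernel/acct
        echo "suspend at ${lw}%, resume at ${hw}%, check every ${freq}s"
    fi

With the default ``4 2 30``, accounting that was suspended stays
suspended at 3% free space and resumes only once free space is back
at 4%.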


acpi_video_flags
================

See Documentation/power/video.rst. This allows the video resume mode to be set,
in a similar fashion to the ``acpi_sleep`` kernel parameter, by
combining the following values:

= =======
1 s3_bios
2 s3_mode
4 s3_beep
= =======
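
Because the values are bit flags, they are combined with a bitwise OR.
A minimal POSIX ``sh`` sketch follows; the ``S3_*`` variable names and
the ``decode_acpi_video_flags`` helper are hypothetical, and the final
write (left as a comment) needs root privileges.

.. code-block:: sh

    S3_BIOS=1 S3_MODE=2 S3_BEEP=4    # values from the table above

    # Hypothetical helper: print the names of the bits set in a value.
    decode_acpi_video_flags() {
        v=$1 names=
        [ $((v & S3_BIOS)) -ne 0 ] && names="$names s3_bios"
        [ $((v & S3_MODE)) -ne 0 ] && names="$names s3_mode"
        [ $((v & S3_BEEP)) -ne 0 ] && names="$names s3_beep"
        echo "${names# }"
    }

    flags=$((S3_BIOS | S3_MODE))
    echo "$flags"                       # prints: 3
    decode_acpi_video_flags "$flags"    # prints: s3_bios s3_mode
    # As root: echo "$flags" > /proc/sys/kernel/acpi_video_flags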

arch
====

The machine hardware name, the same output as ``uname -m``
(e.g. ``x86_64`` or ``aarch64``).

auto_msgmni
===========

This variable has no effect and may be removed in future kernel
releases. Reading it always returns 0.
Up to Linux 3.17, it enabled/disabled automatic recomputing of
`msgmni`_ upon memory add/remove or upon IPC namespace creation/removal.
Echoing "1" into this file enabled msgmni automatic recomputing.
Echoing "0" turned it off. The default value was 1.


bootloader_type (x86 only)
==========================

This gives the bootloader type number as indicated by the bootloader,
shifted left by 4, and OR'd with the low four bits of the bootloader
version.  The reason for this encoding is that this used to match the
``type_of_loader`` field in the kernel header; the encoding is kept for
backwards compatibility.  That is, if the full bootloader type number
is 0x15 and the full version number is 0x234, this file will contain
the value 340 = 0x154.
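
The encoding can be reproduced with shell arithmetic. In this POSIX
``sh`` sketch, ``encode_bootloader_type`` is a hypothetical helper that
takes the full type and version numbers used in the example above.

.. code-block:: sh

    # Hypothetical helper: (type << 4) | (low four bits of version).
    encode_bootloader_type() {
        echo $(( ($1 << 4) | ($2 & 0xf) ))
    }

    v=$(encode_bootloader_type 0x15 0x234)
    printf '%d = 0x%x\n' "$v" "$v"      # prints: 340 = 0x154

    # Decoding: the full type number is the value shifted right by 4.
    printf 'type 0x%x, version low bits 0x%x\n' $((v >> 4)) $((v & 0xf))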

See the ``type_of_loader`` and ``ext_loader_type`` fields in
Documentation/arch/x86/boot.rst for additional information.


bootloader_version (x86 only)
=============================

The complete bootloader version number.  In the example above, this
file will contain the value 564 = 0x234.

See the ``type_of_loader`` and ``ext_loader_ver`` fields in
Documentation/arch/x86/boot.rst for additional information.


bpf_stats_enabled
=================

Controls whether the kernel should collect statistics on BPF programs
(total time spent running, number of times run...). Enabling
statistics causes a slight reduction in performance on each program
run. The statistics can be seen using ``bpftool``.

= ===================================
0 Don't collect statistics (default).
1 Collect statistics.
= ===================================


cad_pid
=======

This is the PID of the process that will be signalled on reboot
(notably, by Ctrl-Alt-Delete). Writing a value to this file which
doesn't correspond to a running process will result in ``-ESRCH``.

See also `ctrl-alt-del`_.


cap_last_cap
============

Highest valid capability of the running kernel.  Exports
``CAP_LAST_CAP`` from the kernel.
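
Capabilities are numbered from 0, so a mask covering every capability
the running kernel knows about has bits 0 through ``cap_last_cap`` set.
A sketch assuming 64-bit shell arithmetic; ``cap_mask`` is a
hypothetical helper, and its output uses the same 16-hex-digit format as
the ``CapBnd`` line of ``/proc/<pid>/status``.

.. code-block:: sh

    # Hypothetical helper: print a mask with bits 0..LAST set.
    # Only valid for LAST <= 62 with 64-bit signed shell arithmetic.
    cap_mask() {
        printf '%016x\n' $(( (1 << ($1 + 1)) - 1 ))
    }

    if [ -r /proc/sys/kernel/cap_last_cap ]; then
        cap_mask "$(cat /proc/sys/kernel/cap_last_cap)"
    fi
    cap_mask 40    # prints: 000001ffffffffff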


.. _core_pattern:

core_pattern
============

``core_pattern`` specifies the name pattern for core dump files.

* max length 127 characters; default value is "core"
* ``core_pattern`` is used as a pattern template for the output
  filename; certain string patterns (beginning with '%') are
  substituted with their actual values.
* backward compatibility with ``core_uses_pid``:

	If ``core_pattern`` does not include "%p" (default does not)
	and ``core_uses_pid`` is set, then .PID will be appended to
	the filename.

* corename format specifiers:

	========	==========================================
	%<NUL>		'%' is dropped
	%%		output one '%'
	%p		pid
	%P		global pid (init PID namespace)
	%i		tid
	%I		global tid (init PID namespace)
	%u		uid (in initial user namespace)
	%g		gid (in initial user namespace)
	%d		dump mode, matches ``PR_SET_DUMPABLE`` and
			``/proc/sys/fs/suid_dumpable``
	%s		signal number
	%t		UNIX time of dump
	%h		hostname
	%e		executable filename (may be shortened; can be changed by prctl etc.)
	%f		executable filename
	%E		executable path
	%c		maximum size of core file by resource limit RLIMIT_CORE
	%C		CPU the task ran on
	%<OTHER>	both are dropped
	========	==========================================

* If the first character of the pattern is a '|', the kernel will treat
  the rest of the pattern as a command to run.  The core dump will be
  written to the standard input of that program instead of to a file.
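
To illustrate the substitution rules, here is a toy expander in POSIX
``sh`` for a handful of specifiers. It is a sketch, not the kernel's
implementation: ``expand_core_pattern`` is hypothetical, it only models
``%%``, ``%p``, ``%e`` and ``%t``, and it drops every other specifier,
including real ones such as ``%P`` or ``%u`` that the kernel would
expand. The ``/var/crash`` path is likewise only an example.

.. code-block:: sh

    # Hypothetical toy expander (not the kernel's implementation).
    # Usage: expand_core_pattern PATTERN PID EXE TIME
    expand_core_pattern() {
        pat=$1 out=
        while [ -n "$pat" ]; do
            rest=${pat#?}; c=${pat%"$rest"}; pat=$rest
            if [ "$c" != % ]; then
                out=$out$c              # ordinary character: copied as-is
                continue
            fi
            [ -z "$pat" ] && break      # %<NUL>: a trailing '%' is dropped
            rest=${pat#?}; spec=${pat%"$rest"}; pat=$rest
            case $spec in
                %) out=$out% ;;         # %% -> one '%'
                p) out=$out$2 ;;        # %p -> pid
                e) out=$out$3 ;;        # %e -> executable name
                t) out=$out$4 ;;        # %t -> UNIX time of dump
                *) ;;                   # anything else: both chars dropped
            esac
        done
        printf '%s\n' "$out"
    }

    expand_core_pattern '/var/crash/core.%e.%p.%t' 4242 myprog 1700000000
    # prints: /var/crash/core.myprog.4242.1700000000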


core_pipe_limit
===============

This sysctl is only applicable when `core_pattern`_ is configured to
pipe core files to a user space helper (when the first character of
``core_pattern`` is a '|', see above).
When collecting cores via a pipe to an application, it is occasionally
useful for the collecting application to gather data about the
crashing process from its ``/proc/pid`` directory.
In order to do this safely, the kernel must wait for the collecting
process to exit, so as not to remove the crashing process's proc files
prematurely.
This in turn creates the possibility that a misbehaving userspace
collecting process can block the reaping of a crashed process simply
by never exiting.
This sysctl defends against that.
It defines how many concurrent crashing processes may be piped to user
space applications in parallel.
If this value is exceeded, then those crashing processes above that
value are noted via the kernel log and their cores are skipped.
0 is a special value, indicating that an unlimited number of processes
may be captured in parallel, but that no waiting will take place (i.e.
the collecting process is not guaranteed access to
``/proc/<crashing pid>/``).
This value defaults to 0.


core_sort_vma
=============

The default coredump writes VMAs in address order. By setting
``core_sort_vma`` to 1, VMAs will be written from smallest size
to largest size. This is known to break at least elfutils, but
can be handy when dealing with very large (and truncated)
coredumps where the more useful debugging details are included
in the smaller VMAs.


core_uses_pid
=============

The default coredump filename is "core".  By setting
``core_uses_pid`` to 1, the coredump filename becomes core.PID.
If `core_pattern`_ does not include "%p" (default does not)
and ``core_uses_pid`` is set, then .PID will be appended to
the filename.


ctrl-alt-del
============

When the value in this file is 0, ctrl-alt-del is trapped and
sent to the ``init(1)`` program to handle a graceful restart.
When, however, the value is > 0, Linux's reaction to a Vulcan
Nerve Pinch (tm) will be an immediate reboot, without even
syncing its dirty buffers.

Note:
  when a program (like dosemu) has the keyboard in 'raw'
  mode, the ctrl-alt-del is intercepted by the program before it
  ever reaches the kernel tty layer, and it's up to the program
  to decide what to do with it.


dmesg_restrict
==============

This toggle indicates whether unprivileged users are prevented
from using ``dmesg(8)`` to view messages from the kernel's log
buffer.
When ``dmesg_restrict`` is set to 0 there are no restrictions.
When ``dmesg_restrict`` is set to 1, users must have
``CAP_SYSLOG`` to use ``dmesg(8)``.

The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the
default value of ``dmesg_restrict``.


domainname & hostname
=====================

These files can be used to set the NIS/YP domainname and the
hostname of your box in exactly the same way as the commands
domainname and hostname, i.e.::

	# echo "darkstar" > /proc/sys/kernel/hostname
	# echo "mydomain" > /proc/sys/kernel/domainname

has the same effect as::

	# hostname "darkstar"
	# domainname "mydomain"

Note, however, that the classic darkstar.frop.org has the
hostname "darkstar" and DNS (Domain Name System)
domainname "frop.org", not to be confused with the NIS (Network
Information Service) or YP (Yellow Pages) domainname. These two
domain names are in general different. For a detailed discussion
see the ``hostname(1)`` man page.


firmware_config
===============

See Documentation/driver-api/firmware/fallback-mechanisms.rst.

The entries in this directory allow the firmware loader helper
fallback to be controlled:

* ``force_sysfs_fallback``, when set to 1, forces the use of the
  fallback;
* ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.


ftrace_dump_on_oops
===================

Determines whether ``ftrace_dump()`` should be called on an oops (or
kernel panic). This will output the contents of the ftrace buffers to
the console.  This is very useful for capturing traces that lead to
crashes and outputting them to a serial console.

======================= ===========================================
0                       Disabled (default).
1                       Dump buffers of all CPUs.
2(orig_cpu)             Dump the buffer of the CPU that triggered the
                        oops.
<instance>              Dump the specific instance buffer on all CPUs.
<instance>=2(orig_cpu)  Dump the specific instance buffer on the CPU
                        that triggered the oops.
======================= ===========================================

Dumping multiple instances is also supported; instances are separated
by commas. If the global buffer also needs to be dumped, specify its
dump mode (1/2/orig_cpu) first.

For example, to dump the "foo" and "bar" instance buffers on all CPUs,
use::

  echo "foo,bar" > /proc/sys/kernel/ftrace_dump_on_oops

To dump the global buffer and the "foo" instance buffer on all CPUs,
along with the "bar" instance buffer on the CPU that triggered the
oops, use::

  echo "1,foo,bar=2" > /proc/sys/kernel/ftrace_dump_on_oops
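
How such a value breaks down can be sketched in POSIX ``sh``. The
``describe_dump_on_oops`` helper is hypothetical and only restates the
table above; in particular, recognising a global dump mode only in the
first position follows the advice to specify it first and is an
assumption about how the kernel treats other orderings.

.. code-block:: sh

    # Hypothetical helper: explain each comma-separated item of a value.
    describe_dump_on_oops() {
        old_ifs=$IFS
        IFS=,
        set -f             # no pathname expansion while splitting
        set -- $1
        set +f
        IFS=$old_ifs
        pos=0
        for item in "$@"; do
            pos=$((pos + 1))
            case $pos:$item in
                1:0)            echo "global: disabled" ;;
                1:1)            echo "global: all CPUs" ;;
                1:2|1:orig_cpu) echo "global: triggering CPU only" ;;
                *:*=2|*:*=orig_cpu)
                                echo "instance ${item%%=*}: triggering CPU only" ;;
                *)              echo "instance $item: all CPUs" ;;
            esac
        done
    }

    describe_dump_on_oops "1,foo,bar=2"
    # prints:
    # global: all CPUs
    # instance foo: all CPUs
    # instance bar: triggering CPU only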

ftrace_enabled, stack_tracer_enabled
====================================

See Documentation/trace/ftrace.rst.


hardlockup_all_cpu_backtrace
============================

This value controls whether the hard lockup detector gathers further
debug information when a hard lockup condition is detected. If
enabled, arch-specific all-CPU stack dumping will be initiated.

= ============================================
0 Do nothing. This is the default behavior.
1 On detection capture more debug information.
= ============================================


hardlockup_panic
================

This parameter can be used to control whether the kernel panics
when a hard lockup is detected.

= ===========================
0 Don't panic on hard lockup.
1 Panic on hard lockup.
= ===========================

See Documentation/admin-guide/lockup-watchdogs.rst for more information.
This can also be set using the ``nmi_watchdog`` kernel parameter.


hotplug
=======

Path for the hotplug policy agent.
Default value is ``CONFIG_UEVENT_HELPER_PATH``, which in turn defaults
to the empty string.

This file only exists when ``CONFIG_UEVENT_HELPER`` is enabled. Most
modern systems rely exclusively on the netlink-based uevent source and
don't need this.


hung_task_all_cpu_backtrace
===========================

If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when a hung task is detected. This file shows up if
``CONFIG_DETECT_HUNG_TASK`` and ``CONFIG_SMP`` are enabled.

= =====================================================================
0 Don't show the backtraces of all CPUs when a hung task is detected.
  This is the default behavior.
1 Send a non-maskable interrupt to all CPUs and dump their backtraces
  when a hung task is detected.
= =====================================================================


hung_task_panic
===============

Controls the kernel's behavior when a hung task is detected.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

= =================================================
0 Continue operation. This is the default behavior.
1 Panic immediately.
= =================================================


hung_task_check_count
=====================

The upper bound on the number of tasks that are checked.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.


hung_task_detect_count
======================

Indicates the total number of tasks that have been detected as hung since
system boot.

This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.


hung_task_timeout_secs
======================

When a task in D state has not been scheduled for more than this
many seconds, report a warning.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 means an infinite timeout: no checking is done.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.


hung_task_check_interval_secs
=============================

Hung task check interval. If hung task checking is enabled
(see `hung_task_timeout_secs`_), the check is done every
``hung_task_check_interval_secs`` seconds.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 (default) means use ``hung_task_timeout_secs`` as checking
interval.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.
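
The interplay between the two settings can be summarised in a short
POSIX ``sh`` sketch. ``hung_check_period`` is a hypothetical helper that
only restates the rules above; it does not model how the kernel actually
schedules its checks.

.. code-block:: sh

    # Hypothetical helper: effective check period in seconds, or "disabled".
    # Usage: hung_check_period TIMEOUT_SECS INTERVAL_SECS
    hung_check_period() {
        if [ "$1" -eq 0 ]; then
            echo disabled      # timeout of 0: no checking is done
        elif [ "$2" -eq 0 ]; then
            echo "$1"          # interval of 0: use the timeout instead
        else
            echo "$2"
        fi
    }

    d=/proc/sys/kernel
    if [ -r "$d/hung_task_timeout_secs" ] &&
       [ -r "$d/hung_task_check_interval_secs" ]; then
        hung_check_period "$(cat "$d/hung_task_timeout_secs")" \
                          "$(cat "$d/hung_task_check_interval_secs")"
    fi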


hung_task_warnings
==================

The maximum number of warnings to report. Each time a hung task is
reported during a check, this value is decreased by 1.
When this value reaches 0, no more warnings will be reported.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

-1: report an unlimited number of warnings.


hyperv_record_panic_msg
=======================

Controls whether the panic kmsg data should be reported to Hyper-V.

= =========================================================
0 Do not report panic kmsg data.
1 Report the panic kmsg data. This is the default behavior.
= =========================================================


ignore-unaligned-usertrap
=========================

On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
currently, ``arc``, ``parisc`` and ``loongarch``), controls whether all
unaligned traps are logged.

= =============================================================
0 Log all unaligned accesses.
1 Only warn the first time a process traps. This is the default
  setting.
= =============================================================

See also `unaligned-trap`_.

io_uring_disabled
=================

Controls whether processes may create new io_uring instances.
Disabling io_uring shrinks the kernel's attack surface.

= ======================================================================
0 All processes can create io_uring instances as normal. This is the
  default setting.
1 io_uring creation is disabled (``io_uring_setup()`` will fail with
  ``-EPERM``) for unprivileged processes not in the io_uring_group
  group. Existing io_uring instances can still be used. See the
  documentation for `io_uring_group`_ for more information.
2 io_uring creation is disabled for all processes.
  ``io_uring_setup()`` always fails with ``-EPERM``. Existing io_uring
  instances can still be used.
= ======================================================================


io_uring_group
==============

When ``io_uring_disabled`` is set to 1, a process must either be
privileged (``CAP_SYS_ADMIN``) or be in the ``io_uring_group`` group in
order to create an io_uring instance.  If ``io_uring_group`` is set to
-1 (the default), only processes with the ``CAP_SYS_ADMIN`` capability
may create io_uring instances.
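
The combined effect of the two settings can be summarised as a POSIX
``sh`` decision function. This is a hypothetical sketch restating the
rules above, not the kernel's check: ``may_create_io_uring`` and
``in_gid`` are illustrative helpers, and the yes/no arguments stand for
facts about the calling process.

.. code-block:: sh

    # Hypothetical helper.
    # Usage: may_create_io_uring DISABLED HAS_CAP_SYS_ADMIN IN_IO_URING_GROUP
    # (the last two arguments are "yes" or "no")
    may_create_io_uring() {
        case $1 in
            0) echo allowed ;;
            1) if [ "$2" = yes ] || [ "$3" = yes ]; then
                   echo allowed
               else
                   echo "denied (EPERM)"
               fi ;;
            2) echo "denied (EPERM)" ;;
            *) echo "unknown setting" ;;
        esac
    }

    # Hypothetical helper: "yes" if the current user is in group GID;
    # -1 (the io_uring_group default) matches nobody.
    in_gid() {
        if [ "$1" -ge 0 ] && id -G | tr ' ' '\n' | grep -qx "$1"; then
            echo yes
        else
            echo no
        fi
    }

For example, with ``io_uring_disabled`` set to 1 and ``io_uring_group``
left at -1, ``may_create_io_uring 1 no "$(in_gid -1)"`` reports
``denied (EPERM)``.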


kexec_load_disabled
===================

A toggle indicating if the syscalls ``kexec_load`` and
``kexec_file_load`` have been disabled.
This value defaults to 0 (false: ``kexec_*load`` enabled), but can be
set to 1 (true: ``kexec_*load`` disabled).
Once true, kexec can no longer be used, and the toggle cannot be set
back to false.
This allows a kexec image to be loaded before disabling the syscall,
allowing a system to set up (and later use) an image without it being
altered.
Generally used together with the `modules_disabled`_ sysctl.

kexec_load_limit_panic
======================

This parameter specifies a limit to the number of times the syscalls
``kexec_load`` and ``kexec_file_load`` can be called with a crash
image. It can only be set to a more restrictive value than the
current one.

== ======================================================
-1 Unlimited calls to kexec. This is the default setting.
N  Number of calls left.
== ======================================================
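
The "more restrictive" rule can be illustrated with a check on a
proposed new value. In this POSIX ``sh`` sketch, ``may_set_kexec_limit``
is a hypothetical helper and its exact rule is an assumption: it accepts
only a non-negative value and, unless the current limit is -1
(unlimited), only one strictly below the current limit.

.. code-block:: sh

    # Hypothetical helper (assumed rule, see above).
    # Usage: may_set_kexec_limit CURRENT NEW; prints "yes" or "no".
    may_set_kexec_limit() {
        if [ "$2" -lt 0 ]; then
            echo no     # assumed: negative values, including -1, are rejected
        elif [ "$1" -eq -1 ] || [ "$2" -lt "$1" ]; then
            echo yes
        else
            echo no
        fi
    }

    if [ -r /proc/sys/kernel/kexec_load_limit_panic ]; then
        cur=$(cat /proc/sys/kernel/kexec_load_limit_panic)
        echo "current limit $cur; may set 1: $(may_set_kexec_limit "$cur" 1)"
    fi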

kexec_load_limit_reboot
=======================

Same as ``kexec_load_limit_panic``, but for a normal (non-crash)
image.

kptr_restrict
=============

This setting controls whether restrictions are placed on
exposing kernel addresses via ``/proc`` and other interfaces.

When ``kptr_restrict`` is set to 0 (the default), the address is hashed
before printing.
(This is equivalent to %p.)

When ``kptr_restrict`` is set to 1, kernel pointers printed using the
%pK format specifier will be replaced with 0s unless the user has
``CAP_SYSLOG`` and the effective user and group IDs are equal to the
real IDs.
The ID check is needed because %pK checks are done at read() time
rather than open() time, so if permissions are elevated between the
open() and the read() (e.g. via a setuid binary), %pK will still not
leak kernel pointers to unprivileged users.
Note that this is only a temporary solution.
The correct long-term solution is to do the permission checks at
open() time.
Consider removing world read permissions from files that use %pK, and
using `dmesg_restrict`_ to protect against uses of %pK in ``dmesg(8)``
if leaking kernel pointer values to unprivileged users is a concern.

When ``kptr_restrict`` is set to 2, kernel pointers printed using
%pK will be replaced with 0s regardless of privileges.
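
The three levels can be summarised as a shell sketch. ``pk_visibility`` is
a hypothetical helper for illustration; its second argument stands for "the
reader has ``CAP_SYSLOG`` and matching effective and real IDs", and the
"raw" result (the actual address being printed) is this sketch's reading of
level 1 for such a reader::

    # Hypothetical helper (illustration only): how a %pK pointer appears.
    # $1: kptr_restrict value
    # $2: "yes" if the reader has CAP_SYSLOG and matching effective/real
    #     user and group IDs, otherwise "no"
    pk_visibility() {
        case $1 in
            0) echo hashed ;;    # treated like %p
            1) if [ "$2" = yes ]; then echo raw; else echo zeros; fi ;;
            2) echo zeros ;;     # regardless of privileges
            *) echo invalid ;;
        esac
    }

    pk_visibility 1 no     # prints "zeros"

The current setting can be read with ``cat /proc/sys/kernel/kptr_restrict``.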


modprobe
========

The full path to the usermode helper for autoloading kernel modules,
by default ``CONFIG_MODPROBE_PATH``, which in turn defaults to
"/sbin/modprobe".  This binary is executed when the kernel requests a
module.  For example, if userspace passes an unknown filesystem type
to mount(), then the kernel will automatically request the
corresponding filesystem module by executing this usermode helper.
This usermode helper should insert the needed module into the kernel.

This sysctl only affects module autoloading.  It has no effect on the
ability to explicitly insert modules.

This sysctl can be used to debug module loading requests::

    # Create a wrapper that logs each request, then runs the real helper
    echo '#! /bin/sh' > /tmp/modprobe
    echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe
    echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe
    chmod a+x /tmp/modprobe
    # Point the kernel at the wrapper (requires root)
    echo /tmp/modprobe > /proc/sys/kernel/modprobe

Alternatively, if this sysctl is set to the empty string, then module
autoloading is completely disabled.  The kernel will not try to
execute a usermode helper at all, nor will it call the
``kernel_module_request`` LSM hook.

If ``CONFIG_STATIC_USERMODEHELPER=y`` is set in the kernel configuration,
then the configured static usermode helper overrides this sysctl,
except that the empty string is still accepted to completely disable
module autoloading as described above.

modules_disabled
================

A toggle value indicating whether modules are allowed to be loaded
in an otherwise modular kernel.  This toggle defaults to off
(0), but can be set true (1).  Once true, modules can be
neither loaded nor unloaded, and the toggle cannot be set back
to false.  Generally used with the `kexec_load_disabled`_ toggle.


.. _msgmni:

msgmax, msgmnb, and msgmni
==========================

``msgmax`` is the maximum size of an IPC message, in bytes. 8192 by
default (``MSGMAX``).

``msgmnb`` is the maximum size of an IPC queue, in bytes. 16384 by
default (``MSGMNB``).

``msgmni`` is the maximum number of IPC queues. 32000 by default
(``MSGMNI``).

All of these parameters are set per IPC namespace. The maximum number of bytes
in POSIX message queues is limited by ``RLIMIT_MSGQUEUE``. This limit is
respected hierarchically in each user namespace.
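
As a worked example of how the two byte limits interact: with the defaults,
one queue can hold at most ``16384 / 8192 = 2`` messages of the maximum
size, after which a further ``msgsnd()`` has to wait for space (or fails
with ``EAGAIN`` if ``IPC_NOWAIT`` is set). The arithmetic as a shell
sketch, where ``max_full_size_msgs`` is a hypothetical helper::

    # Hypothetical helper (illustration only): how many messages of the
    # maximum size fit into one queue. $1: msgmax, $2: msgmnb (bytes).
    max_full_size_msgs() {
        echo $(($2 / $1))
    }

    max_full_size_msgs 8192 16384    # prints 2

On a running system, the current limits can be inspected with
``sysctl kernel.msgmax kernel.msgmnb kernel.msgmni`` or ``ipcs -l``.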

msg_next_id, sem_next_id, and shm_next_id (System V IPC)
========================================================

These three sysctls allow specifying the desired ID for the next allocated
IPC object: message, semaphore or shared memory respectively.

By default they are equal to -1, which means generic allocation logic.
Possible values to set are in the range {0:``INT_MAX``}.

Notes:
  1) The kernel doesn't guarantee that the new object will have the desired
     ID. So, it's up to userspace how to handle an object with the "wrong" ID.
  2) A sysctl with a non-default value will be set back to -1 by the kernel
     after a successful IPC object allocation. If an IPC object allocation
     syscall fails, it is undefined if the value remains unmodified or is
     reset to -1.


ngroups_max
===========

Maximum number of supplementary groups, i.e. the maximum size which
``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.


nmi_watchdog
============

This parameter can be used to control the NMI watchdog
(i.e. the hard lockup detector) on x86 systems.

= =================================
0 Disable the hard lockup detector.
1 Enable the hard lockup detector.
= =================================

The hard lockup detector monitors each CPU for its ability to respond to
timer interrupts. The mechanism utilizes CPU performance counter registers
that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
while a CPU is busy. Hence, the alternative name 'NMI watchdog'.

The NMI watchdog is disabled by default if the kernel is running as a guest
in a KVM virtual machine. This default can be overridden by adding::

   nmi_watchdog=1

to the guest kernel command line (see
Documentation/admin-guide/kernel-parameters.rst).


nmi_wd_lpm_factor (PPC only)
============================

Factor to apply to the NMI watchdog timeout (only when ``nmi_watchdog`` is
set to 1). This factor represents the percentage added to
``watchdog_thresh`` when calculating the NMI watchdog timeout during an
LPM (Live Partition Mobility). The soft lockup timeout is not impacted.

A value of 0 means no change. The default value is 200, meaning the NMI
watchdog timeout is set to 30s (based on ``watchdog_thresh`` equal to 10).
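
Reading the factor as a percentage added to ``watchdog_thresh``, the NMI
watchdog timeout during an LPM works out to
``watchdog_thresh * (100 + nmi_wd_lpm_factor) / 100``; with the defaults,
``10 * (100 + 200) / 100 = 30`` seconds. A shell sketch of that arithmetic
(``lpm_nmi_timeout`` is a hypothetical helper, and the integer rounding is
an assumption of the sketch)::

    # Hypothetical helper (illustration only): NMI watchdog timeout in
    # seconds during an LPM.
    # $1: watchdog_thresh (seconds), $2: nmi_wd_lpm_factor (percent added)
    lpm_nmi_timeout() {
        echo $(($1 * (100 + $2) / 100))
    }

    lpm_nmi_timeout 10 200    # prints 30
    lpm_nmi_timeout 10 0      # prints 10, i.e. no change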


numa_balancing
==============

Enables/disables and configures automatic page fault based NUMA memory
balancing.  Memory is moved automatically to nodes that access it often.
The value to set can be the result of ORing the following:

= =================================
0 NUMA_BALANCING_DISABLED
1 NUMA_BALANCING_NORMAL
2 NUMA_BALANCING_MEMORY_TIERING
= =================================

NUMA_BALANCING_NORMAL optimizes page placement among different NUMA
nodes to reduce remote accesses.  On NUMA machines, there is a
performance penalty if remote memory is accessed by a CPU. When this
feature is enabled, the kernel samples which task threads are accessing
memory by periodically unmapping pages and later trapping a page
fault. At the time of the page fault, the kernel determines whether the
data being accessed should be migrated to a local memory node.

The unmapping of pages and trapping faults incur additional overhead that
ideally is offset by improved memory locality but there is no universal
guarantee. If the target workload is already bound to NUMA nodes then this
feature should be disabled.

NUMA_BALANCING_MEMORY_TIERING optimizes page placement among different
types of memory (represented as different NUMA nodes), placing the hot
pages in the fast memory.  This is also implemented via page unmapping
and page faults.
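
Because the value is a bitmask, both modes can be enabled together by
writing ``1 | 2 = 3``. A shell sketch that decodes a value into the mode
names from the table above (``numa_balancing_modes`` is a hypothetical
helper)::

    # Hypothetical helper (illustration only): list the NUMA balancing
    # modes enabled by a numa_balancing value.
    numa_balancing_modes() {
        if [ "$1" -eq 0 ]; then
            echo DISABLED
            return 0
        fi
        modes=""
        if [ $(($1 & 1)) -ne 0 ]; then modes="$modes NORMAL"; fi
        if [ $(($1 & 2)) -ne 0 ]; then modes="$modes MEMORY_TIERING"; fi
        echo $modes    # unquoted on purpose: drops the leading space
    }

    numa_balancing_modes $((1 | 2))    # prints "NORMAL MEMORY_TIERING"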

numa_balancing_promote_rate_limit_MBps
======================================

Excessive promotion/demotion throughput between different memory types
may hurt application latency.  This sysctl rate-limits the promotion
throughput: the per-node maximum promotion throughput, in MB/s, is
limited to no more than the value set here.

A rule of thumb is to set this to less than 1/10 of the PMEM node
write bandwidth.
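
Applying the rule of thumb is a single division. In this shell sketch,
``promote_rate_limit_bound`` is a hypothetical helper and the 5000 MB/s
bandwidth is an illustrative figure, not a measurement::

    # Hypothetical helper (illustration only): upper bound suggested by
    # the 1/10 rule of thumb. $1: PMEM node write bandwidth in MB/s.
    promote_rate_limit_bound() {
        echo $(($1 / 10))
    }

    promote_rate_limit_bound 5000    # prints 500: keep the limit below this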

oops_all_cpu_backtrace
======================

If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when an oops event occurs. It should be used as a last
resort in case a panic cannot be triggered (to protect running VMs, for
example) or kdump can't be collected. This file shows up if ``CONFIG_SMP``
is enabled.

0: Do not dump the backtraces of all CPUs when an oops is detected.
This is the default behavior.

1: Send an NMI to all CPUs and dump their backtraces when an oops
event is detected.


oops_limit
==========

Number of kernel oopses after which the kernel should panic when
``panic_on_oops`` is not set. Setting this to 0 disables checking
the count. Setting this to 1 has the same effect as setting
``panic_on_oops=1``. The default value is 10000.
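
The counting rule can be sketched as follows, assuming ``panic_on_oops`` is
0 and that the kernel panics once the number of oopses seen so far reaches
``oops_limit``. ``oops_would_panic`` is a hypothetical helper used only for
illustration::

    # Hypothetical helper (illustration only): would this oops cause a
    # panic? $1: number of oopses so far, including this one;
    # $2: oops_limit value (0 disables the check).
    oops_would_panic() {
        [ "$2" -ne 0 ] && [ "$1" -ge "$2" ]
    }

    oops_would_panic 1 1 && echo panic          # limit 1 acts like panic_on_oops=1
    oops_would_panic 9999 10000 || echo continue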


osrelease, ostype & version
===========================

::

  # cat osrelease
  2.1.88
  # cat ostype
  Linux
  # cat version
  #5 Wed Feb 25 21:49:24 MET 1998

The files ``osrelease`` and ``ostype`` should be clear enough.
``version`` needs a little more clarification, however. The '#5' means
that this is the fifth kernel built from this source base and the
date behind it indicates the time the kernel was built.
The only way to tune these values is to rebuild the kernel :-)
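
The same strings are also reported by ``uname(1)``: ``uname -r`` prints
``osrelease``, ``uname -s`` prints ``ostype`` and ``uname -v`` prints
``version``. A quick consistency check, as a shell sketch::

    # Compare each sysctl with the matching uname(1) field.
    for pair in osrelease:-r ostype:-s version:-v; do
        file=${pair%%:*}     # sysctl name, e.g. "osrelease"
        flag=${pair#*:}      # uname option, e.g. "-r"
        if [ "$(cat /proc/sys/kernel/$file)" = "$(uname $flag)" ]; then
            echo "$file matches uname $flag"
        else
            echo "$file differs from uname $flag"
        fi
    done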


overflowgid & overflowuid
=========================

If your architecture did not always support 32-bit UIDs (i.e. arm,
i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
applications that use the old 16-bit UID/GID system calls, if the
actual UID or GID would exceed 65535.

These sysctls allow you to change the value of the fixed UID and GID.
The default is 65534.
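
The substitution rule can be sketched in shell as below. ``uid16_view`` is
a hypothetical helper, and it assumes the actual ID and the configured
``overflowuid``/``overflowgid`` are passed in as numbers::

    # Hypothetical helper (illustration only): the ID a legacy 16-bit
    # UID/GID system call would report.
    # $1: actual UID or GID, $2: overflowuid or overflowgid
    uid16_view() {
        if [ "$1" -gt 65535 ]; then
            echo "$2"
        else
            echo "$1"
        fi
    }

    uid16_view 1000 65534      # prints 1000
    uid16_view 100000 65534    # prints 65534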


panic
=====

The value in this file determines the behaviour of the kernel on a
panic:

* if zero, the kernel will loop forever;
* if negative, the kernel will reboot immediately;
* if positive, the kernel will reboot after the corresponding number
  of seconds.

When you use the software watchdog, the recommended setting is 60.
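
For example, the recommended software-watchdog setting can be applied with
``sysctl -w kernel.panic=60`` (as root). The three cases above can also be
expressed as a small shell sketch, where ``panic_action`` is a hypothetical
helper::

    # Hypothetical helper (illustration only): what a given value of
    # /proc/sys/kernel/panic does after a panic.
    panic_action() {
        if [ "$1" -eq 0 ]; then
            echo "loop forever"
        elif [ "$1" -lt 0 ]; then
            echo "reboot immediately"
        else
            echo "reboot after $1 seconds"
        fi
    }

    panic_action 60    # prints "reboot after 60 seconds"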


panic_on_io_nmi
===============

Controls the kernel's behavior when a CPU receives an NMI caused by
an IO error.

= ==================================================================
0 Try to continue operation (default).
1 Panic immediately. The IO error triggered an NMI. This indicates a
  serious system condition which could result in IO data corruption.
  Rather than continuing, panicking might be a better choice. Some
  servers issue this sort of NMI when the dump button is pushed,
  and you can use this option to take a crash dump.
= ==================================================================


panic_on_oops
=============

Controls the kernel's behaviour when an oops or BUG is encountered.

= ===================================================================
0 Try to continue operation.
1 Panic immediately.  If the `panic`_ sysctl is also non-zero then the
  machine will be rebooted.
= ===================================================================


panic_on_stackoverflow
======================

Controls the kernel's behavior when it detects an overflow of the
kernel, IRQ or exception stacks (user stacks are not covered).
This file shows up if ``CONFIG_DEBUG_STACKOVERFLOW`` is enabled.

= ==========================
0 Try to continue operation.
1 Panic immediately.
= ==========================


panic_on_unrecovered_nmi
========================

The default Linux behaviour on a memory or unknown NMI is to continue
operation. For many environments, such as scientific computing, it is
preferable that the box is taken out of service and the error dealt
with, rather than letting an uncorrected parity/ECC error propagate.

A small number of systems do generate NMIs for bizarre random reasons,
such as power management, so the default is off. This sysctl works like
the other panic controls already in this directory.


panic_on_warn
=============

Calls panic() in the WARN() path when set to 1.  This is useful to avoid
a kernel rebuild when attempting to kdump at the location of a WARN().

= ================================================
0 Only WARN(), default behaviour.
1 Call panic() after printing out WARN() location.
= ================================================


panic_print
===========

Bitmask for printing system info when a panic happens. Users can choose a
combination of the following bits:

=====  ============================================
bit 0  print all tasks info
bit 1  print system memory info
bit 2  print timer info
bit 3  print locks info if ``CONFIG_LOCKDEP`` is on
bit 4  print ftrace buffer
bit 5  print all printk messages in buffer
bit 6  print backtraces of all CPUs (if available in the arch)
bit 7  print only tasks in uninterruptible (blocked) state
=====  ============================================

For example, to print tasks and memory info on panic, run::

  echo 3 > /proc/sys/kernel/panic_print
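
Conversely, a ``panic_print`` value can be decoded bit by bit. In this
shell sketch, ``panic_print_decode`` is a hypothetical helper and the short
labels are abbreviations of the table entries above::

    # Hypothetical helper (illustration only): list the set bits of a
    # panic_print value, in the order of the table above.
    panic_print_decode() {
        bit=0
        for what in tasks memory timers locks ftrace printk-buffer \
                    all-cpu-backtraces blocked-tasks; do
            if [ $(( ($1 >> bit) & 1 )) -eq 1 ]; then
                echo "bit $bit: $what"
            fi
            bit=$((bit + 1))
        done
    }

    panic_print_decode 3    # prints "bit 0: tasks" and "bit 1: memory"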


panic_on_rcu_stall
==================

When set to 1, calls panic() after RCU stall detection messages. This
is useful to determine the root cause of RCU stalls using a vmcore.

= ============================================================
0 Do not panic() when RCU stall takes place, default behavior.
1 panic() after printing RCU stall messages.
= ============================================================

max_rcu_stall_to_panic
======================

When ``panic_on_rcu_stall`` is set to 1, this value determines the
number of times that RCU can stall before panic() is called.

When ``panic_on_rcu_stall`` is set to 0, this value has no effect.
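
Assuming the kernel compares a running count of stalls against this value,
the combined effect of the two sysctls can be sketched as below;
``rcu_stall_panics`` is a hypothetical helper used only for illustration::

    # Hypothetical helper (illustration only): does the Nth RCU stall
    # call panic()?
    # $1: panic_on_rcu_stall, $2: max_rcu_stall_to_panic,
    # $3: number of stalls so far, including this one
    rcu_stall_panics() {
        [ "$1" -eq 1 ] && [ "$3" -ge "$2" ]
    }

    rcu_stall_panics 1 3 2 || echo "no panic yet"
    rcu_stall_panics 1 3 3 && echo "panic"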

perf_cpu_time_max_percent
=========================

Hints to the kernel how much CPU time it should be allowed to
use to handle perf sampling events.  If the perf subsystem
is informed that its samples are exceeding this limit, it
will drop its sampling frequency to attempt to reduce its CPU
usage.

Some perf sampling happens in NMIs.  If these samples
unexpectedly take too long to execute, the NMIs can become
stacked up next to each other so much that nothing else is
allowed to execute.

===== ========================================================
0     Disable the mechanism.  Do not monitor or correct perf's
      sampling rate no matter how much CPU time it takes.

1-100 Attempt to throttle perf's sample rate to this
      percentage of CPU.  Note: the kernel calculates an
      "expected" length of each sample event.  100 here means
      100% of that expected length.  Even if this is set to
      100, you may still see sample throttling if this
      length is exceeded.  Set to 0 if you truly do not care
      how much CPU is consumed.
===== ========================================================


perf_event_paranoid
===================

Controls use of the performance events system by unprivileged
users (without ``CAP_PERFMON``).  The default value is 2.

For backward compatibility reasons, access to system performance
monitoring and observability remains open to ``CAP_SYS_ADMIN``
privileged processes, but using ``CAP_SYS_ADMIN`` for secure system
performance monitoring and observability operations is discouraged
in favour of ``CAP_PERFMON``.

===  ==================================================================
 -1  Allow use of (almost) all events by all users.

     Ignore the mlock limit beyond ``perf_event_mlock_kb`` for users
     without ``CAP_IPC_LOCK``.

>=0  Disallow ftrace function tracepoint by users without
     ``CAP_PERFMON``.

     Disallow raw tracepoint access by users without ``CAP_PERFMON``.

>=1  Disallow CPU event access by users without ``CAP_PERFMON``.

>=2  Disallow kernel profiling by users without ``CAP_PERFMON``.
===  ==================================================================
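
Since each level adds to the restrictions of the levels below it, the table
can be read cumulatively. A shell sketch listing what a user without
``CAP_PERFMON`` loses at a given level (``paranoid_restrictions`` is a
hypothetical helper; the current level can be read from
``/proc/sys/kernel/perf_event_paranoid``)::

    # Hypothetical helper (illustration only): restrictions that apply to
    # users without CAP_PERFMON at a given perf_event_paranoid level.
    paranoid_restrictions() {
        if [ "$1" -lt 0 ]; then
            echo "none (almost all events allowed)"
            return 0
        fi
        echo "ftrace function tracepoint"
        echo "raw tracepoints"
        if [ "$1" -ge 1 ]; then echo "CPU events"; fi
        if [ "$1" -ge 2 ]; then echo "kernel profiling"; fi
    }

    paranoid_restrictions 2    # the default level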


perf_event_max_stack
====================

Controls the maximum number of stack frames to copy for events configured
with (``attr.sample_type & PERF_SAMPLE_CALLCHAIN``), for instance, when using
'``perf record -g``' or '``perf trace --call-graph fp``'.

This can only be changed when no events with callchains enabled are in
use; otherwise, writing to this file will return ``-EBUSY``.

The default value is 127.


perf_event_mlock_kb
===================

Controls the size of the per-CPU ring buffer that is not counted against
the mlock limit.

The default value is 512 KiB + 1 page.
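
In kilobytes, the default is 512 plus the page size divided by 1024, so on
a system with 4 KiB pages it reads as 516. A shell sketch of that
arithmetic (``default_mlock_kb`` is a hypothetical helper;
``getconf PAGESIZE`` reports the page size in bytes)::

    # Hypothetical helper (illustration only): default perf_event_mlock_kb
    # for a given page size. $1: page size in bytes.
    default_mlock_kb() {
        echo $((512 + $1 / 1024))
    }

    default_mlock_kb 4096                   # prints 516
    default_mlock_kb "$(getconf PAGESIZE)"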


perf_event_max_contexts_per_stack
=================================

Controls the maximum number of stack frame context entries for events
configured with (``attr.sample_type & PERF_SAMPLE_CALLCHAIN``), for
instance, when using '``perf record -g``' or '``perf trace --call-graph fp``'.

This can only be changed when no events with callchains enabled are in
use; otherwise, writing to this file will return ``-EBUSY``.

The default value is 8.


perf_user_access (arm64 and riscv only)
=======================================

Controls user space access for reading perf event counters.

arm64
-----

The default value is 0 (access disabled).

When set to 1, user space can read performance monitor counter registers
directly.

See Documentation/arch/arm64/perf.rst for more information.

riscv
-----

When set to 0, user space access is disabled.

The default value is 1: user space can read performance monitor counter
registers through perf, and any direct access without perf intervention
will trigger an illegal instruction exception.

When set to 2, legacy mode is enabled: user space has direct access to the
cycle and instret CSRs only. Note that this legacy value is deprecated and
will be removed once all user space applications are fixed.

Note that the time CSR is always directly accessible to all modes.

pid_max
=======

PID allocation wrap value.  When the kernel's next PID value
reaches this value, it wraps back to a minimum PID value.
PIDs of value ``pid_max`` or larger are not allocated.


ns_last_pid
===========

The last PID allocated in the current PID namespace (the one the task
using this sysctl lives in). When selecting a PID for a new task on fork,
the kernel tries to allocate the next number after this one.


powersave-nap (PPC only)
========================

If set, Linux-PPC will use the 'nap' mode of powersaving,
otherwise the 'doze' mode will be used.

106557043247SMauro Carvalho Chehab==============================================================
106657043247SMauro Carvalho Chehab
1067a3cb66a5SStephen Kittprintk
1068a3cb66a5SStephen Kitt======
106957043247SMauro Carvalho Chehab
1070a3cb66a5SStephen KittThe four values in printk denote: ``console_loglevel``,
1071a3cb66a5SStephen Kitt``default_message_loglevel``, ``minimum_console_loglevel`` and
1072a3cb66a5SStephen Kitt``default_console_loglevel`` respectively.
107357043247SMauro Carvalho Chehab
107457043247SMauro Carvalho ChehabThese values influence printk() behavior when printing or
1075a3cb66a5SStephen Kittlogging error messages. See '``man 2 syslog``' for more info on
107657043247SMauro Carvalho Chehabthe different loglevels.
107757043247SMauro Carvalho Chehab
1078a3cb66a5SStephen Kitt======================== =====================================
1079a3cb66a5SStephen Kittconsole_loglevel         messages with a higher priority than
108057043247SMauro Carvalho Chehab                         this will be printed to the console
1081a3cb66a5SStephen Kittdefault_message_loglevel messages without an explicit priority
108257043247SMauro Carvalho Chehab                         will be printed with this priority
1083a3cb66a5SStephen Kittminimum_console_loglevel minimum (highest) value to which
108457043247SMauro Carvalho Chehab                         console_loglevel can be set
1085a3cb66a5SStephen Kittdefault_console_loglevel default value for console_loglevel
1086a3cb66a5SStephen Kitt======================== =====================================
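
The following shell sketch splits a ``printk`` value into its four fields
and checks whether a message at a given loglevel would reach the console,
using the "higher priority" rule from the table above (a numerically lower
level means a higher priority, from 0 for ``KERN_EMERG`` to 7 for
``KERN_DEBUG``). The names ``printk_field`` and ``reaches_console`` are
hypothetical helpers, not kernel interfaces.

.. code-block:: sh

    #!/bin/sh
    # Hypothetical helpers (not a kernel interface) for the printk sysctl.

    # printk_field VALUE N: print field N (1-4) of a printk sysctl value,
    # e.g. printk_field "4 4 1 7" 1 prints 4 (console_loglevel).
    printk_field() {
        printf '%s\n' "$1" | awk -v n="$2" '{ print $n }'
    }

    # reaches_console VALUE LEVEL: succeed if a message of loglevel LEVEL
    # (0 = KERN_EMERG ... 7 = KERN_DEBUG) has a higher priority, i.e. a
    # numerically lower level, than console_loglevel (field 1).
    reaches_console() {
        [ "$2" -lt "$(printk_field "$1" 1)" ]
    }

    # Show the current settings, if the sysctl is readable here.
    if [ -r /proc/sys/kernel/printk ]; then
        value=$(cat /proc/sys/kernel/printk)
        echo "console_loglevel=$(printk_field "$value" 1)"
        echo "default_message_loglevel=$(printk_field "$value" 2)"
        echo "minimum_console_loglevel=$(printk_field "$value" 3)"
        echo "default_console_loglevel=$(printk_field "$value" 4)"
    fi

With ``4 4 1 7``, for example, ``KERN_ERR`` (level 3) messages are printed
to the console, while ``KERN_WARNING`` (level 4) messages are not.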


printk_delay
============

Delay each printk message by ``printk_delay`` milliseconds.

Values from 0 to 10000 are allowed.


printk_ratelimit
================

Some warning messages are rate limited. ``printk_ratelimit`` specifies
the length of the rate-limiting interval for these messages (in seconds).
The default value is 5 seconds.

A value of 0 will disable rate limiting.


printk_ratelimit_burst
======================

Within each `printk_ratelimit`_ interval, a burst of messages is allowed
to pass through. ``printk_ratelimit_burst`` specifies the number of messages
that can be sent in an interval before rate limiting kicks in; further
messages are suppressed until the next interval starts.

The default value is 10 messages.
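
A minimal shell sketch of how the two knobs interact, assuming a simplified
fixed-window model with timestamps in whole seconds: the first message opens
a window, at most ``printk_ratelimit_burst`` messages pass in each window,
and a new window opens once more than ``printk_ratelimit`` seconds have
elapsed since the current one opened. ``ratelimit_sim`` is a hypothetical
helper that only models the documented knobs; it is not kernel code.

.. code-block:: sh

    #!/bin/sh
    # Simplified model of printk rate limiting (a sketch, not kernel code).
    # Usage: ratelimit_sim INTERVAL BURST T1 T2 ...
    #   INTERVAL - window length in seconds (models printk_ratelimit)
    #   BURST    - messages allowed per window (models printk_ratelimit_burst)
    #   T1 ...   - message timestamps in seconds, in increasing order
    # Prints "pass" or "drop" for each message.
    ratelimit_sim() {
        interval=$1
        burst=$2
        shift 2
        begin=""
        printed=0
        out=""
        for t in "$@"; do
            if [ "$interval" -eq 0 ]; then
                out="$out pass"      # an interval of 0 disables rate limiting
                continue
            fi
            if [ -z "$begin" ]; then
                begin=$t             # the first message opens a window
            elif [ "$t" -gt $((begin + interval)) ]; then
                begin=$t             # window expired: open a new one
                printed=0
            fi
            if [ "$printed" -lt "$burst" ]; then
                printed=$((printed + 1))
                out="$out pass"
            else
                out="$out drop"
            fi
        done
        printf '%s\n' "${out# }"
    }

For example, ``ratelimit_sim 5 2 0 1 2 3 6`` prints ``pass pass drop drop
pass``: the third and fourth messages exceed the burst of 2, and the message
at second 6 falls into a new window.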


printk_devkmsg
==============

Controls logging to ``/dev/kmsg`` from userspace:

========= =============================================
ratelimit default; logging is rate limited
on        unlimited logging to /dev/kmsg from userspace
off       logging to /dev/kmsg disabled
========= =============================================

The kernel command line parameter ``printk.devkmsg=`` overrides this and is
a one-time setting until next reboot: once set, it cannot be changed by
this sysctl interface anymore.

==============================================================


pty
===

See Documentation/filesystems/devpts.rst.


random
======

This is a directory, with the following entries:

* ``boot_id``: a UUID generated the first time this is retrieved, and
  unvarying after that;

* ``uuid``: a UUID generated every time this is retrieved (this can
  thus be used to generate UUIDs at will);

* ``entropy_avail``: the pool's entropy count, in bits;

* ``poolsize``: the entropy pool size, in bits;

* ``urandom_min_reseed_secs``: obsolete (used to determine the minimum
  number of seconds between urandom pool reseeding). This file is
  writable for compatibility purposes, but writing to it has no effect
  on any RNG behavior;

* ``write_wakeup_threshold``: when the entropy count drops below this
  (as a number of bits), processes waiting to write to ``/dev/random``
  are woken up. This file is writable for compatibility purposes, but
  writing to it has no effect on any RNG behavior.
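
As an illustration, the following shell sketch reads ``uuid`` twice and
``boot_id`` twice: per the descriptions above, the two ``uuid`` values should
differ while the two ``boot_id`` values should match. The ``is_uuid`` helper
is hypothetical and only checks the textual 8-4-4-4-12 hexadecimal layout; it
does not validate any UUID version or variant bits.

.. code-block:: sh

    #!/bin/sh
    # is_uuid STRING: succeed if STRING has the 8-4-4-4-12 hexadecimal layout.
    is_uuid() {
        printf '%s\n' "$1" |
            grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
    }

    dir=/proc/sys/kernel/random
    if [ -r "$dir/uuid" ] && [ -r "$dir/boot_id" ]; then
        a=$(cat "$dir/uuid"); b=$(cat "$dir/uuid")
        c=$(cat "$dir/boot_id"); d=$(cat "$dir/boot_id")
        is_uuid "$a" && echo "uuid looks valid: $a"
        [ "$a" != "$b" ] && echo "uuid changes on every read"
        [ "$c" = "$d" ] && echo "boot_id is stable: $c"
    fi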


randomize_va_space
==================

This option can be used to select the type of process address
space randomization that is used in the system, for architectures
that support this feature.

==  ===========================================================================
0   Turn the process address space randomization off.  This is the
    default for architectures that do not support this feature anyway,
    and kernels that are booted with the "norandmaps" parameter.

1   Make the addresses of mmap base, stack and VDSO page randomized.
    This, among other things, implies that shared libraries will be
    loaded to random addresses.  Also for PIE-linked binaries, the
    location of code start is randomized.  This is the default if the
    ``CONFIG_COMPAT_BRK`` option is enabled.

2   Additionally enable heap randomization.  This is the default if
    ``CONFIG_COMPAT_BRK`` is disabled.

    There are a few legacy applications out there (such as some ancient
    versions of libc.so.5 from 1996) that assume that the brk area starts
    just after the end of the code+bss.  These applications break when
    the start of the brk area is randomized.  There are however no known
    non-legacy applications that would be broken this way, so for most
    systems it is safe to choose full randomization.

    Systems with ancient and/or broken binaries should be configured
    with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
    address space randomization.
==  ===========================================================================


real-root-dev
=============

See Documentation/admin-guide/initrd.rst.


reboot-cmd (SPARC only)
=======================

This appears to be a way to give an argument to the SPARC
ROM/Flash boot loader, perhaps to tell it what to do after
rebooting.


sched_energy_aware
==================

Enables/disables Energy Aware Scheduling (EAS). EAS starts
automatically on platforms where it can run (that is,
platforms with asymmetric CPU topologies and having an Energy
Model available). If your platform happens to meet the
requirements for EAS but you do not want to use it, change
this value to 0. On non-EAS platforms, writes fail and
reads return nothing.

task_delayacct
==============

Enables/disables task delay accounting (see
Documentation/accounting/delay-accounting.rst). Enabling this feature incurs
a small amount of overhead in the scheduler but is useful for debugging
and performance tuning. It is required by some tools such as iotop.

sched_schedstats
================

Enables/disables scheduler statistics. Enabling this feature
incurs a small amount of overhead in the scheduler but is
useful for debugging and performance tuning.

sched_util_clamp_min
====================

Max allowed *minimum* utilization.

Default value is 1024, which is the maximum possible value.

This means that any requested uclamp.min value cannot be greater than
sched_util_clamp_min, i.e., it is restricted to the range
[0:sched_util_clamp_min].

sched_util_clamp_max
====================

Max allowed *maximum* utilization.

Default value is 1024, which is the maximum possible value.

This means that any requested uclamp.max value cannot be greater than
sched_util_clamp_max, i.e., it is restricted to the range
[0:sched_util_clamp_max].

sched_util_clamp_min_rt_default
===============================

By default, Linux is tuned for performance, which means that RT tasks always
run at the highest frequency and on the most capable (highest capacity) CPU
(in heterogeneous systems).

Uclamp achieves this by setting the requested uclamp.min of all RT tasks to
1024 by default, which effectively boosts the tasks to run at the highest
frequency and biases them to run on the biggest CPU.

This knob allows admins to change the default behavior when uclamp is being
used. In battery-powered devices particularly, running at the maximum
capacity and frequency will increase energy consumption and shorten the battery
life.

This knob is only effective for RT tasks whose requested uclamp.min value the
user has not modified via the sched_setattr() syscall.

This knob will not escape the range constraint imposed by sched_util_clamp_min
defined above.

For example, if::

	sched_util_clamp_min_rt_default = 800
	sched_util_clamp_min = 600

then the boost will be clamped to 600, because 800 is outside the permissible
range of [0:600]. This could happen, for instance, if a powersave mode
temporarily restricts all boosts by modifying sched_util_clamp_min. As soon as
this restriction is lifted, the requested sched_util_clamp_min_rt_default
will take effect.
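
The clamping described above amounts to taking the smaller of the two values.
A minimal shell sketch, assuming no other restrictions apply to the task;
``effective_rt_boost`` is a hypothetical helper, not a kernel interface:

.. code-block:: sh

    #!/bin/sh
    # effective_rt_boost RT_DEFAULT CLAMP_MIN: the uclamp.min boost an RT task
    # with an unmodified request ends up with, i.e. RT_DEFAULT capped to the
    # permissible range [0:CLAMP_MIN].
    effective_rt_boost() {
        if [ "$1" -gt "$2" ]; then
            echo "$2"
        else
            echo "$1"
        fi
    }

    # Use the live values if they are readable here.
    d=/proc/sys/kernel
    if [ -r "$d/sched_util_clamp_min_rt_default" ] &&
       [ -r "$d/sched_util_clamp_min" ]; then
        effective_rt_boost "$(cat "$d/sched_util_clamp_min_rt_default")" \
                           "$(cat "$d/sched_util_clamp_min")"
    fi

``effective_rt_boost 800 600`` prints 600, matching the example above; once
``sched_util_clamp_min`` is raised back to 1024, ``effective_rt_boost 800
1024`` prints 800.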

seccomp
=======

See Documentation/userspace-api/seccomp_filter.rst.


sg-big-buff
===========

This file shows the size of the generic SCSI (sg) buffer.
You can't tune it just yet, but you could change it at
compile time by editing ``include/scsi/sg.h`` and changing
the value of ``SG_BIG_BUFF``.

There shouldn't be any reason to change this value. If
you can come up with one, you probably know what you
are doing anyway :)


shmall
======

This parameter sets the total amount of shared memory pages that can be used
inside an IPC namespace. Shared memory pages are counted separately for each
IPC namespace, and the count is not inherited. ``shmall`` should always be at
least ``ceil(shmmax/PAGE_SIZE)``, so that a segment of the maximum size can
be created.

If you are not sure what the default ``PAGE_SIZE`` is on your Linux
system, you can run the following command::

	# getconf PAGE_SIZE

To reduce or disable the ability to allocate shared memory, you must create a
new IPC namespace, set this parameter to the required value and prohibit the
creation of new IPC namespaces in the current user namespace. Alternatively,
cgroups can be used.
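
The lower bound on ``shmall`` is a ceiling division, which can be computed
with shell integer arithmetic. This sketch assumes both values fit in the
shell's signed 64-bit arithmetic, which is not true of the very large default
``shmmax`` on 64-bit systems, so pass explicit values; ``min_shmall`` is a
hypothetical helper:

.. code-block:: sh

    #!/bin/sh
    # min_shmall SHMMAX PAGE_SIZE: smallest shmall (in pages) that still allows
    # one segment of SHMMAX bytes, i.e. ceil(SHMMAX / PAGE_SIZE).
    min_shmall() {
        echo $(( ($1 + $2 - 1) / $2 ))
    }

    # Example: a 1 GiB maximum segment size with the local page size.
    if page=$(getconf PAGE_SIZE 2>/dev/null) && [ -n "$page" ]; then
        echo "minimum shmall for 1 GiB: $(min_shmall 1073741824 "$page")"
    fi

With 4096-byte pages, ``min_shmall 1073741824 4096`` prints 262144, and
``min_shmall 4097 4096`` prints 2 because a partial page still counts as a
whole page.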

shmmax
======

This value can be used to query and set the run time limit
on the maximum shared memory segment size that can be created.
Shared memory segments up to 1 GB are now supported in the
kernel.  This value defaults to ``SHMMAX``.


shmmni
======

This value determines the maximum number of shared memory segments.
4096 by default (``SHMMNI``).


shm_rmid_forced
===============

Linux lets you set resource limits, including how much memory one
process can consume, via ``setrlimit(2)``.  Unfortunately, shared memory
segments are allowed to exist without association with any process, and
thus might not be counted against any resource limits.  If enabled,
shared memory segments are automatically destroyed when their attach
count becomes zero after a detach or a process termination.  It will
also destroy segments that were created, but never attached to, on exit
from the process.  The only use left for ``IPC_RMID`` is to immediately
destroy an unattached segment.  Of course, this breaks the way things are
defined, so some applications might stop working.  Note that this
feature will do you no good unless you also configure your resource
limits (in particular, ``RLIMIT_AS`` and ``RLIMIT_NPROC``).  Most systems don't
need this.

Note that if you change this from 0 to 1, already created segments
without users and with a dead originating process will be destroyed.


sysctl_writes_strict
====================

Controls how the file position affects the behavior of updating sysctl values
via the ``/proc/sys`` interface:

  ==   ======================================================================
  -1   Legacy per-write sysctl value handling, with no printk warnings.
       Each write syscall must fully contain the sysctl value to be
       written, and multiple writes on the same sysctl file descriptor
       will rewrite the sysctl value, regardless of file position.
   0   Same behavior as above, but warn about processes that perform writes
       to a sysctl file descriptor when the file position is not 0.
   1   (default) Respect file position when writing sysctl strings. Multiple
       writes will append to the sysctl value buffer. Anything past the max
       length of the sysctl value buffer will be ignored. Writes to numeric
       sysctl entries must always be at file position 0 and the value must
       be fully contained in the buffer sent in the write syscall.
  ==   ======================================================================


softlockup_all_cpu_backtrace
============================

This value controls whether the soft lockup detector thread gathers
further debug information when a soft lockup condition is detected.
If enabled, each CPU will be issued an NMI and instructed to capture
a stack trace.

This feature is only applicable for architectures which support
NMI.

= ============================================
0 Do nothing. This is the default behavior.
1 On detection, capture more debug information.
= ============================================


softlockup_panic
=================

This parameter can be used to control whether the kernel panics
when a soft lockup is detected.

= ============================================
0 Don't panic on soft lockup.
1 Panic on soft lockup.
= ============================================

This can also be set using the softlockup_panic kernel parameter.


soft_watchdog
=============

This parameter can be used to control the soft lockup detector.

= =================================
0 Disable the soft lockup detector.
1 Enable the soft lockup detector.
= =================================

The soft lockup detector monitors CPUs for threads that are hogging the CPUs
without rescheduling voluntarily, and thus prevent the 'migration/N' threads
from running, causing the watchdog work to fail to execute. The mechanism
depends on the CPU's ability to respond to timer interrupts, which are needed
for the watchdog work to be queued by the watchdog timer function; otherwise
the NMI watchdog, if enabled, can detect a hard lockup condition.


split_lock_mitigate (x86 only)
==============================

On x86, each "split lock" imposes a system-wide performance penalty. On larger
systems, large numbers of split locks from unprivileged users can result in
denials of service to well-behaved and potentially more important users.

The kernel mitigates these bad users by detecting split locks and imposing
penalties: forcing them to wait and only allowing one core to execute split
locks at a time.

These mitigations can make those bad applications unbearably slow. Setting
``split_lock_mitigate=0`` may restore some application performance, but will
also increase system exposure to denial of service attacks from split lock
users.

= ===================================================================
0 Disable the mitigation mode - just warns about the split lock in the
  kernel log and exposes the system to denials of service from the
  split lockers.
1 Enable the mitigation mode (this is the default) - penalizes the split
  lockers with intentional performance degradation.
= ===================================================================


stack_erasing
=============

This parameter can be used to control kernel stack erasing at the end
of syscalls for kernels built with ``CONFIG_GCC_PLUGIN_STACKLEAK``.

That erasing reduces the information which kernel stack leak bugs
can reveal and blocks some uninitialized stack variable attacks.
The tradeoff is the performance impact: on a single CPU system, kernel
compilation sees a 1% slowdown; other systems and workloads may vary.

= ====================================================================
0 Kernel stack erasing is disabled; STACKLEAK_METRICS are not updated.
1 Kernel stack erasing is enabled (default); it is performed before
  returning to userspace at the end of syscalls.
= ====================================================================


stop-a (SPARC only)
===================

Controls Stop-A:

= ====================================
0 Stop-A has no effect.
1 Stop-A breaks to the PROM (default).
= ====================================

Stop-A is always enabled on a panic, so that the user can return to
the boot PROM.


sysrq
=====

See Documentation/admin-guide/sysrq.rst.


tainted
=======

Non-zero if the kernel has been tainted. The value is a combination of
the numeric values below, which can be ORed together. The letters are
shown in the "Tainted" line of Oops reports.

======  =====  ==============================================================
     1  `(P)`  proprietary module was loaded
     2  `(F)`  module was force loaded
     4  `(S)`  kernel running on an out of specification system
     8  `(R)`  module was force unloaded
    16  `(M)`  processor reported a Machine Check Exception (MCE)
    32  `(B)`  bad page referenced or some unexpected page flags
    64  `(U)`  taint requested by userspace application
   128  `(D)`  kernel died recently, i.e. there was an OOPS or BUG
   256  `(A)`  an ACPI table was overridden by user
   512  `(W)`  kernel issued warning
  1024  `(C)`  staging driver was loaded
  2048  `(I)`  workaround for bug in platform firmware applied
  4096  `(O)`  externally-built ("out-of-tree") module was loaded
  8192  `(E)`  unsigned module was loaded
 16384  `(L)`  soft lockup occurred
 32768  `(K)`  kernel has been live patched
 65536  `(X)`  Auxiliary taint, defined and used by distros
131072  `(T)`  The kernel was built with the struct randomization plugin
======  =====  ==============================================================

See Documentation/admin-guide/tainted-kernels.rst for more information.

Note:
  Writes to this sysctl interface will fail with ``EINVAL`` if the kernel is
  booted with the command line option ``panic_on_taint=<bitmask>,nousertaint``
  and any of the ORed together values being written to ``tainted`` match
  the bitmask declared in panic_on_taint.
  See Documentation/admin-guide/kernel-parameters.rst for more details on
  that particular kernel command line option and its optional
  ``nousertaint`` switch.
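
Because ``tainted`` is a bitmask, it can be decoded bit by bit. The following
shell sketch maps bits 0 to 17 to the letters in the table above and ignores
any higher bits; ``decode_taint`` is a hypothetical helper, not a kernel
interface.

.. code-block:: sh

    #!/bin/sh
    # decode_taint VALUE: print the taint letters for the bits set in VALUE.
    # Only bits 0-17 (the table above) are decoded; higher bits are ignored.
    decode_taint() {
        value=$1
        letters=PFSRMBUDAWCIOELKXT   # letter for bit 0, bit 1, ..., bit 17
        out=""
        bit=0
        while [ "$bit" -lt 18 ]; do
            if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
                out="$out$(printf '%s' "$letters" | cut -c "$((bit + 1))")"
            fi
            bit=$((bit + 1))
        done
        printf '%s\n' "$out"
    }

    if [ -r /proc/sys/kernel/tainted ]; then
        decode_taint "$(cat /proc/sys/kernel/tainted)"
    fi

For example, a value of 4097 (4096 + 1) decodes to ``PO``: a proprietary
module and an externally-built module were loaded.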
153957043247SMauro Carvalho Chehab
1540a3cb66a5SStephen Kittthreads-max
1541a3cb66a5SStephen Kitt===========
154257043247SMauro Carvalho Chehab
154357043247SMauro Carvalho ChehabThis value controls the maximum number of threads that can be created
1544a3cb66a5SStephen Kittusing ``fork()``.
154557043247SMauro Carvalho Chehab
154657043247SMauro Carvalho ChehabDuring initialization the kernel sets this value such that even if the
154757043247SMauro Carvalho Chehabmaximum number of threads is created, the thread structures occupy only
154857043247SMauro Carvalho Chehaba part (1/8th) of the available RAM pages.

The minimum value that can be written to ``threads-max`` is 1.

The maximum value that can be written to ``threads-max`` is given by the
constant ``FUTEX_TID_MASK`` (0x3fffffff).

If a value outside of this range is written to ``threads-max``, an
``EINVAL`` error occurs.
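
A script that sets ``threads-max`` can check a candidate value against the
documented range before writing it, instead of relying on the ``EINVAL``
error. A minimal POSIX shell sketch, where ``threads_max_valid`` is a
hypothetical name and the argument is assumed to be a decimal integer::

    # Hypothetical helper: returns 0 if $1 lies within the range accepted
    # by /proc/sys/kernel/threads-max, i.e. 1 .. FUTEX_TID_MASK (0x3fffffff).
    threads_max_valid() {
        [ "$1" -ge 1 ] && [ "$1" -le $((0x3fffffff)) ]
    }

    for v in 0 1 1073741823 1073741824; do
        if threads_max_valid "$v"; then
            echo "$v: accepted"
        else
            echo "$v: rejected (EINVAL)"
        fi
    done

A value that passes the check can then be written as root, for example with
``echo "$v" > /proc/sys/kernel/threads-max``.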

timer_migration
===============

When set to a non-zero value, the kernel attempts to migrate timers away
from idle CPUs, allowing them to remain in low-power states longer.

The default is 1 (enabled).

traceoff_on_warning
===================

When set, disables tracing (see Documentation/trace/ftrace.rst) when a
``WARN()`` is hit.


tracepoint_printk
=================

When tracepoints are sent to printk() (enabled by the ``tp_printk``
boot parameter), this entry provides runtime control::

    echo 0 > /proc/sys/kernel/tracepoint_printk

will stop tracepoints from being sent to printk(), and::

    echo 1 > /proc/sys/kernel/tracepoint_printk

will send them to printk() again.

This only works if the kernel was booted with ``tp_printk`` enabled.

See Documentation/admin-guide/kernel-parameters.rst and
Documentation/trace/boottime-trace.rst.


unaligned-trap
==============

On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
``arc``, ``parisc`` and ``loongarch``), this entry controls whether
unaligned traps are caught and emulated (instead of failing).

= ========================================================
0 Do not emulate unaligned accesses.
1 Emulate unaligned accesses. This is the default setting.
= ========================================================

See also `ignore-unaligned-usertrap`_.


unknown_nmi_panic
=================

The value in this file affects how NMIs are handled. When the value is
non-zero, an unknown NMI is trapped and the kernel panics. At that point,
kernel debugging information is displayed on the console.

For example, the NMI switch that most IA32 servers have raises an unknown
NMI. If a system hangs, try pressing the NMI switch.


unprivileged_bpf_disabled
=========================

Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` or ``CAP_BPF``
will return ``-EPERM``. Once set to 1, this can't be cleared from the
running kernel anymore.

Writing 2 to this entry will also disable unprivileged calls to ``bpf()``;
however, an admin can still change this setting later on, if needed, by
writing 0 or 1 to this entry.

If ``BPF_UNPRIV_DEFAULT_OFF`` is enabled in the kernel config, then this
entry will default to 2 instead of 0.

= =============================================================
0 Unprivileged calls to ``bpf()`` are enabled
1 Unprivileged calls to ``bpf()`` are disabled without recovery
2 Unprivileged calls to ``bpf()`` are disabled
= =============================================================
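
The rules above form a small state machine. The POSIX shell sketch below
models which writes succeed given the current value; ``bpf_unpriv_write``
is a hypothetical name, the sketch assumes the writer is already privileged
enough to write the entry at all, and, as an assumption, it treats values
outside the table (0, 1, 2) as invalid::

    # Hypothetical helper modelling /proc/sys/kernel/unprivileged_bpf_disabled.
    # $1: current value, $2: value to write.
    # Prints "ok", "EPERM" or "EINVAL".
    bpf_unpriv_write() {
        case $2 in
            0|1|2) ;;
            *) echo EINVAL; return ;;
        esac
        # Once the value is 1, it can no longer be changed to anything else.
        if [ "$1" -eq 1 ] && [ "$2" -ne 1 ]; then
            echo EPERM
        else
            echo ok
        fi
    }

    bpf_unpriv_write 2 0   # admin re-enables unprivileged bpf(): ok
    bpf_unpriv_write 1 0   # locked without recovery: EPERM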


warn_limit
==========

Number of kernel warnings after which the kernel should panic when
``panic_on_warn`` is not set. Setting this to 0 disables checking
the warning count. Setting this to 1 has the same effect as setting
``panic_on_warn=1``. The default value is 0.
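
The decision described above reduces to a single condition. This POSIX
shell sketch models it for a given warning count; ``warn_limit_panics`` is
a hypothetical name, and ``panic_on_warn`` is assumed to be 0::

    # Hypothetical helper: returns 0 if the kernel would panic once $2
    # warnings have occurred with warn_limit set to $1
    # (panic_on_warn assumed to be 0).
    warn_limit_panics() {
        # A limit of 0 disables the check entirely.
        [ "$1" -ne 0 ] && [ "$2" -ge "$1" ]
    }

With a limit of 1, the first warning already satisfies the condition, which
matches the equivalence with ``panic_on_warn=1``.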


watchdog
========

This parameter can be used to disable or enable the soft lockup detector
*and* the NMI watchdog (i.e. the hard lockup detector) at the same time.

= ==============================
0 Disable both lockup detectors.
1 Enable both lockup detectors.
= ==============================

The soft lockup detector and the NMI watchdog can also be disabled or
enabled individually, using the ``soft_watchdog`` and ``nmi_watchdog``
parameters.

If the ``watchdog`` parameter is read, for example by executing::

   cat /proc/sys/kernel/watchdog

the output of this command (0 or 1) shows the logical OR of
``soft_watchdog`` and ``nmi_watchdog``.
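
Because the value read back is a logical OR, reading 1 only means that at
least one of the two detectors is enabled. The shell sketch below models
the read-back value from the two individual settings; ``watchdog_readback``
is a hypothetical name::

    # Hypothetical helper: value reported by /proc/sys/kernel/watchdog
    # given soft_watchdog ($1) and nmi_watchdog ($2), each 0 or 1.
    watchdog_readback() {
        echo $(( $1 || $2 ))
    }

    watchdog_readback 1 0   # soft lockup detector only: prints 1
    watchdog_readback 0 0   # both disabled: prints 0

To see which detector is actually enabled, read ``soft_watchdog`` and
``nmi_watchdog`` individually.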


watchdog_cpumask
================

This value can be used to control on which CPUs the watchdog may run.
The default cpumask is all possible cores, but if ``NO_HZ_FULL`` is
enabled in the kernel config, and cores are specified with the
``nohz_full=`` boot argument, those cores are excluded by default.
Offline cores can be included in this mask, and if such a core is later
brought online, the watchdog will be started based on the mask value.

Typically this value would only be touched in the ``nohz_full`` case
to re-enable cores that by default were not running the watchdog,
if a kernel lockup was suspected on those cores.

The value uses the standard cpulist format for cpumasks; for example,
to enable the watchdog on cores 0, 2, 3, and 4, you might run::

  echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
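
To check which cores a cpulist selects, it can help to expand it. The POSIX
shell sketch below handles only single CPUs (``N``) and ranges (``N-M``)
separated by commas; ``expand_cpulist`` is a hypothetical name, and the
kernel's own parser may accept further forms that this sketch does not
cover::

    # Hypothetical helper: expand a cpulist such as "0,2-4" into "0 2 3 4".
    # The body runs in a subshell so the IFS change does not leak out.
    expand_cpulist() (
        IFS=,
        out=
        for range in $1; do
            case $range in
                *-*) lo=${range%-*}; hi=${range#*-} ;;
                *)   lo=$range; hi=$range ;;
            esac
            cpu=$lo
            while [ "$cpu" -le "$hi" ]; do
                out="$out $cpu"
                cpu=$((cpu + 1))
            done
        done
        # Drop the leading space added by the first iteration.
        printf '%s\n' "${out# }"
    )

    expand_cpulist 0,2-4   # prints "0 2 3 4"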


watchdog_thresh
===============

This value can be used to control the frequency of hrtimer and NMI
events and the soft and hard lockup thresholds. The default threshold
is 10 seconds.

The softlockup threshold is (``2 * watchdog_thresh``). Setting this
tunable to zero will disable lockup detection altogether.
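
As a worked example of the relationship above: with the default
``watchdog_thresh`` of 10, the soft lockup threshold is 20 seconds. The
shell sketch below computes it and reports the zero case;
``softlockup_thresh`` is a hypothetical name::

    # Hypothetical helper: soft lockup threshold in seconds for a given
    # watchdog_thresh value ($1); 0 disables lockup detection altogether.
    softlockup_thresh() {
        if [ "$1" -eq 0 ]; then
            echo disabled
        else
            echo $(( 2 * $1 ))
        fi
    }

    softlockup_thresh 10   # default: prints 20

On a running system, the input would come from
``cat /proc/sys/kernel/watchdog_thresh``.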