diff options
Diffstat (limited to 'man5/proc_sys.5')
| -rw-r--r-- | man5/proc_sys.5 | 1623 |
1 files changed, 1623 insertions, 0 deletions
diff --git a/man5/proc_sys.5 b/man5/proc_sys.5 new file mode 100644 index 0000000000..78f0c192c2 --- /dev/null +++ b/man5/proc_sys.5 @@ -0,0 +1,1623 @@ +'\" t +.\" Copyright (C) 1994, 1995, Daniel Quinlan <quinlan@yggdrasil.com> +.\" Copyright (C) 2002-2008, 2017, Michael Kerrisk <mtk.manpages@gmail.com> +.\" Copyright (C) , Andries Brouwer <aeb@cwi.nl> +.\" Copyright (C) 2023, Alejandro Colomar <alx@kernel.org> +.\" +.\" SPDX-License-Identifier: GPL-3.0-or-later +.\" +.TH proc_sys 5 (date) "Linux man-pages (unreleased)" +.SH NAME +/proc/sys/ \- system information, and sysctl pseudo-filesystem +.SH DESCRIPTION +.TP +.I /proc/sys/ +This directory (present since Linux 1.3.57) contains a number of files +and subdirectories corresponding to kernel variables. +These variables can be read and in some cases modified using +the \fI/proc\fP filesystem, and the (deprecated) +.BR sysctl (2) +system call. +.IP +String values may be terminated by either \[aq]\e0\[aq] or \[aq]\en\[aq]. +.IP +Integer and long values may be written either in decimal or in +hexadecimal notation (e.g., 0x3FFF). +When writing multiple integer or long values, these may be separated +by any of the following whitespace characters: +\[aq]\ \[aq], \[aq]\et\[aq], or \[aq]\en\[aq]. +Using other separators leads to the error +.BR EINVAL . +.TP +.IR /proc/sys/abi/ " (since Linux 2.4.10)" +This directory may contain files with application binary information. +.\" On some systems, it is not present. +See the Linux kernel source file +.I Documentation/sysctl/abi.rst +(or +.I Documentation/sysctl/abi.txt +before Linux 5.3) +for more information. +.TP +.I /proc/sys/debug/ +This directory may be empty. +.TP +.I /proc/sys/dev/ +This directory contains device-specific information (e.g., +.IR dev/cdrom/info ). +On +some systems, it may be empty. +.TP +.I /proc/sys/fs/ +This directory contains the files and subdirectories for kernel variables +related to filesystems. +.TP +.IR /proc/sys/fs/aio\-max\-nr " and " /proc/sys/fs/aio\-nr " (since Linux 2.6.4)" +.I aio\-nr +is the running total of the number of events specified by +.BR io_setup (2) +calls for all currently active AIO contexts. +If +.I aio\-nr +reaches +.IR aio\-max\-nr , +then +.BR io_setup (2) +will fail with the error +.BR EAGAIN . +Raising +.I aio\-max\-nr +does not result in the preallocation or resizing +of any kernel data structures. +.TP +.I /proc/sys/fs/binfmt_misc +Documentation for files in this directory can be found +in the Linux kernel source in the file +.I Documentation/admin\-guide/binfmt\-misc.rst +(or in +.I Documentation/binfmt_misc.txt +on older kernels). +.TP +.IR /proc/sys/fs/dentry\-state " (since Linux 2.2)" +This file contains information about the status of the +directory cache (dcache). +The file contains six numbers, +.IR nr_dentry , +.IR nr_unused , +.I age_limit +(age in seconds), +.I want_pages +(pages requested by system) and two dummy values. +.RS +.IP \[bu] 3 +.I nr_dentry +is the number of allocated dentries (dcache entries). +This field is unused in Linux 2.2. +.IP \[bu] +.I nr_unused +is the number of unused dentries. +.IP \[bu] +.I age_limit +.\" looks like this is unused in Linux 2.2 to Linux 2.6 +is the age in seconds after which dcache entries +can be reclaimed when memory is short. +.IP \[bu] +.I want_pages +.\" looks like this is unused in Linux 2.2 to Linux 2.6 +is nonzero when the kernel has called shrink_dcache_pages() and the +dcache isn't pruned yet. +.RE +.TP +.I /proc/sys/fs/dir\-notify\-enable +This file can be used to disable or enable the +.I dnotify +interface described in +.BR fcntl (2) +on a system-wide basis. +A value of 0 in this file disables the interface, +and a value of 1 enables it. +.TP +.I /proc/sys/fs/dquot\-max +This file shows the maximum number of cached disk quota entries. +On some (2.4) systems, it is not present. +If the number of free cached disk quota entries is very low and +you have some awesome number of simultaneous system users, +you might want to raise the limit. +.TP +.I /proc/sys/fs/dquot\-nr +This file shows the number of allocated disk quota +entries and the number of free disk quota entries. +.TP +.IR /proc/sys/fs/epoll/ " (since Linux 2.6.28)" +This directory contains the file +.IR max_user_watches , +which can be used to limit the amount of kernel memory consumed by the +.I epoll +interface. +For further details, see +.BR epoll (7). +.TP +.I /proc/sys/fs/file\-max +This file defines +a system-wide limit on the number of open files for all processes. +System calls that fail when encountering this limit fail with the error +.BR ENFILE . +(See also +.BR setrlimit (2), +which can be used by a process to set the per-process limit, +.BR RLIMIT_NOFILE , +on the number of files it may open.) +If you get lots +of error messages in the kernel log about running out of file handles +(open file descriptions) +(look for "VFS: file\-max limit <number> reached"), +try increasing this value: +.IP +.in +4n +.EX +echo 100000 > /proc/sys/fs/file\-max +.EE +.in +.IP +Privileged processes +.RB ( CAP_SYS_ADMIN ) +can override the +.I file\-max +limit. +.TP +.I /proc/sys/fs/file\-nr +This (read-only) file contains three numbers: +the number of allocated file handles +(i.e., the number of open file descriptions; see +.BR open (2)); +the number of free file handles; +and the maximum number of file handles (i.e., the same value as +.IR /proc/sys/fs/file\-max ). +If the number of allocated file handles is close to the +maximum, you should consider increasing the maximum. +Before Linux 2.6, +the kernel allocated file handles dynamically, +but it didn't free them again. +Instead the free file handles were kept in a list for reallocation; +the "free file handles" value indicates the size of that list. +A large number of free file handles indicates that there was +a past peak in the usage of open file handles. +Since Linux 2.6, the kernel does deallocate freed file handles, +and the "free file handles" value is always zero. +.TP +.IR /proc/sys/fs/inode\-max " (only present until Linux 2.2)" +This file contains the maximum number of in-memory inodes. +This value should be 3\[en]4 times larger +than the value in +.IR file\-max , +since \fIstdin\fP, \fIstdout\fP +and network sockets also need an inode to handle them. +When you regularly run out of inodes, you need to increase this value. +.IP +Starting with Linux 2.4, +there is no longer a static limit on the number of inodes, +and this file is removed. +.TP +.I /proc/sys/fs/inode\-nr +This file contains the first two values from +.IR inode\-state . +.TP +.I /proc/sys/fs/inode\-state +This file +contains seven numbers: +.IR nr_inodes , +.IR nr_free_inodes , +.IR preshrink , +and four dummy values (always zero). +.IP +.I nr_inodes +is the number of inodes the system has allocated. +.\" This can be slightly more than +.\" .I inode\-max +.\" because Linux allocates them one page full at a time. +.I nr_free_inodes +represents the number of free inodes. +.IP +.I preshrink +is nonzero when the +.I nr_inodes +> +.I inode\-max +and the system needs to prune the inode list instead of allocating more; +since Linux 2.4, this field is a dummy value (always zero). +.TP +.IR /proc/sys/fs/inotify/ " (since Linux 2.6.13)" +This directory contains files +.IR max_queued_events ", " max_user_instances ", and " max_user_watches , +that can be used to limit the amount of kernel memory consumed by the +.I inotify +interface. +For further details, see +.BR inotify (7). +.TP +.I /proc/sys/fs/lease\-break\-time +This file specifies the grace period that the kernel grants to a process +holding a file lease +.RB ( fcntl (2)) +after it has sent a signal to that process notifying it +that another process is waiting to open the file. +If the lease holder does not remove or downgrade the lease within +this grace period, the kernel forcibly breaks the lease. +.TP +.I /proc/sys/fs/leases\-enable +This file can be used to enable or disable file leases +.RB ( fcntl (2)) +on a system-wide basis. +If this file contains the value 0, leases are disabled. +A nonzero value enables leases. +.TP +.IR /proc/sys/fs/mount\-max " (since Linux 4.9)" +.\" commit d29216842a85c7970c536108e093963f02714498 +The value in this file specifies the maximum number of mounts that may exist +in a mount namespace. +The default value in this file is 100,000. +.TP +.IR /proc/sys/fs/mqueue/ " (since Linux 2.6.6)" +This directory contains files +.IR msg_max ", " msgsize_max ", and " queues_max , +controlling the resources used by POSIX message queues. +See +.BR mq_overview (7) +for details. +.TP +.IR /proc/sys/fs/nr_open " (since Linux 2.6.25)" +.\" commit 9cfe015aa424b3c003baba3841a60dd9b5ad319b +This file imposes a ceiling on the value to which the +.B RLIMIT_NOFILE +resource limit can be raised (see +.BR getrlimit (2)). +This ceiling is enforced for both unprivileged and privileged process. +The default value in this file is 1048576. +(Before Linux 2.6.25, the ceiling for +.B RLIMIT_NOFILE +was hard-coded to the same value.) +.TP +.IR /proc/sys/fs/overflowgid " and " /proc/sys/fs/overflowuid +These files +allow you to change the value of the fixed UID and GID. +The default is 65534. +Some filesystems support only 16-bit UIDs and GIDs, although in Linux +UIDs and GIDs are 32 bits. +When one of these filesystems is mounted +with writes enabled, any UID or GID that would exceed 65535 is translated +to the overflow value before being written to disk. +.TP +.IR /proc/sys/fs/pipe\-max\-size " (since Linux 2.6.35)" +See +.BR pipe (7). +.TP +.IR /proc/sys/fs/pipe\-user\-pages\-hard " (since Linux 4.5)" +See +.BR pipe (7). +.TP +.IR /proc/sys/fs/pipe\-user\-pages\-soft " (since Linux 4.5)" +See +.BR pipe (7). +.TP +.IR /proc/sys/fs/protected_fifos " (since Linux 4.19)" +The value in this file is/can be set to one of the following: +.RS +.TP 4 +0 +Writing to FIFOs is unrestricted. +.TP +1 +Don't allow +.B O_CREAT +.BR open (2) +on FIFOs that the caller doesn't own in world-writable sticky directories, +unless the FIFO is owned by the owner of the directory. +.TP +2 +As for the value 1, +but the restriction also applies to group-writable sticky directories. +.RE +.IP +The intent of the above protections is to avoid unintentional writes to an +attacker-controlled FIFO when a program expected to create a regular file. +.TP +.IR /proc/sys/fs/protected_hardlinks " (since Linux 3.6)" +.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7 +When the value in this file is 0, +no restrictions are placed on the creation of hard links +(i.e., this is the historical behavior before Linux 3.6). +When the value in this file is 1, +a hard link can be created to a target file +only if one of the following conditions is true: +.RS +.IP \[bu] 3 +The calling process has the +.B CAP_FOWNER +capability in its user namespace +and the file UID has a mapping in the namespace. +.IP \[bu] +The filesystem UID of the process creating the link matches +the owner (UID) of the target file +(as described in +.BR credentials (7), +a process's filesystem UID is normally the same as its effective UID). +.IP \[bu] +All of the following conditions are true: +.RS 4 +.IP \[bu] 3 +the target is a regular file; +.IP \[bu] +the target file does not have its set-user-ID mode bit enabled; +.IP \[bu] +the target file does not have both its set-group-ID and +group-executable mode bits enabled; and +.IP \[bu] +the caller has permission to read and write the target file +(either via the file's permissions mask or because it has +suitable capabilities). +.RE +.RE +.IP +The default value in this file is 0. +Setting the value to 1 +prevents a longstanding class of security issues caused by +hard-link-based time-of-check, time-of-use races, +most commonly seen in world-writable directories such as +.IR /tmp . +The common method of exploiting this flaw +is to cross privilege boundaries when following a given hard link +(i.e., a root process follows a hard link created by another user). +Additionally, on systems without separated partitions, +this stops unauthorized users from "pinning" vulnerable set-user-ID and +set-group-ID files against being upgraded by +the administrator, or linking to special files. +.TP +.IR /proc/sys/fs/protected_regular " (since Linux 4.19)" +The value in this file is/can be set to one of the following: +.RS +.TP 4 +0 +Writing to regular files is unrestricted. +.TP +1 +Don't allow +.B O_CREAT +.BR open (2) +on regular files that the caller doesn't own in +world-writable sticky directories, +unless the regular file is owned by the owner of the directory. +.TP +2 +As for the value 1, +but the restriction also applies to group-writable sticky directories. +.RE +.IP +The intent of the above protections is similar to +.IR protected_fifos , +but allows an application to +avoid writes to an attacker-controlled regular file, +where the application expected to create one. +.TP +.IR /proc/sys/fs/protected_symlinks " (since Linux 3.6)" +.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7 +When the value in this file is 0, +no restrictions are placed on following symbolic links +(i.e., this is the historical behavior before Linux 3.6). +When the value in this file is 1, symbolic links are followed only +in the following circumstances: +.RS +.IP \[bu] 3 +the filesystem UID of the process following the link matches +the owner (UID) of the symbolic link +(as described in +.BR credentials (7), +a process's filesystem UID is normally the same as its effective UID); +.IP \[bu] +the link is not in a sticky world-writable directory; or +.IP \[bu] +the symbolic link and its parent directory have the same owner (UID) +.RE +.IP +A system call that fails to follow a symbolic link +because of the above restrictions returns the error +.B EACCES +in +.IR errno . +.IP +The default value in this file is 0. +Setting the value to 1 avoids a longstanding class of security issues +based on time-of-check, time-of-use races when accessing symbolic links. +.TP +.IR /proc/sys/fs/suid_dumpable " (since Linux 2.6.13)" +.\" The following is based on text from Documentation/sysctl/kernel.txt +The value in this file is assigned to a process's "dumpable" flag +in the circumstances described in +.BR prctl (2). +In effect, +the value in this file determines whether core dump files are +produced for set-user-ID or otherwise protected/tainted binaries. +The "dumpable" setting also affects the ownership of files in a process's +.IR /proc/ pid +directory, as described above. +.IP +Three different integer values can be specified: +.RS +.TP +\fI0\ (default)\fP +.\" In kernel source: SUID_DUMP_DISABLE +This provides the traditional (pre-Linux 2.6.13) behavior. +A core dump will not be produced for a process which has +changed credentials (by calling +.BR seteuid (2), +.BR setgid (2), +or similar, or by executing a set-user-ID or set-group-ID program) +or whose binary does not have read permission enabled. +.TP +\fI1\ ("debug")\fP +.\" In kernel source: SUID_DUMP_USER +All processes dump core when possible. +(Reasons why a process might nevertheless not dump core are described in +.BR core (5).) +The core dump is owned by the filesystem user ID of the dumping process +and no security is applied. +This is intended for system debugging situations only: +this mode is insecure because it allows unprivileged users to +examine the memory contents of privileged processes. +.TP +\fI2\ ("suidsafe")\fP +.\" In kernel source: SUID_DUMP_ROOT +Any binary which normally would not be dumped (see "0" above) +is dumped readable by root only. +This allows the user to remove the core dump file but not to read it. +For security reasons core dumps in this mode will not overwrite one +another or other files. +This mode is appropriate when administrators are +attempting to debug problems in a normal environment. +.IP +Additionally, since Linux 3.6, +.\" 9520628e8ceb69fa9a4aee6b57f22675d9e1b709 +.I /proc/sys/kernel/core_pattern +must either be an absolute pathname +or a pipe command, as detailed in +.BR core (5). +Warnings will be written to the kernel log if +.I core_pattern +does not follow these rules, and no core dump will be produced. +.\" 54b501992dd2a839e94e76aa392c392b55080ce8 +.RE +.IP +For details of the effect of a process's "dumpable" setting +on ptrace access mode checking, see +.BR ptrace (2). +.TP +.I /proc/sys/fs/super\-max +This file +controls the maximum number of superblocks, and +thus the maximum number of mounted filesystems the kernel +can have. +You need increase only +.I super\-max +if you need to mount more filesystems than the current value in +.I super\-max +allows you to. +.TP +.I /proc/sys/fs/super\-nr +This file +contains the number of filesystems currently mounted. +.TP +.I /proc/sys/kernel/ +This directory contains files controlling a range of kernel parameters, +as described below. +.TP +.I /proc/sys/kernel/acct +This file +contains three numbers: +.IR highwater , +.IR lowwater , +and +.IR frequency . +If BSD-style process accounting is enabled, these values control +its behavior. +If free space on filesystem where the log lives goes below +.I lowwater +percent, accounting suspends. +If free space gets above +.I highwater +percent, accounting resumes. +.I frequency +determines +how often the kernel checks the amount of free space (value is in +seconds). +Default values are 4, 2, and 30. +That is, suspend accounting if 2% or less space is free; resume it +if 4% or more space is free; consider information about amount of free space +valid for 30 seconds. +.TP +.IR /proc/sys/kernel/auto_msgmni " (Linux 2.6.27 to Linux 3.18)" +.\" commit 9eefe520c814f6f62c5d36a2ddcd3fb99dfdb30e (introduces feature) +.\" commit 0050ee059f7fc86b1df2527aaa14ed5dc72f9973 (rendered redundant) +From Linux 2.6.27 to Linux 3.18, +this file was used to control recomputing of the value in +.I /proc/sys/kernel/msgmni +upon the addition or removal of memory or upon IPC namespace creation/removal. +Echoing "1" into this file enabled +.I msgmni +automatic recomputing (and triggered a recomputation of +.I msgmni +based on the current amount of available memory and number of IPC namespaces). +Echoing "0" disabled automatic recomputing. +(Automatic recomputing was also disabled if a value was explicitly assigned to +.IR /proc/sys/kernel/msgmni .) +The default value in +.I auto_msgmni +was 1. +.IP +Since Linux 3.19, the content of this file has no effect (because +.I msgmni +.\" FIXME Must document the 3.19 'msgmni' changes. +defaults to near the maximum value possible), +and reads from this file always return the value "0". +.TP +.IR /proc/sys/kernel/cap_last_cap " (since Linux 3.2)" +See +.BR capabilities (7). +.TP +.IR /proc/sys/kernel/cap\-bound " (from Linux 2.2 to Linux 2.6.24)" +This file holds the value of the kernel +.I "capability bounding set" +(expressed as a signed decimal number). +This set is ANDed against the capabilities permitted to a process +during +.BR execve (2). +Starting with Linux 2.6.25, +the system-wide capability bounding set disappeared, +and was replaced by a per-thread bounding set; see +.BR capabilities (7). +.TP +.I /proc/sys/kernel/core_pattern +See +.BR core (5). +.TP +.I /proc/sys/kernel/core_pipe_limit +See +.BR core (5). +.TP +.I /proc/sys/kernel/core_uses_pid +See +.BR core (5). +.TP +.I /proc/sys/kernel/ctrl\-alt\-del +This file +controls the handling of Ctrl-Alt-Del from the keyboard. +When the value in this file is 0, Ctrl-Alt-Del is trapped and +sent to the +.BR init (1) +program to handle a graceful restart. +When the value is greater than zero, Linux's reaction to a Vulcan +Nerve Pinch (tm) will be an immediate reboot, without even +syncing its dirty buffers. +Note: when a program (like dosemu) has the keyboard in "raw" +mode, the Ctrl-Alt-Del is intercepted by the program before it +ever reaches the kernel tty layer, and it's up to the program +to decide what to do with it. +.TP +.IR /proc/sys/kernel/dmesg_restrict " (since Linux 2.6.37)" +The value in this file determines who can see kernel syslog contents. +A value of 0 in this file imposes no restrictions. +If the value is 1, only privileged users can read the kernel syslog. +(See +.BR syslog (2) +for more details.) +Since Linux 3.4, +.\" commit 620f6e8e855d6d447688a5f67a4e176944a084e8 +only users with the +.B CAP_SYS_ADMIN +capability may change the value in this file. +.TP +.IR /proc/sys/kernel/domainname " and " /proc/sys/kernel/hostname +can be used to set the NIS/YP domainname and the +hostname of your box in exactly the same way as the commands +.BR domainname (1) +and +.BR hostname (1), +that is: +.IP +.in +4n +.EX +.RB "#" " echo \[aq]darkstar\[aq] > /proc/sys/kernel/hostname" +.RB "#" " echo \[aq]mydomain\[aq] > /proc/sys/kernel/domainname" +.EE +.in +.IP +has the same effect as +.IP +.in +4n +.EX +.RB "#" " hostname \[aq]darkstar\[aq]" +.RB "#" " domainname \[aq]mydomain\[aq]" +.EE +.in +.IP +Note, however, that the classic darkstar.frop.org has the +hostname "darkstar" and DNS (Internet Domain Name Server) +domainname "frop.org", not to be confused with the NIS (Network +Information Service) or YP (Yellow Pages) domainname. +These two +domain names are in general different. +For a detailed discussion +see the +.BR hostname (1) +man page. +.TP +.I /proc/sys/kernel/hotplug +This file +contains the pathname for the hotplug policy agent. +The default value in this file is +.IR /sbin/hotplug . +.TP +.\" Removed in commit 87f504e5c78b910b0c1d6ffb89bc95e492322c84 (tglx/history.git) +.IR /proc/sys/kernel/htab\-reclaim " (before Linux 2.4.9.2)" +(PowerPC only) If this file is set to a nonzero value, +the PowerPC htab +.\" removed in commit 1b483a6a7b2998e9c98ad985d7494b9b725bd228, before Linux 2.6.28 +(see kernel file +.IR Documentation/powerpc/ppc_htab.txt ) +is pruned +each time the system hits the idle loop. +.TP +.I /proc/sys/kernel/keys/ +This directory contains various files that define parameters and limits +for the key-management facility. +These files are described in +.BR keyrings (7). +.TP +.IR /proc/sys/kernel/kptr_restrict " (since Linux 2.6.38)" +.\" 455cd5ab305c90ffc422dd2e0fb634730942b257 +The value in this file determines whether kernel addresses are exposed via +.I /proc +files and other interfaces. +A value of 0 in this file imposes no restrictions. +If the value is 1, kernel pointers printed using the +.I %pK +format specifier will be replaced with zeros unless the user has the +.B CAP_SYSLOG +capability. +If the value is 2, kernel pointers printed using the +.I %pK +format specifier will be replaced with zeros regardless +of the user's capabilities. +The initial default value for this file was 1, +but the default was changed +.\" commit 411f05f123cbd7f8aa1edcae86970755a6e2a9d9 +to 0 in Linux 2.6.39. +Since Linux 3.4, +.\" commit 620f6e8e855d6d447688a5f67a4e176944a084e8 +only users with the +.B CAP_SYS_ADMIN +capability can change the value in this file. +.TP +.I /proc/sys/kernel/l2cr +(PowerPC only) This file +contains a flag that controls the L2 cache of G3 processor +boards. +If 0, the cache is disabled. +Enabled if nonzero. +.TP +.I /proc/sys/kernel/modprobe +This file contains the pathname for the kernel module loader. +The default value is +.IR /sbin/modprobe . +The file is present only if the kernel is built with the +.B CONFIG_MODULES +.RB ( CONFIG_KMOD +in Linux 2.6.26 and earlier) +option enabled. +It is described by the Linux kernel source file +.I Documentation/kmod.txt +(present only in Linux 2.4 and earlier). +.TP +.IR /proc/sys/kernel/modules_disabled " (since Linux 2.6.31)" +.\" 3d43321b7015387cfebbe26436d0e9d299162ea1 +.\" From Documentation/sysctl/kernel.txt +A toggle value indicating if modules are allowed to be loaded +in an otherwise modular kernel. +This toggle defaults to off (0), but can be set true (1). +Once true, modules can be neither loaded nor unloaded, +and the toggle cannot be set back to false. +The file is present only if the kernel is built with the +.B CONFIG_MODULES +option enabled. +.TP +.IR /proc/sys/kernel/msgmax " (since Linux 2.2)" +This file defines +a system-wide limit specifying the maximum number of bytes in +a single message written on a System V message queue. +.TP +.IR /proc/sys/kernel/msgmni " (since Linux 2.4)" +This file defines the system-wide limit on the number of +message queue identifiers. +See also +.IR /proc/sys/kernel/auto_msgmni . +.TP +.IR /proc/sys/kernel/msgmnb " (since Linux 2.2)" +This file defines a system-wide parameter used to initialize the +.I msg_qbytes +setting for subsequently created message queues. +The +.I msg_qbytes +setting specifies the maximum number of bytes that may be written to the +message queue. +.TP +.IR /proc/sys/kernel/ngroups_max " (since Linux 2.6.4)" +This is a read-only file that displays the upper limit on the +number of a process's group memberships. +.TP +.IR /proc/sys/kernel/ns_last_pid " (since Linux 3.3)" +See +.BR pid_namespaces (7). +.TP +.IR /proc/sys/kernel/ostype " and " /proc/sys/kernel/osrelease +These files +give substrings of +.IR /proc/version . +.TP +.IR /proc/sys/kernel/overflowgid " and " /proc/sys/kernel/overflowuid +These files duplicate the files +.I /proc/sys/fs/overflowgid +and +.IR /proc/sys/fs/overflowuid . +.TP +.I /proc/sys/kernel/panic +This file gives read/write access to the kernel variable +.IR panic_timeout . +If this is zero, the kernel will loop on a panic; if nonzero, +it indicates that the kernel should autoreboot after this number +of seconds. +When you use the +software watchdog device driver, the recommended setting is 60. +.TP +.IR /proc/sys/kernel/panic_on_oops " (since Linux 2.5.68)" +This file controls the kernel's behavior when an oops +or BUG is encountered. +If this file contains 0, then the system +tries to continue operation. +If it contains 1, then the system +delays a few seconds (to give klogd time to record the oops output) +and then panics. +If the +.I /proc/sys/kernel/panic +file is also nonzero, then the machine will be rebooted. +.TP +.IR /proc/sys/kernel/pid_max " (since Linux 2.5.34)" +This file specifies the value at which PIDs wrap around +(i.e., the value in this file is one greater than the maximum PID). +PIDs greater than this value are not allocated; +thus, the value in this file also acts as a system-wide limit +on the total number of processes and threads. +The default value for this file, 32768, +results in the same range of PIDs as on earlier kernels. +On 32-bit platforms, 32768 is the maximum value for +.IR pid_max . +On 64-bit systems, +.I pid_max +can be set to any value up to 2\[ha]22 +.RB ( PID_MAX_LIMIT , +approximately 4 million). +.\" Prior to Linux 2.6.10, pid_max could also be raised above 32768 on 32-bit +.\" platforms, but this broke /proc/[pid] +.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=109513010926152&w=2 +.TP +.IR /proc/sys/kernel/powersave\-nap " (PowerPC only)" +This file contains a flag. +If set, Linux-PPC will use the "nap" mode of +powersaving, +otherwise the "doze" mode will be used. +.TP +.I /proc/sys/kernel/printk +See +.BR syslog (2). +.TP +.IR /proc/sys/kernel/pty " (since Linux 2.6.4)" +This directory contains two files relating to the number of UNIX 98 +pseudoterminals (see +.BR pts (4)) +on the system. +.TP +.I /proc/sys/kernel/pty/max +This file defines the maximum number of pseudoterminals. +.\" FIXME Document /proc/sys/kernel/pty/reserve +.\" New in Linux 3.3 +.\" commit e9aba5158a80098447ff207a452a3418ae7ee386 +.TP +.I /proc/sys/kernel/pty/nr +This read-only file +indicates how many pseudoterminals are currently in use. +.TP +.I /proc/sys/kernel/random/ +This directory +contains various parameters controlling the operation of the file +.IR /dev/random . +See +.BR random (4) +for further information. +.TP +.IR /proc/sys/kernel/random/uuid " (since Linux 2.4)" +Each read from this read-only file returns a randomly generated 128-bit UUID, +as a string in the standard UUID format. +.TP +.IR /proc/sys/kernel/randomize_va_space " (since Linux 2.6.12)" +.\" Some further details can be found in Documentation/sysctl/kernel.txt +Select the address space layout randomization (ASLR) policy for the system +(on architectures that support ASLR). +Three values are supported for this file: +.RS +.TP +.B 0 +Turn ASLR off. +This is the default for architectures that don't support ASLR, +and when the kernel is booted with the +.I norandmaps +parameter. +.TP +.B 1 +Make the addresses of +.BR mmap (2) +allocations, the stack, and the VDSO page randomized. +Among other things, this means that shared libraries will be +loaded at randomized addresses. +The text segment of PIE-linked binaries will also be loaded +at a randomized address. +This value is the default if the kernel was configured with +.BR CONFIG_COMPAT_BRK . +.TP +.B 2 +(Since Linux 2.6.25) +.\" commit c1d171a002942ea2d93b4fbd0c9583c56fce0772 +Also support heap randomization. +This value is the default if the kernel was not configured with +.BR CONFIG_COMPAT_BRK . +.RE +.TP +.I /proc/sys/kernel/real\-root\-dev +This file is documented in the Linux kernel source file +.I Documentation/admin\-guide/initrd.rst +.\" commit 9d85025b0418163fae079c9ba8f8445212de8568 +(or +.I Documentation/initrd.txt +before Linux 4.10). +.TP +.IR /proc/sys/kernel/reboot\-cmd " (Sparc only)" +This file seems to be a way to give an argument to the SPARC +ROM/Flash boot loader. +Maybe to tell it what to do after +rebooting? +.TP +.I /proc/sys/kernel/rtsig\-max +(Up to and including Linux 2.6.7; see +.BR setrlimit (2)) +This file can be used to tune the maximum number +of POSIX real-time (queued) signals that can be outstanding +in the system. +.TP +.I /proc/sys/kernel/rtsig\-nr +(Up to and including Linux 2.6.7.) +This file shows the number of POSIX real-time signals currently queued. +.TP +.IR /proc/ pid /sched_autogroup_enabled " (since Linux 2.6.38)" +.\" commit 5091faa449ee0b7d73bc296a93bca9540fc51d0a +See +.BR sched (7). +.TP +.IR /proc/sys/kernel/sched_child_runs_first " (since Linux 2.6.23)" +If this file contains the value zero, then, after a +.BR fork (2), +the parent is first scheduled on the CPU. +If the file contains a nonzero value, +then the child is scheduled first on the CPU. +(Of course, on a multiprocessor system, +the parent and the child might both immediately be scheduled on a CPU.) +.TP +.IR /proc/sys/kernel/sched_rr_timeslice_ms " (since Linux 3.9)" +See +.BR sched_rr_get_interval (2). +.TP +.IR /proc/sys/kernel/sched_rt_period_us " (since Linux 2.6.25)" +See +.BR sched (7). +.TP +.IR /proc/sys/kernel/sched_rt_runtime_us " (since Linux 2.6.25)" +See +.BR sched (7). +.TP +.IR /proc/sys/kernel/seccomp/ " (since Linux 4.14)" +.\" commit 8e5f1ad116df6b0de65eac458d5e7c318d1c05af +This directory provides additional seccomp information and +configuration. +See +.BR seccomp (2) +for further details. +.TP +.IR /proc/sys/kernel/sem " (since Linux 2.4)" +This file contains 4 numbers defining limits for System V IPC semaphores. +These fields are, in order: +.RS +.TP +SEMMSL +The maximum semaphores per semaphore set. +.TP +SEMMNS +A system-wide limit on the number of semaphores in all semaphore sets. +.TP +SEMOPM +The maximum number of operations that may be specified in a +.BR semop (2) +call. +.TP +SEMMNI +A system-wide limit on the maximum number of semaphore identifiers. +.RE +.TP +.I /proc/sys/kernel/sg\-big\-buff +This file +shows the size of the generic SCSI device (sg) buffer. +You can't tune it just yet, but you could change it at +compile time by editing +.I include/scsi/sg.h +and changing +the value of +.BR SG_BIG_BUFF . +However, there shouldn't be any reason to change this value. +.TP +.IR /proc/sys/kernel/shm_rmid_forced " (since Linux 3.1)" +.\" commit b34a6b1da371ed8af1221459a18c67970f7e3d53 +.\" See also Documentation/sysctl/kernel.txt +If this file is set to 1, all System V shared memory segments will +be marked for destruction as soon as the number of attached processes +falls to zero; +in other words, it is no longer possible to create shared memory segments +that exist independently of any attached process. +.IP +The effect is as though a +.BR shmctl (2) +.B IPC_RMID +is performed on all existing segments as well as all segments +created in the future (until this file is reset to 0). +Note that existing segments that are attached to no process will be +immediately destroyed when this file is set to 1. +Setting this option will also destroy segments that were created, +but never attached, +upon termination of the process that created the segment with +.BR shmget (2). +.IP +Setting this file to 1 provides a way of ensuring that +all System V shared memory segments are counted against the +resource usage and resource limits (see the description of +.B RLIMIT_AS +in +.BR getrlimit (2)) +of at least one process. +.IP +Because setting this file to 1 produces behavior that is nonstandard +and could also break existing applications, +the default value in this file is 0. +Set this file to 1 only if you have a good understanding +of the semantics of the applications using +System V shared memory on your system. +.TP +.IR /proc/sys/kernel/shmall " (since Linux 2.2)" +This file +contains the system-wide limit on the total number of pages of +System V shared memory. +.TP +.IR /proc/sys/kernel/shmmax " (since Linux 2.2)" +This file +can be used to query and set the run-time limit +on the maximum (System V IPC) shared memory segment size that can be +created. +Shared memory segments up to 1 GB are now supported in the +kernel. +This value defaults to +.BR SHMMAX . +.TP +.IR /proc/sys/kernel/shmmni " (since Linux 2.4)" +This file +specifies the system-wide maximum number of System V shared memory +segments that can be created. +.TP +.IR /proc/sys/kernel/sysctl_writes_strict " (since Linux 3.16)" +.\" commit f88083005ab319abba5d0b2e4e997558245493c8 +.\" commit 2ca9bb456ada8bcbdc8f77f8fc78207653bbaa92 +.\" commit f4aacea2f5d1a5f7e3154e967d70cf3f711bcd61 +.\" commit 24fe831c17ab8149413874f2fd4e5c8a41fcd294 +The value in this file determines how the file offset affects +the behavior of updating entries in files under +.IR /proc/sys . +The file has three possible values: +.RS +.TP 4 +\-1 +This provides legacy handling, with no printk warnings. +Each +.BR write (2) +must fully contain the value to be written, +and multiple writes on the same file descriptor +will overwrite the entire value, regardless of the file position. +.TP +0 +(default) This provides the same behavior as for \-1, +but printk warnings are written for processes that +perform writes when the file offset is not 0. +.TP +1 +Respect the file offset when writing strings into +.I /proc/sys +files. +Multiple writes will +.I append +to the value buffer. +Anything written beyond the maximum length +of the value buffer will be ignored. +Writes to numeric +.I /proc/sys +entries must always be at file offset 0 and the value must be +fully contained in the buffer provided to +.BR write (2). +.\" FIXME . +.\" With /proc/sys/kernel/sysctl_writes_strict==1, writes at an +.\" offset other than 0 do not generate an error. Instead, the +.\" write() succeeds, but the file is left unmodified. +.\" This is surprising. The behavior may change in the future. +.\" See thread.gmane.org/gmane.linux.man/9197 +.\" From: Michael Kerrisk (man-pages <mtk.manpages@...> +.\" Subject: sysctl_writes_strict documentation + an oddity? +.\" Newsgroups: gmane.linux.man, gmane.linux.kernel +.\" Date: 2015-05-09 08:54:11 GMT +.RE +.TP +.I /proc/sys/kernel/sysrq +This file controls the functions allowed to be invoked by the SysRq key. +By default, +the file contains 1 meaning that every possible SysRq request is allowed +(in older kernel versions, SysRq was disabled by default, +and you were required to specifically enable it at run-time, +but this is not the case any more). +Possible values in this file are: +.RS +.TP 5 +0 +Disable sysrq completely +.TP +1 +Enable all functions of sysrq +.TP +> 1 +Bit mask of allowed sysrq functions, as follows: +.PD 0 +.RS +.TP 5 +\ \ 2 +Enable control of console logging level +.TP +\ \ 4 +Enable control of keyboard (SAK, unraw) +.TP +\ \ 8 +Enable debugging dumps of processes etc. +.TP +\ 16 +Enable sync command +.TP +\ 32 +Enable remount read-only +.TP +\ 64 +Enable signaling of processes (term, kill, oom-kill) +.TP +128 +Allow reboot/poweroff +.TP +256 +Allow nicing of all real-time tasks +.RE +.PD +.RE +.IP +This file is present only if the +.B CONFIG_MAGIC_SYSRQ +kernel configuration option is enabled. +For further details see the Linux kernel source file +.I Documentation/admin\-guide/sysrq.rst +.\" commit 9d85025b0418163fae079c9ba8f8445212de8568 +(or +.I Documentation/sysrq.txt +before Linux 4.10). +.TP +.I /proc/sys/kernel/version +This file contains a string such as: +.IP +.in +4n +.EX +#5 Wed Feb 25 21:49:24 MET 1998 +.EE +.in +.IP +The "#5" means that +this is the fifth kernel built from this source base and the +date following it indicates the time the kernel was built. +.TP +.IR /proc/sys/kernel/threads\-max " (since Linux 2.3.11)" +.\" The following is based on Documentation/sysctl/kernel.txt +This file specifies the system-wide limit on the number of +threads (tasks) that can be created on the system. +.IP +Since Linux 4.1, +.\" commit 230633d109e35b0a24277498e773edeb79b4a331 +the value that can be written to +.I threads\-max +is bounded. +The minimum value that can be written is 20. +The maximum value that can be written is given by the +constant +.B FUTEX_TID_MASK +(0x3fffffff). +If a value outside of this range is written to +.IR threads\-max , +the error +.B EINVAL +occurs. +.IP +The value written is checked against the available RAM pages. +If the thread structures would occupy too much (more than 1/8th) +of the available RAM pages, +.I threads\-max +is reduced accordingly. +.TP +.IR /proc/sys/kernel/yama/ptrace_scope " (since Linux 3.5)" +See +.BR ptrace (2). +.TP +.IR /proc/sys/kernel/zero\-paged " (PowerPC only)" +This file +contains a flag. +When enabled (nonzero), Linux-PPC will pre-zero pages in +the idle loop, possibly speeding up get_free_pages. +.TP +.I /proc/sys/net +This directory contains networking stuff. +Explanations for some of the files under this directory can be found in +.BR tcp (7) +and +.BR ip (7). +.TP +.I /proc/sys/net/core/bpf_jit_enable +See +.BR bpf (2). +.TP +.I /proc/sys/net/core/somaxconn +This file defines a ceiling value for the +.I backlog +argument of +.BR listen (2); +see the +.BR listen (2) +manual page for details. +.TP +.I /proc/sys/proc +This directory may be empty. +.TP +.I /proc/sys/sunrpc +This directory supports Sun remote procedure call for network filesystem +(NFS). +On some systems, it is not present. +.TP +.IR /proc/sys/user " (since Linux 4.9)" +See +.BR namespaces (7). +.TP +.I /proc/sys/vm/ +This directory contains files for memory management tuning, buffer, and +cache management. +.TP +.IR /proc/sys/vm/admin_reserve_kbytes " (since Linux 3.10)" +.\" commit 4eeab4f5580d11bffedc697684b91b0bca0d5009 +This file defines the amount of free memory (in KiB) on the system that +should be reserved for users with the capability +.BR CAP_SYS_ADMIN . +.IP +The default value in this file is the minimum of [3% of free pages, 8MiB] +expressed as KiB. +The default is intended to provide enough for the superuser +to log in and kill a process, if necessary, +under the default overcommit 'guess' mode (i.e., 0 in +.IR /proc/sys/vm/overcommit_memory ). +.IP +Systems running in "overcommit never" mode (i.e., 2 in +.IR /proc/sys/vm/overcommit_memory ) +should increase the value in this file to account +for the full virtual memory size of the programs used to recover (e.g., +.BR login (1) +.BR ssh (1), +and +.BR top (1)) +Otherwise, the superuser may not be able to log in to recover the system. +For example, on x86-64 a suitable value is 131072 (128MiB reserved). +.IP +Changing the value in this file takes effect whenever +an application requests memory. +.TP +.IR /proc/sys/vm/compact_memory " (since Linux 2.6.35)" +When 1 is written to this file, all zones are compacted such that free +memory is available in contiguous blocks where possible. +The effect of this action can be seen by examining +.IR /proc/buddyinfo . +.IP +Present only if the kernel was configured with +.BR CONFIG_COMPACTION . +.TP +.IR /proc/sys/vm/drop_caches " (since Linux 2.6.16)" +Writing to this file causes the kernel to drop clean caches, dentries, and +inodes from memory, causing that memory to become free. +This can be useful for memory management testing and +performing reproducible filesystem benchmarks. +Because writing to this file causes the benefits of caching to be lost, +it can degrade overall system performance. +.IP +To free pagecache, use: +.IP +.in +4n +.EX +echo 1 > /proc/sys/vm/drop_caches +.EE +.in +.IP +To free dentries and inodes, use: +.IP +.in +4n +.EX +echo 2 > /proc/sys/vm/drop_caches +.EE +.in +.IP +To free pagecache, dentries, and inodes, use: +.IP +.in +4n +.EX +echo 3 > /proc/sys/vm/drop_caches +.EE +.in +.IP +Because writing to this file is a nondestructive operation and dirty objects +are not freeable, the +user should run +.BR sync (1) +first. +.TP +.IR /proc/sys/vm/sysctl_hugetlb_shm_group " (since Linux 2.6.7)" +This writable file contains a group ID that is allowed +to allocate memory using huge pages. +If a process has a filesystem group ID or any supplementary group ID that +matches this group ID, +then it can make huge-page allocations without holding the +.B CAP_IPC_LOCK +capability; see +.BR memfd_create (2), +.BR mmap (2), +and +.BR shmget (2). +.TP +.IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)" +.\" The following is from Documentation/filesystems/proc.txt +If nonzero, this disables the new 32-bit memory-mapping layout; +the kernel will use the legacy (2.4) layout for all processes. +.TP +.IR /proc/sys/vm/memory_failure_early_kill " (since Linux 2.6.32)" +.\" The following is based on the text in Documentation/sysctl/vm.txt +Control how to kill processes when an uncorrected memory error +(typically a 2-bit error in a memory module) +that cannot be handled by the kernel +is detected in the background by hardware. +In some cases (like the page still having a valid copy on disk), +the kernel will handle the failure +transparently without affecting any applications. +But if there is no other up-to-date copy of the data, +it will kill processes to prevent any data corruptions from propagating. +.IP +The file has one of the following values: +.RS +.TP +.B 1 +Kill all processes that have the corrupted-and-not-reloadable page mapped +as soon as the corruption is detected. +Note that this is not supported for a few types of pages, +such as kernel internally +allocated data or the swap cache, but works for the majority of user pages. +.TP +.B 0 +Unmap the corrupted page from all processes and kill a process +only if it tries to access the page. +.RE +.IP +The kill is performed using a +.B SIGBUS +signal with +.I si_code +set to +.BR BUS_MCEERR_AO . +Processes can handle this if they want to; see +.BR sigaction (2) +for more details. +.IP +This feature is active only on architectures/platforms with advanced machine +check handling and depends on the hardware capabilities. +.IP +Applications can override the +.I memory_failure_early_kill +setting individually with the +.BR prctl (2) +.B PR_MCE_KILL +operation. +.IP +Present only if the kernel was configured with +.BR CONFIG_MEMORY_FAILURE . +.TP +.IR /proc/sys/vm/memory_failure_recovery " (since Linux 2.6.32)" +.\" The following is based on the text in Documentation/sysctl/vm.txt +Enable memory failure recovery (when supported by the platform). +.RS +.TP +.B 1 +Attempt recovery. +.TP +.B 0 +Always panic on a memory failure. +.RE +.IP +Present only if the kernel was configured with +.BR CONFIG_MEMORY_FAILURE . +.TP +.IR /proc/sys/vm/oom_dump_tasks " (since Linux 2.6.25)" +.\" The following is from Documentation/sysctl/vm.txt +Enables a system-wide task dump (excluding kernel threads) to be +produced when the kernel performs an OOM-killing. +The dump includes the following information +for each task (thread, process): +thread ID, real user ID, thread group ID (process ID), +virtual memory size, resident set size, +the CPU that the task is scheduled on, +oom_adj score (see the description of +.IR /proc/ pid /oom_adj ), +and command name. +This is helpful to determine why the OOM-killer was invoked +and to identify the rogue task that caused it. +.IP +If this contains the value zero, this information is suppressed. +On very large systems with thousands of tasks, +it may not be feasible to dump the memory state information for each one. +Such systems should not be forced to incur a performance penalty in +OOM situations when the information may not be desired. +.IP +If this is set to nonzero, this information is shown whenever the +OOM-killer actually kills a memory-hogging task. +.IP +The default value is 0. +.TP +.IR /proc/sys/vm/oom_kill_allocating_task " (since Linux 2.6.24)" +.\" The following is from Documentation/sysctl/vm.txt +This enables or disables killing the OOM-triggering task in +out-of-memory situations. +.IP +If this is set to zero, the OOM-killer will scan through the entire +tasklist and select a task based on heuristics to kill. +This normally selects a rogue memory-hogging task that +frees up a large amount of memory when killed. +.IP +If this is set to nonzero, the OOM-killer simply kills the task that +triggered the out-of-memory condition. +This avoids a possibly expensive tasklist scan. +.IP +If +.I /proc/sys/vm/panic_on_oom +is nonzero, it takes precedence over whatever value is used in +.IR /proc/sys/vm/oom_kill_allocating_task . +.IP +The default value is 0. +.TP +.IR /proc/sys/vm/overcommit_kbytes " (since Linux 3.14)" +.\" commit 49f0ce5f92321cdcf741e35f385669a421013cb7 +This writable file provides an alternative to +.I /proc/sys/vm/overcommit_ratio +for controlling the +.I CommitLimit +when +.I /proc/sys/vm/overcommit_memory +has the value 2. +It allows the amount of memory overcommitting to be specified as +an absolute value (in kB), +rather than as a percentage, as is done with +.IR overcommit_ratio . +This allows for finer-grained control of +.I CommitLimit +on systems with extremely large memory sizes. +.IP +Only one of +.I overcommit_kbytes +or +.I overcommit_ratio +can have an effect: +if +.I overcommit_kbytes +has a nonzero value, then it is used to calculate +.IR CommitLimit , +otherwise +.I overcommit_ratio +is used. +Writing a value to either of these files causes the +value in the other file to be set to zero. +.TP +.I /proc/sys/vm/overcommit_memory +This file contains the kernel virtual memory accounting mode. +Values are: +.RS +.IP +0: heuristic overcommit (this is the default) +.br +1: always overcommit, never check +.br +2: always check, never overcommit +.RE +.IP +In mode 0, calls of +.BR mmap (2) +with +.B MAP_NORESERVE +are not checked, and the default check is very weak, +leading to the risk of getting a process "OOM-killed". +.IP +In mode 1, the kernel pretends there is always enough memory, +until memory actually runs out. +One use case for this mode is scientific computing applications +that employ large sparse arrays. +Before Linux 2.6.0, any nonzero value implies mode 1. +.IP +In mode 2 (available since Linux 2.6), the total virtual address space +that can be allocated +.RI ( CommitLimit +in +.IR /proc/meminfo ) +is calculated as +.IP +.in +4n +.EX +CommitLimit = (total_RAM \- total_huge_TLB) * + overcommit_ratio / 100 + total_swap +.EE +.in +.IP +where: +.RS +.IP \[bu] 3 +.I total_RAM +is the total amount of RAM on the system; +.IP \[bu] +.I total_huge_TLB +is the amount of memory set aside for huge pages; +.IP \[bu] +.I overcommit_ratio +is the value in +.IR /proc/sys/vm/overcommit_ratio ; +and +.IP \[bu] +.I total_swap +is the amount of swap space. +.RE +.IP +For example, on a system with 16 GB of physical RAM, 16 GB +of swap, no space dedicated to huge pages, and an +.I overcommit_ratio +of 50, this formula yields a +.I CommitLimit +of 24 GB. +.IP +Since Linux 3.14, if the value in +.I /proc/sys/vm/overcommit_kbytes +is nonzero, then +.I CommitLimit +is instead calculated as: +.IP +.in +4n +.EX +CommitLimit = overcommit_kbytes + total_swap +.EE +.in +.IP +See also the description of +.I /proc/sys/vm/admin_reserve_kbytes +and +.IR /proc/sys/vm/user_reserve_kbytes . +.TP +.IR /proc/sys/vm/overcommit_ratio " (since Linux 2.6.0)" +This writable file defines a percentage by which memory +can be overcommitted. +The default value in the file is 50. +See the description of +.IR /proc/sys/vm/overcommit_memory . +.TP +.IR /proc/sys/vm/panic_on_oom " (since Linux 2.6.18)" +.\" The following is adapted from Documentation/sysctl/vm.txt +This enables or disables a kernel panic in +an out-of-memory situation. +.IP +If this file is set to the value 0, +the kernel's OOM-killer will kill some rogue process. +Usually, the OOM-killer is able to kill a rogue process and the +system will survive. +.IP +If this file is set to the value 1, +then the kernel normally panics when out-of-memory happens. +However, if a process limits allocations to certain nodes +using memory policies +.RB ( mbind (2) +.BR MPOL_BIND ) +or cpusets +.RB ( cpuset (7)) +and those nodes reach memory exhaustion status, +one process may be killed by the OOM-killer. +No panic occurs in this case: +because other nodes' memory may be free, +this means the system as a whole may not have reached +an out-of-memory situation yet. +.IP +If this file is set to the value 2, +the kernel always panics when an out-of-memory condition occurs. +.IP +The default value is 0. +1 and 2 are for failover of clustering. +Select either according to your policy of failover. +.TP +.I /proc/sys/vm/swappiness +.\" The following is from Documentation/sysctl/vm.txt +The value in this file controls how aggressively the kernel will swap +memory pages. +Higher values increase aggressiveness, lower values +decrease aggressiveness. +The default value is 60. +.TP +.IR /proc/sys/vm/user_reserve_kbytes " (since Linux 3.10)" +.\" commit c9b1d0981fcce3d9976d7b7a56e4e0503bc610dd +Specifies an amount of memory (in KiB) to reserve for user processes. +This is intended to prevent a user from starting a single memory hogging +process, such that they cannot recover (kill the hog). +The value in this file has an effect only when +.I /proc/sys/vm/overcommit_memory +is set to 2 ("overcommit never" mode). +In this case, the system reserves an amount of memory that is the minimum +of [3% of current process size, +.IR user_reserve_kbytes ]. +.IP +The default value in this file is the minimum of [3% of free pages, 128MiB] +expressed as KiB. +.IP +If the value in this file is set to zero, +then a user will be allowed to allocate all free memory with a single process +(minus the amount reserved by +.IR /proc/sys/vm/admin_reserve_kbytes ). +Any subsequent attempts to execute a command will result in +"fork: Cannot allocate memory". +.IP +Changing the value in this file takes effect whenever +an application requests memory. +.TP +.IR /proc/sys/vm/unprivileged_userfaultfd " (since Linux 5.2)" +.\" cefdca0a86be517bc390fc4541e3674b8e7803b0 +This (writable) file exposes a flag that controls whether +unprivileged processes are allowed to employ +.BR userfaultfd (2). +If this file has the value 1, then unprivileged processes may use +.BR userfaultfd (2). +If this file has the value 0, then only processes that have the +.B CAP_SYS_PTRACE +capability may employ +.BR userfaultfd (2). +The default value in this file is 1. +.SH SEE ALSO +.BR proc (5) |
