'\" t
.\" Copyright (C) 1994, 1995, Daniel Quinlan <quinlan@yggdrasil.com>
.\" Copyright (C) 2002-2008, 2017, Michael Kerrisk <mtk.manpages@gmail.com>
.\" Copyright (C) , Andries Brouwer <aeb@cwi.nl>
.\" Copyright (C) 2023, Alejandro Colomar <alx@kernel.org>
.\"
.\" SPDX-License-Identifier: GPL-3.0-or-later
.\"
.TH proc_sys 5 (date) "Linux man-pages (unreleased)"
.SH NAME
/proc/sys/ \- system information, and sysctl pseudo-filesystem
.SH DESCRIPTION
.TP
.I /proc/sys/
This directory (present since Linux 1.3.57) contains a number of files
and subdirectories corresponding to kernel variables.
These variables can be read and in some cases modified using
the \fI/proc\fP filesystem, and the (deprecated)
.BR sysctl (2)
system call.
.IP
String values may be terminated by either \[aq]\e0\[aq] or \[aq]\en\[aq].
.IP
Integer and long values may be written either in decimal or in
hexadecimal notation (e.g., 0x3FFF).
When writing multiple integer or long values, these may be separated
by any of the following whitespace characters:
\[aq]\ \[aq], \[aq]\et\[aq], or \[aq]\en\[aq].
Using other separators leads to the error
.BR EINVAL .
.TP
.IR /proc/sys/abi/ " (since Linux 2.4.10)"
This directory may contain files with application binary information.
.\" On some systems, it is not present.
See the Linux kernel source file
.I Documentation/sysctl/abi.rst
(or
.I Documentation/sysctl/abi.txt
before Linux 5.3)
for more information.
.TP
.I /proc/sys/debug/
This directory may be empty.
.TP
.I /proc/sys/dev/
This directory contains device-specific information (e.g.,
.IR dev/cdrom/info ).
On
some systems, it may be empty.
.TP
.I /proc/sys/fs/
This directory contains the files and subdirectories for kernel variables
related to filesystems.
.TP
.IR /proc/sys/fs/aio\-max\-nr " and " /proc/sys/fs/aio\-nr " (since Linux 2.6.4)"
.I aio\-nr
is the running total of the number of events specified by
.BR io_setup (2)
calls for all currently active AIO contexts.
If
.I aio\-nr
reaches
.IR aio\-max\-nr ,
then
.BR io_setup (2)
will fail with the error
.BR EAGAIN .
Raising
.I aio\-max\-nr
does not result in the preallocation or resizing
of any kernel data structures.
.TP
.I /proc/sys/fs/binfmt_misc
Documentation for files in this directory can be found
in the Linux kernel source in the file
.I Documentation/admin\-guide/binfmt\-misc.rst
(or in
.I Documentation/binfmt_misc.txt
on older kernels).
.TP
.IR /proc/sys/fs/dentry\-state " (since Linux 2.2)"
This file contains information about the status of the
directory cache (dcache).
The file contains six numbers,
.IR nr_dentry ,
.IR nr_unused ,
.I age_limit
(age in seconds),
.I want_pages
(pages requested by system) and two dummy values.
.RS
.IP \[bu] 3
.I nr_dentry
is the number of allocated dentries (dcache entries).
This field is unused in Linux 2.2.
.IP \[bu]
.I nr_unused
is the number of unused dentries.
.IP \[bu]
.I age_limit
.\" looks like this is unused in Linux 2.2 to Linux 2.6
is the age in seconds after which dcache entries
can be reclaimed when memory is short.
.IP \[bu]
.I want_pages
.\" looks like this is unused in Linux 2.2 to Linux 2.6
is nonzero when the kernel has called shrink_dcache_pages() and the
dcache isn't pruned yet.
.RE
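.IP
As a minimal sketch, assuming a POSIX shell
(the shell variable names simply mirror the field names above, and
.I rest
is an illustrative name that collects the remaining values),
the first fields can be read as follows:
.IP
.in +4n
.EX
# Split the line on whitespace; "rest" receives the trailing values.
read nr_dentry nr_unused age_limit want_pages rest < /proc/sys/fs/dentry\-state
echo "$nr_unused of $nr_dentry dentries are unused"
.EE
.in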
.TP
.I /proc/sys/fs/dir\-notify\-enable
This file can be used to disable or enable the
.I dnotify
interface described in
.BR fcntl (2)
on a system-wide basis.
A value of 0 in this file disables the interface,
and a value of 1 enables it.
.TP
.I /proc/sys/fs/dquot\-max
This file shows the maximum number of cached disk quota entries.
On some (2.4) systems, it is not present.
If the number of free cached disk quota entries is very low and
you have a very large number of simultaneous system users,
you might want to raise the limit.
.TP
.I /proc/sys/fs/dquot\-nr
This file shows the number of allocated disk quota
entries and the number of free disk quota entries.
.TP
.IR /proc/sys/fs/epoll/ " (since Linux 2.6.28)"
This directory contains the file
.IR max_user_watches ,
which can be used to limit the amount of kernel memory consumed by the
.I epoll
interface.
For further details, see
.BR epoll (7).
.TP
.I /proc/sys/fs/file\-max
This file defines
a system-wide limit on the number of open files for all processes.
System calls that fail when encountering this limit fail with the error
.BR ENFILE .
(See also
.BR setrlimit (2),
which can be used by a process to set the per-process limit,
.BR RLIMIT_NOFILE ,
on the number of files it may open.)
If you get lots
of error messages in the kernel log about running out of file handles
(open file descriptions)
(look for "VFS: file\-max limit <number> reached"),
try increasing this value:
.IP
.in +4n
.EX
echo 100000 > /proc/sys/fs/file\-max
.EE
.in
.IP
Privileged processes
.RB ( CAP_SYS_ADMIN )
can override the
.I file\-max
limit.
.TP
.I /proc/sys/fs/file\-nr
This (read-only) file contains three numbers:
the number of allocated file handles
(i.e., the number of open file descriptions; see
.BR open (2));
the number of free file handles;
and the maximum number of file handles (i.e., the same value as
.IR /proc/sys/fs/file\-max ).
If the number of allocated file handles is close to the
maximum, you should consider increasing the maximum.
Before Linux 2.6,
the kernel allocated file handles dynamically,
but it didn't free them again.
Instead the free file handles were kept in a list for reallocation;
the "free file handles" value indicates the size of that list.
A large number of free file handles indicates that there was
a past peak in the usage of open file handles.
Since Linux 2.6, the kernel does deallocate freed file handles,
and the "free file handles" value is always zero.
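.IP
As a rough sketch, assuming a POSIX shell
(the variable names
.IR allocated ,
.IR unused ,
and
.I max
are illustrative only),
the three fields can be read and the share of the limit in use
estimated as follows:
.IP
.in +4n
.EX
# Read the three whitespace\-separated fields.
read allocated unused max < /proc/sys/fs/file\-nr
# Integer percentage of the limit that is currently allocated.
echo "$((allocated * 100 / max))% of $max file handles allocated"
.EE
.in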
.TP
.IR /proc/sys/fs/inode\-max " (only present until Linux 2.2)"
This file contains the maximum number of in-memory inodes.
This value should be 3\[en]4 times larger
than the value in
.IR file\-max ,
since \fIstdin\fP, \fIstdout\fP
and network sockets also need an inode to handle them.
When you regularly run out of inodes, you need to increase this value.
.IP
Starting with Linux 2.4,
there is no longer a static limit on the number of inodes,
and this file is removed.
.TP
.I /proc/sys/fs/inode\-nr
This file contains the first two values from
.IR inode\-state .
.TP
.I /proc/sys/fs/inode\-state
This file
contains seven numbers:
.IR nr_inodes ,
.IR nr_free_inodes ,
.IR preshrink ,
and four dummy values (always zero).
.IP
.I nr_inodes
is the number of inodes the system has allocated.
.\" This can be slightly more than
.\" .I inode\-max
.\" because Linux allocates them one page full at a time.
.I nr_free_inodes
represents the number of free inodes.
.IP
.I preshrink
is nonzero when
.I nr_inodes
>
.I inode\-max
and the system needs to prune the inode list instead of allocating more;
since Linux 2.4, this field is a dummy value (always zero).
.TP
.IR /proc/sys/fs/inotify/ " (since Linux 2.6.13)"
This directory contains files
.IR max_queued_events ", " max_user_instances ", and " max_user_watches ,
that can be used to limit the amount of kernel memory consumed by the
.I inotify
interface.
For further details, see
.BR inotify (7).
.TP
.I /proc/sys/fs/lease\-break\-time
This file specifies the grace period that the kernel grants to a process
holding a file lease
.RB ( fcntl (2))
after it has sent a signal to that process notifying it
that another process is waiting to open the file.
If the lease holder does not remove or downgrade the lease within
this grace period, the kernel forcibly breaks the lease.
.TP
.I /proc/sys/fs/leases\-enable
This file can be used to enable or disable file leases
.RB ( fcntl (2))
on a system-wide basis.
If this file contains the value 0, leases are disabled.
A nonzero value enables leases.
.TP
.IR /proc/sys/fs/mount\-max " (since Linux 4.9)"
.\" commit d29216842a85c7970c536108e093963f02714498
The value in this file specifies the maximum number of mounts that may exist
in a mount namespace.
The default value in this file is 100,000.
.TP
.IR /proc/sys/fs/mqueue/ " (since Linux 2.6.6)"
This directory contains files
.IR msg_max ", " msgsize_max ", and " queues_max ,
controlling the resources used by POSIX message queues.
See
.BR mq_overview (7)
for details.
.TP
.IR /proc/sys/fs/nr_open " (since Linux 2.6.25)"
.\" commit 9cfe015aa424b3c003baba3841a60dd9b5ad319b
This file imposes a ceiling on the value to which the
.B RLIMIT_NOFILE
resource limit can be raised (see
.BR getrlimit (2)).
This ceiling is enforced for both unprivileged and privileged processes.
The default value in this file is 1048576.
(Before Linux 2.6.25, the ceiling for
.B RLIMIT_NOFILE
was hard-coded to the same value.)
.TP
.IR /proc/sys/fs/overflowgid " and " /proc/sys/fs/overflowuid
These files
allow you to change the value of the fixed UID and GID.
The default is 65534.
Some filesystems support only 16-bit UIDs and GIDs, although in Linux
UIDs and GIDs are 32 bits.
When one of these filesystems is mounted
with writes enabled, any UID or GID that would exceed 65535 is translated
to the overflow value before being written to disk.
.TP
.IR /proc/sys/fs/pipe\-max\-size " (since Linux 2.6.35)"
See
.BR pipe (7).
.TP
.IR /proc/sys/fs/pipe\-user\-pages\-hard " (since Linux 4.5)"
See
.BR pipe (7).
.TP
.IR /proc/sys/fs/pipe\-user\-pages\-soft " (since Linux 4.5)"
See
.BR pipe (7).
.TP
.IR /proc/sys/fs/protected_fifos " (since Linux 4.19)"
The value in this file can be set to one of the following:
.RS
.TP 4
0
Writing to FIFOs is unrestricted.
.TP
1
Don't allow
.B O_CREAT
.BR open (2)
on FIFOs that the caller doesn't own in world-writable sticky directories,
unless the FIFO is owned by the owner of the directory.
.TP
2
As for the value 1,
but the restriction also applies to group-writable sticky directories.
.RE
.IP
The intent of the above protections is to avoid unintentional writes to an
attacker-controlled FIFO when a program expected to create a regular file.
.TP
.IR /proc/sys/fs/protected_hardlinks " (since Linux 3.6)"
.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7
When the value in this file is 0,
no restrictions are placed on the creation of hard links
(i.e., this is the historical behavior before Linux 3.6).
When the value in this file is 1,
a hard link can be created to a target file
only if one of the following conditions is true:
.RS
.IP \[bu] 3
The calling process has the
.B CAP_FOWNER
capability in its user namespace
and the file UID has a mapping in the namespace.
.IP \[bu]
The filesystem UID of the process creating the link matches
the owner (UID) of the target file
(as described in
.BR credentials (7),
a process's filesystem UID is normally the same as its effective UID).
.IP \[bu]
All of the following conditions are true:
.RS 4
.IP \[bu] 3
the target is a regular file;
.IP \[bu]
the target file does not have its set-user-ID mode bit enabled;
.IP \[bu]
the target file does not have both its set-group-ID and
group-executable mode bits enabled; and
.IP \[bu]
the caller has permission to read and write the target file
(either via the file's permissions mask or because it has
suitable capabilities).
.RE
.RE
.IP
The default value in this file is 0.
Setting the value to 1
prevents a longstanding class of security issues caused by
hard-link-based time-of-check, time-of-use races,
most commonly seen in world-writable directories such as
.IR /tmp .
The common method of exploiting this flaw
is to cross privilege boundaries when following a given hard link
(i.e., a root process follows a hard link created by another user).
Additionally, on systems without separated partitions,
this stops unauthorized users from "pinning" vulnerable set-user-ID and
set-group-ID files against being upgraded by
the administrator, or linking to special files.
.TP
.IR /proc/sys/fs/protected_regular " (since Linux 4.19)"
The value in this file can be set to one of the following:
.RS
.TP 4
0
Writing to regular files is unrestricted.
.TP
1
Don't allow
.B O_CREAT
.BR open (2)
on regular files that the caller doesn't own in
world-writable sticky directories,
unless the regular file is owned by the owner of the directory.
.TP
2
As for the value 1,
but the restriction also applies to group-writable sticky directories.
.RE
.IP
The intent of the above protections is similar to
.IR protected_fifos ,
but allows an application to
avoid writes to an attacker-controlled regular file,
where the application expected to create one.
.TP
.IR /proc/sys/fs/protected_symlinks " (since Linux 3.6)"
.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7
When the value in this file is 0,
no restrictions are placed on following symbolic links
(i.e., this is the historical behavior before Linux 3.6).
When the value in this file is 1, symbolic links are followed only
in the following circumstances:
.RS
.IP \[bu] 3
the filesystem UID of the process following the link matches
the owner (UID) of the symbolic link
(as described in
.BR credentials (7),
a process's filesystem UID is normally the same as its effective UID);
.IP \[bu]
the link is not in a sticky world-writable directory; or
.IP \[bu]
the symbolic link and its parent directory have the same owner (UID).
.RE
.IP
A system call that fails to follow a symbolic link
because of the above restrictions returns the error
.B EACCES
in
.IR errno .
.IP
The default value in this file is 0.
Setting the value to 1 avoids a longstanding class of security issues
based on time-of-check, time-of-use races when accessing symbolic links.
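.IP
As an illustrative sketch, assuming a POSIX shell,
the current settings of this file and of the other
.I protected_*
files described above can be listed as follows
(only the files that exist on the running kernel are shown;
the loop variable
.I f
is a name chosen here):
.IP
.in +4n
.EX
# Print "name = value" for each protected_* file that is present.
for f in /proc/sys/fs/protected_*; do
    [ \-r "$f" ] && echo "${f##*/} = $(cat "$f")"
done
.EE
.in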
.TP
.IR /proc/sys/fs/suid_dumpable " (since Linux 2.6.13)"
.\" The following is based on text from Documentation/sysctl/kernel.txt
The value in this file is assigned to a process's "dumpable" flag
in the circumstances described in
.BR prctl (2).
In effect,
the value in this file determines whether core dump files are
produced for set-user-ID or otherwise protected/tainted binaries.
The "dumpable" setting also affects the ownership of files in a process's
.IR /proc/ pid
directory, as described above.
.IP
Three different integer values can be specified:
.RS
.TP
\fI0\ (default)\fP
.\" In kernel source: SUID_DUMP_DISABLE
This provides the traditional (pre-Linux 2.6.13) behavior.
A core dump will not be produced for a process which has
changed credentials (by calling
.BR seteuid (2),
.BR setgid (2),
or similar, or by executing a set-user-ID or set-group-ID program)
or whose binary does not have read permission enabled.
.TP
\fI1\ ("debug")\fP
.\" In kernel source: SUID_DUMP_USER
All processes dump core when possible.
(Reasons why a process might nevertheless not dump core are described in
.BR core (5).)
The core dump is owned by the filesystem user ID of the dumping process
and no security is applied.
This is intended for system debugging situations only:
this mode is insecure because it allows unprivileged users to
examine the memory contents of privileged processes.
.TP
\fI2\ ("suidsafe")\fP
.\" In kernel source: SUID_DUMP_ROOT
Any binary which normally would not be dumped (see "0" above)
is dumped readable by root only.
This allows the user to remove the core dump file but not to read it.
For security reasons core dumps in this mode will not overwrite one
another or other files.
This mode is appropriate when administrators are
attempting to debug problems in a normal environment.
.IP
Additionally, since Linux 3.6,
.\" 9520628e8ceb69fa9a4aee6b57f22675d9e1b709
.I /proc/sys/kernel/core_pattern
must either be an absolute pathname
or a pipe command, as detailed in
.BR core (5).
Warnings will be written to the kernel log if
.I core_pattern
does not follow these rules, and no core dump will be produced.
.\" 54b501992dd2a839e94e76aa392c392b55080ce8
.RE
.IP
For details of the effect of a process's "dumpable" setting
on ptrace access mode checking, see
.BR ptrace (2).
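.IP
As an illustrative sketch, assuming a POSIX shell
(the shell variable
.I mode
is a name chosen here, not part of the interface),
the current setting can be read and mapped to the names used above:
.IP
.in +4n
.EX
# Read the current setting and print the matching mode name.
mode=$(cat /proc/sys/fs/suid_dumpable)
case "$mode" in
0) echo "0: default" ;;
1) echo "1: debug" ;;
2) echo "2: suidsafe" ;;
*) echo "unexpected value: $mode" ;;
esac
.EE
.in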
.TP
.I /proc/sys/fs/super\-max
This file
controls the maximum number of superblocks, and
thus the maximum number of mounted filesystems the kernel
can have.
You need to increase
.I super\-max
only if you need to mount more filesystems than the current value in
.I super\-max
allows.
.TP
.I /proc/sys/fs/super\-nr
This file
contains the number of filesystems currently mounted.
.TP
.I /proc/sys/kernel/
This directory contains files controlling a range of kernel parameters,
as described below.
.TP
.I /proc/sys/kernel/acct
This file
contains three numbers:
.IR highwater ,
.IR lowwater ,
and
.IR frequency .
If BSD-style process accounting is enabled, these values control
its behavior.
If free space on the filesystem where the log lives goes below
.I lowwater
percent, accounting suspends.
If free space gets above
.I highwater
percent, accounting resumes.
.I frequency
determines
how often the kernel checks the amount of free space (value is in
seconds).
Default values are 4, 2, and 30.
That is, suspend accounting if 2% or less space is free; resume it
if 4% or more space is free; consider information about the amount of
free space valid for 30 seconds.
.TP
.IR /proc/sys/kernel/auto_msgmni " (Linux 2.6.27 to Linux 3.18)"
.\" commit 9eefe520c814f6f62c5d36a2ddcd3fb99dfdb30e (introduces feature)
.\" commit 0050ee059f7fc86b1df2527aaa14ed5dc72f9973 (rendered redundant)
From Linux 2.6.27 to Linux 3.18,
this file was used to control recomputing of the value in
.I /proc/sys/kernel/msgmni
upon the addition or removal of memory or upon IPC namespace creation/removal.
Echoing "1" into this file enabled
.I msgmni
automatic recomputing (and triggered a recomputation of
.I msgmni
based on the current amount of available memory and number of IPC namespaces).
Echoing "0" disabled automatic recomputing.
(Automatic recomputing was also disabled if a value was explicitly assigned to
.IR /proc/sys/kernel/msgmni .)
The default value in
.I auto_msgmni
was 1.
.IP
Since Linux 3.19, the content of this file has no effect (because
.I msgmni
.\" FIXME Must document the 3.19 'msgmni' changes.
defaults to near the maximum value possible),
and reads from this file always return the value "0".
.TP
.IR /proc/sys/kernel/cap_last_cap " (since Linux 3.2)"
See
.BR capabilities (7).
.TP
.IR /proc/sys/kernel/cap\-bound " (from Linux 2.2 to Linux 2.6.24)"
This file holds the value of the kernel
.I "capability bounding set"
(expressed as a signed decimal number).
This set is ANDed against the capabilities permitted to a process
during
.BR execve (2).
Starting with Linux 2.6.25,
the system-wide capability bounding set disappeared,
and was replaced by a per-thread bounding set; see
.BR capabilities (7).
.TP
.I /proc/sys/kernel/core_pattern
See
.BR core (5).
.TP
.I /proc/sys/kernel/core_pipe_limit
See
.BR core (5).
.TP
.I /proc/sys/kernel/core_uses_pid
See
.BR core (5).
.TP
.I /proc/sys/kernel/ctrl\-alt\-del
This file
controls the handling of Ctrl-Alt-Del from the keyboard.
When the value in this file is 0, Ctrl-Alt-Del is trapped and
sent to the
.BR init (1)
program to handle a graceful restart.
When the value is greater than zero, Linux's reaction to a Vulcan
Nerve Pinch (tm) will be an immediate reboot, without even
syncing its dirty buffers.
Note: when a program (like dosemu) has the keyboard in "raw"
mode, the Ctrl-Alt-Del is intercepted by the program before it
ever reaches the kernel tty layer, and it's up to the program
to decide what to do with it.
.TP
.IR /proc/sys/kernel/dmesg_restrict " (since Linux 2.6.37)"
The value in this file determines who can see kernel syslog contents.
A value of 0 in this file imposes no restrictions.
If the value is 1, only privileged users can read the kernel syslog.
(See
.BR syslog (2)
for more details.)
Since Linux 3.4,
.\" commit 620f6e8e855d6d447688a5f67a4e176944a084e8
only users with the
.B CAP_SYS_ADMIN
capability may change the value in this file.
.TP
.IR /proc/sys/kernel/domainname " and " /proc/sys/kernel/hostname
These files can be used to set the NIS/YP domainname and the
hostname of your box in exactly the same way as the commands
.BR domainname (1)
and
.BR hostname (1),
that is:
.IP
.in +4n
.EX
.RB "#" " echo \[aq]darkstar\[aq] > /proc/sys/kernel/hostname"
.RB "#" " echo \[aq]mydomain\[aq] > /proc/sys/kernel/domainname"
.EE
.in
.IP
has the same effect as
.IP
.in +4n
.EX
.RB "#" " hostname \[aq]darkstar\[aq]"
.RB "#" " domainname \[aq]mydomain\[aq]"
.EE
.in
.IP
Note, however, that the classic darkstar.frop.org has the
hostname "darkstar" and DNS (Internet Domain Name Server)
domainname "frop.org", not to be confused with the NIS (Network
Information Service) or YP (Yellow Pages) domainname.
These two
domain names are in general different.
For a detailed discussion
see the
.BR hostname (1)
man page.
.TP
.I /proc/sys/kernel/hotplug
This file
contains the pathname for the hotplug policy agent.
The default value in this file is
.IR /sbin/hotplug .
.TP
.\" Removed in commit 87f504e5c78b910b0c1d6ffb89bc95e492322c84 (tglx/history.git)
.IR /proc/sys/kernel/htab\-reclaim " (before Linux 2.4.9.2)"
(PowerPC only) If this file is set to a nonzero value,
the PowerPC htab
.\" removed in commit 1b483a6a7b2998e9c98ad985d7494b9b725bd228, before Linux 2.6.28
(see kernel file
.IR Documentation/powerpc/ppc_htab.txt )
is pruned
each time the system hits the idle loop.
.TP
.I /proc/sys/kernel/keys/
This directory contains various files that define parameters and limits
for the key-management facility.
These files are described in
.BR keyrings (7).
.TP
.IR /proc/sys/kernel/kptr_restrict " (since Linux 2.6.38)"
.\" 455cd5ab305c90ffc422dd2e0fb634730942b257
The value in this file determines whether kernel addresses are exposed via
.I /proc
files and other interfaces.
A value of 0 in this file imposes no restrictions.
If the value is 1, kernel pointers printed using the
.I %pK
format specifier will be replaced with zeros unless the user has the
.B CAP_SYSLOG
capability.
If the value is 2, kernel pointers printed using the
.I %pK
format specifier will be replaced with zeros regardless
of the user's capabilities.
The initial default value for this file was 1,
but the default was changed
.\" commit 411f05f123cbd7f8aa1edcae86970755a6e2a9d9
to 0 in Linux 2.6.39.
Since Linux 3.4,
.\" commit 620f6e8e855d6d447688a5f67a4e176944a084e8
only users with the
.B CAP_SYS_ADMIN
capability can change the value in this file.
.TP
.I /proc/sys/kernel/l2cr
(PowerPC only) This file
contains a flag that controls the L2 cache of G3 processor
boards.
If 0, the cache is disabled.
Enabled if nonzero.
.TP
.I /proc/sys/kernel/modprobe
This file contains the pathname for the kernel module loader.
The default value is
.IR /sbin/modprobe .
The file is present only if the kernel is built with the
.B CONFIG_MODULES
.RB ( CONFIG_KMOD
in Linux 2.6.26 and earlier)
option enabled.
It is described by the Linux kernel source file
.I Documentation/kmod.txt
(present only in Linux 2.4 and earlier).
.TP
.IR /proc/sys/kernel/modules_disabled " (since Linux 2.6.31)"
.\" 3d43321b7015387cfebbe26436d0e9d299162ea1
.\" From Documentation/sysctl/kernel.txt
A toggle value indicating if modules are allowed to be loaded
in an otherwise modular kernel.
This toggle defaults to off (0), but can be set true (1).
Once true, modules can be neither loaded nor unloaded,
and the toggle cannot be set back to false.
The file is present only if the kernel is built with the
.B CONFIG_MODULES
option enabled.
.TP
.IR /proc/sys/kernel/msgmax " (since Linux 2.2)"
This file defines
a system-wide limit specifying the maximum number of bytes in
a single message written on a System V message queue.
.TP
.IR /proc/sys/kernel/msgmni " (since Linux 2.4)"
This file defines the system-wide limit on the number of
message queue identifiers.
See also
.IR /proc/sys/kernel/auto_msgmni .
.TP
.IR /proc/sys/kernel/msgmnb " (since Linux 2.2)"
This file defines a system-wide parameter used to initialize the
.I msg_qbytes
setting for subsequently created message queues.
The
.I msg_qbytes
setting specifies the maximum number of bytes that may be written to the
message queue.
.TP
.IR /proc/sys/kernel/ngroups_max " (since Linux 2.6.4)"
This is a read-only file that displays the upper limit on the
number of a process's group memberships.
.TP
.IR /proc/sys/kernel/ns_last_pid " (since Linux 3.3)"
See
.BR pid_namespaces (7).
.TP
.IR /proc/sys/kernel/ostype " and " /proc/sys/kernel/osrelease
These files
give substrings of
.IR /proc/version .
.TP
.IR /proc/sys/kernel/overflowgid " and " /proc/sys/kernel/overflowuid
These files duplicate the files
.I /proc/sys/fs/overflowgid
and
.IR /proc/sys/fs/overflowuid .
.TP
.I /proc/sys/kernel/panic
This file gives read/write access to the kernel variable
.IR panic_timeout .
If this is zero, the kernel will loop on a panic; if nonzero,
it indicates that the kernel should autoreboot after this number
of seconds.
When you use the
software watchdog device driver, the recommended setting is 60.
.TP
.IR /proc/sys/kernel/panic_on_oops " (since Linux 2.5.68)"
This file controls the kernel's behavior when an oops
or BUG is encountered.
If this file contains 0, then the system
tries to continue operation.
If it contains 1, then the system
delays a few seconds (to give klogd time to record the oops output)
and then panics.
If the
.I /proc/sys/kernel/panic
file is also nonzero, then the machine will be rebooted.
.TP
.IR /proc/sys/kernel/pid_max " (since Linux 2.5.34)"
This file specifies the value at which PIDs wrap around
(i.e., the value in this file is one greater than the maximum PID).
PIDs greater than or equal to this value are not allocated;
thus, the value in this file also acts as a system-wide limit
on the total number of processes and threads.
The default value for this file, 32768,
results in the same range of PIDs as on earlier kernels.
On 32-bit platforms, 32768 is the maximum value for
.IR pid_max .
On 64-bit systems,
.I pid_max
can be set to any value up to 2\[ha]22
.RB ( PID_MAX_LIMIT ,
approximately 4 million).
.\" Prior to Linux 2.6.10, pid_max could also be raised above 32768 on 32-bit
.\" platforms, but this broke /proc/[pid]
.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=109513010926152&w=2
.TP
.IR /proc/sys/kernel/powersave\-nap " (PowerPC only)"
This file contains a flag.
If set, Linux-PPC will use the "nap" mode of powersaving;
otherwise, the "doze" mode will be used.
.TP
.I /proc/sys/kernel/printk
See
.BR syslog (2).
.TP
.IR /proc/sys/kernel/pty " (since Linux 2.6.4)"
This directory contains two files relating to the number of UNIX 98
pseudoterminals (see
.BR pts (4))
on the system.
.TP
.I /proc/sys/kernel/pty/max
This file defines the maximum number of pseudoterminals.
.\" FIXME Document /proc/sys/kernel/pty/reserve
.\"     New in Linux 3.3
.\"     commit e9aba5158a80098447ff207a452a3418ae7ee386
.TP
.I /proc/sys/kernel/pty/nr
This read-only file
indicates how many pseudoterminals are currently in use.
.TP
.I /proc/sys/kernel/random/
This directory
contains various parameters controlling the operation of the file
.IR /dev/random .
See
.BR random (4)
for further information.
.TP
.IR /proc/sys/kernel/random/uuid " (since Linux 2.4)"
Each read from this read-only file returns a randomly generated 128-bit UUID,
as a string in the standard UUID format.
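.IP
For example, the following sketch reads the file twice;
because each read generates a new UUID,
the two values are expected to differ
(the variable names are arbitrary):
.IP
.in +4n
.EX
a=$(cat /proc/sys/kernel/random/uuid)
b=$(cat /proc/sys/kernel/random/uuid)
echo "$a"
echo "$b"
.EE
.in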
.TP
.IR /proc/sys/kernel/randomize_va_space " (since Linux 2.6.12)"
.\" Some further details can be found in Documentation/sysctl/kernel.txt
Select the address space layout randomization (ASLR) policy for the system
(on architectures that support ASLR).
Three values are supported for this file:
.RS
.TP
.B 0
Turn ASLR off.
This is the default for architectures that don't support ASLR,
and when the kernel is booted with the
.I norandmaps
parameter.
.TP
.B 1
Make the addresses of
.BR mmap (2)
allocations, the stack, and the VDSO page randomized.
Among other things, this means that shared libraries will be
loaded at randomized addresses.
The text segment of PIE-linked binaries will also be loaded
at a randomized address.
This value is the default if the kernel was configured with
.BR CONFIG_COMPAT_BRK .
.TP
.B 2
(Since Linux 2.6.25)
.\" commit c1d171a002942ea2d93b4fbd0c9583c56fce0772
Also support heap randomization.
This value is the default if the kernel was not configured with
.BR CONFIG_COMPAT_BRK .
.RE
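.IP
As a minimal sketch (the function name
.I aslr_policy
is a hypothetical helper, not part of any standard tool),
the three values described above could be decoded in a shell script as follows:
.IP
.in +4n
.EX
# Hypothetical helper: map a randomize_va_space value to a description.
aslr_policy() {
    case "$1" in
        0) echo "ASLR off" ;;
        1) echo "mmap, stack, and VDSO randomized" ;;
        2) echo "mmap, stack, VDSO, and heap randomized" ;;
        *) echo "unknown value" ;;
    esac
}
aslr_policy "$(cat /proc/sys/kernel/randomize_va_space)"
.EE
.in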
.TP
.I /proc/sys/kernel/real\-root\-dev
This file is documented in the Linux kernel source file
.I Documentation/admin\-guide/initrd.rst
.\" commit 9d85025b0418163fae079c9ba8f8445212de8568
(or
.I Documentation/initrd.txt
before Linux 4.10).
.TP
.IR /proc/sys/kernel/reboot\-cmd " (SPARC only)"
This file appears to provide a way to pass an argument to the SPARC
ROM/Flash boot loader,
perhaps to tell it what to do after rebooting.
.TP
.I /proc/sys/kernel/rtsig\-max
(Up to and including Linux 2.6.7; see
.BR setrlimit (2))
This file can be used to tune the maximum number
of POSIX real-time (queued) signals that can be outstanding
in the system.
.TP
.I /proc/sys/kernel/rtsig\-nr
(Up to and including Linux 2.6.7.)
This file shows the number of POSIX real-time signals currently queued.
.TP
.IR /proc/sys/kernel/sched_autogroup_enabled " (since Linux 2.6.38)"
.\" commit 5091faa449ee0b7d73bc296a93bca9540fc51d0a
See
.BR sched (7).
.TP
.IR /proc/sys/kernel/sched_child_runs_first " (since Linux 2.6.23)"
If this file contains the value zero, then, after a
.BR fork (2),
the parent is first scheduled on the CPU.
If the file contains a nonzero value,
then the child is scheduled first on the CPU.
(Of course, on a multiprocessor system,
the parent and the child might both immediately be scheduled on a CPU.)
.TP
.IR /proc/sys/kernel/sched_rr_timeslice_ms " (since Linux 3.9)"
See
.BR sched_rr_get_interval (2).
.TP
.IR /proc/sys/kernel/sched_rt_period_us " (since Linux 2.6.25)"
See
.BR sched (7).
.TP
.IR /proc/sys/kernel/sched_rt_runtime_us " (since Linux 2.6.25)"
See
.BR sched (7).
.TP
.IR /proc/sys/kernel/seccomp/ " (since Linux 4.14)"
.\" commit 8e5f1ad116df6b0de65eac458d5e7c318d1c05af
This directory provides additional seccomp information and
configuration.
See
.BR seccomp (2)
for further details.
.TP
.IR /proc/sys/kernel/sem " (since Linux 2.4)"
This file contains 4 numbers defining limits for System V IPC semaphores.
These fields are, in order:
.RS
.TP
SEMMSL
The maximum number of semaphores per semaphore set.
.TP
SEMMNS
A system-wide limit on the number of semaphores in all semaphore sets.
.TP
SEMOPM
The maximum number of operations that may be specified in a
.BR semop (2)
call.
.TP
SEMMNI
A system-wide limit on the maximum number of semaphore identifiers.
.RE
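.IP
The following sketch labels the four fields when given the contents of the file;
the helper name
.I show_sem
is an assumption made for illustration:
.IP
.in +4n
.EX
# Hypothetical helper: label the four fields, in the order listed above.
show_sem() {
    echo "SEMMSL=$1 SEMMNS=$2 SEMOPM=$3 SEMMNI=$4"
}
# Unquoted on purpose, so that the shell splits the four numbers.
show_sem $(cat /proc/sys/kernel/sem)
.EE
.in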
.TP
.I /proc/sys/kernel/sg\-big\-buff
This file
shows the size of the generic SCSI device (sg) buffer.
You can't tune it just yet, but you could change it at
compile time by editing
.I include/scsi/sg.h
and changing
the value of
.BR SG_BIG_BUFF .
However, there shouldn't be any reason to change this value.
.TP
.IR /proc/sys/kernel/shm_rmid_forced " (since Linux 3.1)"
.\" commit b34a6b1da371ed8af1221459a18c67970f7e3d53
.\" See also Documentation/sysctl/kernel.txt
If this file is set to 1, all System V shared memory segments will
be marked for destruction as soon as the number of attached processes
falls to zero;
in other words, it is no longer possible to create shared memory segments
that exist independently of any attached process.
.IP
The effect is as though a
.BR shmctl (2)
.B IPC_RMID
is performed on all existing segments as well as all segments
created in the future (until this file is reset to 0).
Note that existing segments that are attached to no process will be
immediately destroyed when this file is set to 1.
Setting this option will also destroy segments that were created,
but never attached,
upon termination of the process that created the segment with
.BR shmget (2).
.IP
Setting this file to 1 provides a way of ensuring that
all System V shared memory segments are counted against the
resource usage and resource limits (see the description of
.B RLIMIT_AS
in
.BR getrlimit (2))
of at least one process.
.IP
Because setting this file to 1 produces behavior that is nonstandard
and could also break existing applications,
the default value in this file is 0.
Set this file to 1 only if you have a good understanding
of the semantics of the applications using
System V shared memory on your system.
.TP
.IR /proc/sys/kernel/shmall " (since Linux 2.2)"
This file
contains the system-wide limit on the total number of pages of
System V shared memory.
.TP
.IR /proc/sys/kernel/shmmax " (since Linux 2.2)"
This file
can be used to query and set the run-time limit
on the maximum (System V IPC) shared memory segment size that can be
created.
Shared memory segments up to 1 GB are now supported in the
kernel.
This value defaults to
.BR SHMMAX .
.TP
.IR /proc/sys/kernel/shmmni " (since Linux 2.4)"
This file
specifies the system-wide maximum number of System V shared memory
segments that can be created.
.TP
.IR /proc/sys/kernel/sysctl_writes_strict " (since Linux 3.16)"
.\" commit f88083005ab319abba5d0b2e4e997558245493c8
.\" commit 2ca9bb456ada8bcbdc8f77f8fc78207653bbaa92
.\" commit f4aacea2f5d1a5f7e3154e967d70cf3f711bcd61
.\" commit 24fe831c17ab8149413874f2fd4e5c8a41fcd294
The value in this file determines how the file offset affects
the behavior of updating entries in files under
.IR /proc/sys .
The file has three possible values:
.RS
.TP 4
\-1
This provides legacy handling, with no printk warnings.
Each
.BR write (2)
must fully contain the value to be written,
and multiple writes on the same file descriptor
will overwrite the entire value, regardless of the file position.
.TP
0
(default) This provides the same behavior as for \-1,
but printk warnings are written for processes that
perform writes when the file offset is not 0.
.TP
1
Respect the file offset when writing strings into
.I /proc/sys
files.
Multiple writes will
.I append
to the value buffer.
Anything written beyond the maximum length
of the value buffer will be ignored.
Writes to numeric
.I /proc/sys
entries must always be at file offset 0 and the value must be
fully contained in the buffer provided to
.BR write (2).
.\" FIXME .
.\"     With /proc/sys/kernel/sysctl_writes_strict==1, writes at an
.\"     offset other than 0 do not generate an error. Instead, the
.\"     write() succeeds, but the file is left unmodified.
.\"     This is surprising. The behavior may change in the future.
.\"     See thread.gmane.org/gmane.linux.man/9197
.\"		From: Michael Kerrisk (man-pages <mtk.manpages@...>
.\"		Subject: sysctl_writes_strict documentation + an oddity?
.\"		Newsgroups: gmane.linux.man, gmane.linux.kernel
.\"		Date: 2015-05-09 08:54:11 GMT
.RE
.TP
.I /proc/sys/kernel/sysrq
This file controls the functions allowed to be invoked by the SysRq key.
By default,
the file contains 1, meaning that every possible SysRq request is allowed
(in older kernel versions, SysRq was disabled by default,
and you were required to specifically enable it at run-time,
but this is not the case any more).
Possible values in this file are:
.RS
.TP 5
0
Disable sysrq completely
.TP
1
Enable all functions of sysrq
.TP
> 1
Bit mask of allowed sysrq functions, as follows:
.PD 0
.RS
.TP 5
\ \ 2
Enable control of console logging level
.TP
\ \ 4
Enable control of keyboard (SAK, unraw)
.TP
\ \ 8
Enable debugging dumps of processes etc.
.TP
\ 16
Enable sync command
.TP
\ 32
Enable remount read-only
.TP
\ 64
Enable signaling of processes (term, kill, oom-kill)
.TP
128
Allow reboot/poweroff
.TP
256
Allow nicing of all real-time tasks
.RE
.PD
.RE
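.IP
Because values greater than 1 form a bit mask,
several functions can be allowed at once by adding their values.
As an illustrative sketch (the choice of functions is arbitrary),
the following computes a mask that allows only
sync, remount read-only, and reboot/poweroff:
.IP
.in +4n
.EX
# 16 = sync, 32 = remount read only, 128 = reboot/poweroff
mask=$((16 + 32 + 128))
echo "$mask"
# As root, the mask could then be applied with:
#     echo "$mask" > /proc/sys/kernel/sysrq
.EE
.in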
.IP
This file is present only if the
.B CONFIG_MAGIC_SYSRQ
kernel configuration option is enabled.
For further details see the Linux kernel source file
.I Documentation/admin\-guide/sysrq.rst
.\" commit 9d85025b0418163fae079c9ba8f8445212de8568
(or
.I Documentation/sysrq.txt
before Linux 4.10).
.TP
.I /proc/sys/kernel/version
This file contains a string such as:
.IP
.in +4n
.EX
#5 Wed Feb 25 21:49:24 MET 1998
.EE
.in
.IP
The "#5" means that
this is the fifth kernel built from this source base, and the
date following it indicates the time the kernel was built.
.TP
.IR /proc/sys/kernel/threads\-max " (since Linux 2.3.11)"
.\" The following is based on Documentation/sysctl/kernel.txt
This file specifies the system-wide limit on the number of
threads (tasks) that can be created on the system.
.IP
Since Linux 4.1,
.\" commit 230633d109e35b0a24277498e773edeb79b4a331
the value that can be written to
.I threads\-max
is bounded.
The minimum value that can be written is 20.
The maximum value that can be written is given by the
constant
.B FUTEX_TID_MASK
(0x3fffffff).
If a value outside of this range is written to
.IR threads\-max ,
the error
.B EINVAL
occurs.
.IP
The value written is checked against the available RAM pages.
If the thread structures would occupy too much (more than 1/8th)
of the available RAM pages,
.I threads\-max
is reduced accordingly.
.TP
.IR /proc/sys/kernel/yama/ptrace_scope " (since Linux 3.5)"
See
.BR ptrace (2).
.TP
.IR /proc/sys/kernel/zero\-paged " (PowerPC only)"
This file
contains a flag.
When enabled (nonzero), Linux-PPC will pre-zero pages in
the idle loop, possibly speeding up get_free_pages.
.TP
.I /proc/sys/net
This directory contains networking-related files.
Explanations for some of the files under this directory can be found in
.BR tcp (7)
and
.BR ip (7).
.TP
.I /proc/sys/net/core/bpf_jit_enable
See
.BR bpf (2).
.TP
.I /proc/sys/net/core/somaxconn
This file defines a ceiling value for the
.I backlog
argument of
.BR listen (2);
see the
.BR listen (2)
manual page for details.
.TP
.I /proc/sys/proc
This directory may be empty.
.TP
.I /proc/sys/sunrpc
This directory supports Sun remote procedure call (RPC) for the
Network File System (NFS).
On some systems, it is not present.
.TP
.IR /proc/sys/user " (since Linux 4.9)"
See
.BR namespaces (7).
.TP
.I /proc/sys/vm/
This directory contains files for tuning memory management,
including buffer and cache management.
.TP
.IR /proc/sys/vm/admin_reserve_kbytes " (since Linux 3.10)"
.\" commit 4eeab4f5580d11bffedc697684b91b0bca0d5009
This file defines the amount of free memory (in KiB) on the system that
should be reserved for users with the capability
.BR CAP_SYS_ADMIN .
.IP
The default value in this file is the minimum of [3% of free pages, 8MiB]
expressed as KiB.
The default is intended to provide enough for the superuser
to log in and kill a process, if necessary,
under the default overcommit 'guess' mode (i.e., 0 in
.IR /proc/sys/vm/overcommit_memory ).
.IP
Systems running in "overcommit never" mode (i.e., 2 in
.IR /proc/sys/vm/overcommit_memory )
should increase the value in this file to account
for the full virtual memory size of the programs used to recover (e.g.,
.BR login (1),
.BR ssh (1),
and
.BR top (1)).
Otherwise, the superuser may not be able to log in to recover the system.
For example, on x86-64 a suitable value is 131072 (128MiB reserved).
.IP
Changing the value in this file takes effect whenever
an application requests memory.
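.IP
The default described above can be approximated as follows.
This is only a sketch of the rule as stated in this page
(the helper name
.I admin_reserve_default
is hypothetical, and the kernel's exact rounding may differ):
.IP
.in +4n
.EX
# Hypothetical helper: minimum of 3% of free memory and 8 MiB, in KiB.
# $1 is the amount of free memory in KiB.
admin_reserve_default() {
    pct=$(($1 * 3 / 100))
    cap=$((8 * 1024))
    echo $((pct < cap ? pct : cap))
}
admin_reserve_default 100000     # small system: 3% applies
admin_reserve_default 1000000    # larger system: 8 MiB cap applies
.EE
.in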
.TP
.IR /proc/sys/vm/compact_memory " (since Linux 2.6.35)"
When 1 is written to this file, all zones are compacted such that free
memory is available in contiguous blocks where possible.
The effect of this action can be seen by examining
.IR /proc/buddyinfo .
.IP
Present only if the kernel was configured with
.BR CONFIG_COMPACTION .
.TP
.IR /proc/sys/vm/drop_caches " (since Linux 2.6.16)"
Writing to this file causes the kernel to drop clean caches, dentries, and
inodes from memory, causing that memory to become free.
This can be useful for memory management testing and
performing reproducible filesystem benchmarks.
Because writing to this file causes the benefits of caching to be lost,
it can degrade overall system performance.
.IP
To free pagecache, use:
.IP
.in +4n
.EX
echo 1 > /proc/sys/vm/drop_caches
.EE
.in
.IP
To free dentries and inodes, use:
.IP
.in +4n
.EX
echo 2 > /proc/sys/vm/drop_caches
.EE
.in
.IP
To free pagecache, dentries, and inodes, use:
.IP
.in +4n
.EX
echo 3 > /proc/sys/vm/drop_caches
.EE
.in
.IP
Writing to this file is a nondestructive operation,
and dirty objects are not freeable.
To increase the number of objects that can be freed,
the user should run
.BR sync (1)
first.
.TP
.IR /proc/sys/vm/hugetlb_shm_group " (since Linux 2.6.7)"
This writable file contains a group ID that is allowed
to allocate memory using huge pages.
If a process has a filesystem group ID or any supplementary group ID that
matches this group ID,
then it can make huge-page allocations without holding the
.B CAP_IPC_LOCK
capability; see
.BR memfd_create (2),
.BR mmap (2),
and
.BR shmget (2).
.TP
.IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)"
.\" The following is from Documentation/filesystems/proc.txt
If nonzero, this disables the new 32-bit memory-mapping layout;
the kernel will use the legacy (2.4) layout for all processes.
.TP
.IR /proc/sys/vm/memory_failure_early_kill " (since Linux 2.6.32)"
.\" The following is based on the text in Documentation/sysctl/vm.txt
Control how to kill processes when an uncorrected memory error
(typically a 2-bit error in a memory module)
that cannot be handled by the kernel
is detected in the background by hardware.
In some cases (like the page still having a valid copy on disk),
the kernel will handle the failure
transparently without affecting any applications.
But if there is no other up-to-date copy of the data,
it will kill processes to prevent any data corruption from propagating.
.IP
The file has one of the following values:
.RS
.TP
.B 1
Kill all processes that have the corrupted-and-not-reloadable page mapped
as soon as the corruption is detected.
Note that this is not supported for a few types of pages,
such as kernel internally
allocated data or the swap cache, but works for the majority of user pages.
.TP
.B 0
Unmap the corrupted page from all processes and kill a process
only if it tries to access the page.
.RE
.IP
The kill is performed using a
.B SIGBUS
signal with
.I si_code
set to
.BR BUS_MCEERR_AO .
Processes can handle this if they want to; see
.BR sigaction (2)
for more details.
.IP
This feature is active only on architectures/platforms with advanced machine
check handling and depends on the hardware capabilities.
.IP
Applications can override the
.I memory_failure_early_kill
setting individually with the
.BR prctl (2)
.B PR_MCE_KILL
operation.
.IP
Present only if the kernel was configured with
.BR CONFIG_MEMORY_FAILURE .
.TP
.IR /proc/sys/vm/memory_failure_recovery " (since Linux 2.6.32)"
.\" The following is based on the text in Documentation/sysctl/vm.txt
Enable memory failure recovery (when supported by the platform).
.RS
.TP
.B 1
Attempt recovery.
.TP
.B 0
Always panic on a memory failure.
.RE
.IP
Present only if the kernel was configured with
.BR CONFIG_MEMORY_FAILURE .
.TP
.IR /proc/sys/vm/oom_dump_tasks " (since Linux 2.6.25)"
.\" The following is from Documentation/sysctl/vm.txt
Enables a system-wide task dump (excluding kernel threads) to be
produced when the kernel performs an OOM kill.
The dump includes the following information
for each task (thread, process):
thread ID, real user ID, thread group ID (process ID),
virtual memory size, resident set size,
the CPU that the task is scheduled on,
oom_adj score (see the description of
.IR /proc/ pid /oom_adj ),
and command name.
This is helpful to determine why the OOM-killer was invoked
and to identify the rogue task that caused it.
.IP
If this contains the value zero, this information is suppressed.
On very large systems with thousands of tasks,
it may not be feasible to dump the memory state information for each one.
Such systems should not be forced to incur a performance penalty in
OOM situations when the information may not be desired.
.IP
If this is set to nonzero, this information is shown whenever the
OOM-killer actually kills a memory-hogging task.
.IP
The default value is 1 in current kernels;
in the earliest kernels that provided this file, the default was 0.
.TP
.IR /proc/sys/vm/oom_kill_allocating_task " (since Linux 2.6.24)"
.\" The following is from Documentation/sysctl/vm.txt
This enables or disables killing the OOM-triggering task in
out-of-memory situations.
.IP
If this is set to zero, the OOM-killer will scan through the entire
tasklist and select a task based on heuristics to kill.
This normally selects a rogue memory-hogging task that
frees up a large amount of memory when killed.
.IP
If this is set to nonzero, the OOM-killer simply kills the task that
triggered the out-of-memory condition.
This avoids a possibly expensive tasklist scan.
.IP
If
.I /proc/sys/vm/panic_on_oom
is nonzero, it takes precedence over whatever value is used in
.IR /proc/sys/vm/oom_kill_allocating_task .
.IP
The default value is 0.
.TP
.IR /proc/sys/vm/overcommit_kbytes " (since Linux 3.14)"
.\" commit 49f0ce5f92321cdcf741e35f385669a421013cb7
This writable file provides an alternative to
.I /proc/sys/vm/overcommit_ratio
for controlling the
.I CommitLimit
when
.I /proc/sys/vm/overcommit_memory
has the value 2.
It allows the amount of memory overcommitting to be specified as
an absolute value (in kB),
rather than as a percentage, as is done with
.IR overcommit_ratio .
This allows for finer-grained control of
.I CommitLimit
on systems with extremely large memory sizes.
.IP
Only one of
.I overcommit_kbytes
or
.I overcommit_ratio
can have an effect:
if
.I overcommit_kbytes
has a nonzero value, then it is used to calculate
.IR CommitLimit ,
otherwise
.I overcommit_ratio
is used.
Writing a value to either of these files causes the
value in the other file to be set to zero.
.TP
.I /proc/sys/vm/overcommit_memory
This file contains the kernel virtual memory accounting mode.
Values are:
.RS
.IP
0: heuristic overcommit (this is the default)
.br
1: always overcommit, never check
.br
2: always check, never overcommit
.RE
.IP
In mode 0, calls of
.BR mmap (2)
with
.B MAP_NORESERVE
are not checked, and the default check is very weak,
leading to the risk of getting a process "OOM-killed".
.IP
In mode 1, the kernel pretends there is always enough memory,
until memory actually runs out.
One use case for this mode is scientific computing applications
that employ large sparse arrays.
Before Linux 2.6.0, any nonzero value implied mode 1.
.IP
In mode 2 (available since Linux 2.6), the total virtual address space
that can be allocated
.RI ( CommitLimit
in
.IR /proc/meminfo )
is calculated as
.IP
.in +4n
.EX
CommitLimit = (total_RAM \- total_huge_TLB) *
	      overcommit_ratio / 100 + total_swap
.EE
.in
.IP
where:
.RS
.IP \[bu] 3
.I total_RAM
is the total amount of RAM on the system;
.IP \[bu]
.I total_huge_TLB
is the amount of memory set aside for huge pages;
.IP \[bu]
.I overcommit_ratio
is the value in
.IR /proc/sys/vm/overcommit_ratio ;
and
.IP \[bu]
.I total_swap
is the amount of swap space.
.RE
.IP
For example, on a system with 16 GB of physical RAM, 16 GB
of swap, no space dedicated to huge pages, and an
.I overcommit_ratio
of 50, this formula yields a
.I CommitLimit
of 24 GB.
.IP
Since Linux 3.14, if the value in
.I /proc/sys/vm/overcommit_kbytes
is nonzero, then
.I CommitLimit
is instead calculated as:
.IP
.in +4n
.EX
CommitLimit = overcommit_kbytes + total_swap
.EE
.in
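.IP
As a worked instance of this second formula
(the figures are invented for illustration),
a system with 16 GiB of swap and
.I overcommit_kbytes
set to 8388608 (8 GiB) would have a
.I CommitLimit
of 24 GiB:
.IP
.in +4n
.EX
# All quantities are in KiB; the values are illustrative only.
overcommit_kbytes=$((8 * 1024 * 1024))
total_swap=$((16 * 1024 * 1024))
commit_limit=$((overcommit_kbytes + total_swap))
echo "$commit_limit"
# The limit currently in force can be inspected with:
#     grep Commit /proc/meminfo
.EE
.in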
.IP
See also the description of
.I /proc/sys/vm/admin_reserve_kbytes
and
.IR /proc/sys/vm/user_reserve_kbytes .
.TP
.IR /proc/sys/vm/overcommit_ratio " (since Linux 2.6.0)"
This writable file defines a percentage by which memory
can be overcommitted.
The default value in the file is 50.
See the description of
.IR /proc/sys/vm/overcommit_memory .
.TP
.IR /proc/sys/vm/panic_on_oom " (since Linux 2.6.18)"
.\" The following is adapted from Documentation/sysctl/vm.txt
This enables or disables a kernel panic in
an out-of-memory situation.
.IP
If this file is set to the value 0,
the kernel's OOM-killer will kill some rogue process.
Usually, the OOM-killer is able to kill a rogue process and the
system will survive.
.IP
If this file is set to the value 1,
then the kernel normally panics when out-of-memory happens.
However, if a process limits allocations to certain nodes
using memory policies
.RB ( mbind (2)
.BR MPOL_BIND )
or cpusets
.RB ( cpuset (7))
and those nodes reach memory exhaustion status,
one process may be killed by the OOM-killer.
No panic occurs in this case:
because other nodes' memory may be free,
this means the system as a whole may not have reached
an out-of-memory situation yet.
.IP
If this file is set to the value 2,
the kernel always panics when an out-of-memory condition occurs.
.IP
The default value is 0.
The values 1 and 2 are intended for failover in clusters.
Select either according to your failover policy.
.TP
.I /proc/sys/vm/swappiness
.\" The following is from Documentation/sysctl/vm.txt
The value in this file controls how aggressively the kernel will swap
memory pages.
Higher values increase aggressiveness, lower values
decrease aggressiveness.
The default value is 60.
.TP
.IR /proc/sys/vm/user_reserve_kbytes " (since Linux 3.10)"
.\" commit c9b1d0981fcce3d9976d7b7a56e4e0503bc610dd
Specifies an amount of memory (in KiB) to reserve for user processes.
This is intended to prevent a user from starting a single memory-hogging
process such that they cannot recover (kill the hog).
The value in this file has an effect only when
.I /proc/sys/vm/overcommit_memory
is set to 2 ("overcommit never" mode).
In this case, the system reserves an amount of memory that is the minimum
of [3% of current process size,
.IR user_reserve_kbytes ].
.IP
The default value in this file is the minimum of [3% of free pages, 128MiB]
expressed as KiB.
.IP
If the value in this file is set to zero,
then a user will be allowed to allocate all free memory with a single process
(minus the amount reserved by
.IR /proc/sys/vm/admin_reserve_kbytes ).
Any subsequent attempts to execute a command will result in
"fork: Cannot allocate memory".
.IP
Changing the value in this file takes effect whenever
an application requests memory.
.TP
.IR /proc/sys/vm/unprivileged_userfaultfd " (since Linux 5.2)"
.\" cefdca0a86be517bc390fc4541e3674b8e7803b0
This (writable) file exposes a flag that controls whether
unprivileged processes are allowed to employ
.BR userfaultfd (2).
If this file has the value 1, then unprivileged processes may use
.BR userfaultfd (2).
If this file has the value 0, then only processes that have the
.B CAP_SYS_PTRACE
capability may employ
.BR userfaultfd (2).
The default value in this file is 1.
.SH SEE ALSO
.BR proc (5)