author Michael Kerrisk <mtk.manpages@gmail.com> 2008-07-09 12:11:50 +0000
committer Michael Kerrisk <mtk.manpages@gmail.com> 2008-07-09 12:11:50 +0000
commit c8e6851294eb18e15a11442460359751c7f29369
tree 873af1374b3d9a664259ae5ec2e1931787149cd6 /man7/capabilities.7
parent 8ab8b43f0e5a9cfb8b9a6b63f4dc5459092a2e96
Reword discussion of CAP_LINUX_IMMUTABLE to be file-system neutral.
Diffstat (limited to 'man7/capabilities.7')
-rw-r--r-- man7/capabilities.7 726
1 files changed, 557 insertions, 169 deletions
diff --git a/man7/capabilities.7 b/man7/capabilities.7
index 1057a218ce..f22aa7c674 100644
--- a/man7/capabilities.7
+++ b/man7/capabilities.7
@@ -25,10 +25,24 @@
.\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
.\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER
.\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
-.\" FIXME serge@hallyn.com promises updates to this page in loine with
-.\" recent changes to capabilities code in kernel, Feb 2008.
+.\" 2008-07-15, Serge Hallyn <serue@us.bbm.com>
+.\" Document file capabilities, per-process capability
+.\" bounding set, changed semantics for CAP_SETPCAP,
+.\" and other changes in 2.6.2[45].
+.\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP.
+.\" 2008-07-15, mtk
+.\" Add text describing circumstances in which CAP_SETPCAP
+.\" (theoretically) permits a thread to change the
+.\" capability sets of another thread.
+.\" Add section describing rules for programmatically
+.\" adjusting thread capability sets.
+.\" Describe rationale for capability bounding set.
+.\" Document "securebits" flags.
+.\" Add text noting that if we set the effective flag for one file
+.\" capability, then we must also set the effective flag for all
+.\" other capabilities where the permitted or inheritable bit is set.
.\"
-.TH CAPABILITIES 7 2008-07-09 "Linux" "Linux Programmer's Manual"
+.TH CAPABILITIES 7 2008-07-15 "Linux" "Linux Programmer's Manual"
.SH NAME
capabilities \- overview of Linux capabilities
.SH DESCRIPTION
@@ -49,58 +63,69 @@ associated with superuser into distinct units, known as
.IR capabilities ,
which can be independently enabled and disabled.
Capabilities are a per-thread attribute.
+.\"
.SS Capabilities List
-
-As at Linux 2.6.14, the following capabilities are implemented:
+The following list shows the capabilities implemented on Linux,
+and the operations or behaviors that each capability permits:
.TP
.BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)"
Enable and disable kernel auditing; change auditing filter rules;
retrieve auditing status and filtering rules.
.TP
.BR CAP_AUDIT_WRITE " (since Linux 2.6.11)"
-Allow records to be written to kernel auditing log.
+Write records to kernel auditing log.
.TP
.B CAP_CHOWN
-Allow arbitrary changes to file UIDs and GIDs (see
+Make arbitrary changes to file UIDs and GIDs (see
.BR chown (2)).
.TP
.B CAP_DAC_OVERRIDE
Bypass file read, write, and execute permission checks.
-(DAC = "discretionary access control".)
+(DAC is an abbreviation of "discretionary access control".)
.TP
.B CAP_DAC_READ_SEARCH
Bypass file read permission checks and
directory read and execute permission checks.
.TP
.B CAP_FOWNER
+.PD 0
+.RS
+.IP * 2
Bypass permission checks on operations that normally
require the file system UID of the process to match the UID of
the file (e.g.,
.BR chmod (2),
.BR utime (2)),
-excluding those operations covered by the
+excluding those operations covered by
.B CAP_DAC_OVERRIDE
and
.BR CAP_DAC_READ_SEARCH ;
+.IP *
set extended file attributes (see
.BR chattr (1))
on arbitrary files;
+.IP *
set Access Control Lists (ACLs) on arbitrary files;
+.IP *
ignore directory sticky bit on file deletion;
+.IP *
specify
.B O_NOATIME
for arbitrary files in
.BR open (2)
and
.BR fcntl (2).
+.RE
+.PD
.TP
.B CAP_FSETID
-Don't clear set-user-ID and set-group-ID bits when a file is modified;
-permit setting of the set-group-ID bit for a file whose GID does not match
+Don't clear set-user-ID and set-group-ID permission
+bits when a file is modified;
+set the set-group-ID bit for a file whose GID does not match
the file system or any of the supplementary GIDs of the calling process.
.TP
.B CAP_IPC_LOCK
-Permit memory locking
+Lock memory
.RB ( mlock (2),
.BR mlockall (2),
.BR mmap (2),
@@ -113,88 +138,123 @@ Bypass permission checks for operations on System V IPC objects.
Bypass permission checks for sending signals (see
.BR kill (2)).
This includes use of the
+.BR ioctl (2)
.B KDSIGACCEPT
-ioctl.
+operation.
.\" FIXME CAP_KILL also has an effect for threads + setting child
.\" termination signal to other than SIGCHLD: without this
.\" capability, the termination signal reverts to SIGCHLD
.\" if the child does an exec(). What is the rationale
.\" for this?
.TP
-.B CAP_LEASE
-(Linux 2.4 onwards) Allow file leases to be established on
-arbitrary files (see
+.BR CAP_LEASE " (since Linux 2.4)"
+Establish leases on arbitrary files (see
.BR fcntl (2)).
.TP
.B CAP_LINUX_IMMUTABLE
-Allow setting of the
-.B EXT2_APPEND_FL
+Set the
+.B FS_APPEND_FL
and
-.B EXT2_IMMUTABLE_FL
-.\" These attributes are now available on ext2, ext3, Reiserfs
-extended file attributes (see
+.B FS_IMMUTABLE_FL
+.\" These attributes are now available on ext2, ext3, Reiserfs, XFS, JFS
+i-node flags (see
.BR chattr (1)).
.TP
-.B CAP_MKNOD
-(Linux 2.4 onwards)
-Allow creation of special files using
+.BR CAP_MAC_ADMIN " (since Linux 2.6.25)"
+Allow Mandatory Access Control (MAC) configuration or state changes.
+Implemented for the Smack Linux Security Module (LSM).
+.TP
+.BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)"
+Override MAC.
+Implemented for the Smack LSM.
+.TP
+.BR CAP_MKNOD " (since Linux 2.4)"
+Create special files using
.BR mknod (2).
.TP
.B CAP_NET_ADMIN
-Allow various network-related operations
+Perform various network-related operations
(e.g., setting privileged socket options,
enabling multicasting, interface configuration,
modifying routing tables).
.TP
.B CAP_NET_BIND_SERVICE
-Allow binding to Internet domain reserved socket ports
+Bind a socket to Internet domain reserved ports
(port numbers less than 1024).
.TP
.B CAP_NET_BROADCAST
-(Unused) Allow socket broadcasting, and listening multicasts.
+(Unused) Make socket broadcasts, and listen to multicasts.
.TP
.B CAP_NET_RAW
-Permit use of RAW and PACKET sockets.
+Use RAW and PACKET sockets.
.\" Also various IP options and setsockopt(SO_BINDTODEVICE)
.TP
.B CAP_SETGID
-Allow arbitrary manipulations of process GIDs and supplementary GID list;
-allow forged GID when passing socket credentials via Unix domain sockets.
+Make arbitrary manipulations of process GIDs and supplementary GID list;
+forge GID when passing socket credentials via Unix domain sockets.
.TP
+.BR CAP_SETFCAP " (since Linux 2.6.24)"
+Set file capabilities.
+.TP
+.B CAP_SETPCAP
+If file capabilities are not supported:
+grant or remove any capability in the
+caller's permitted capability set to or from any other process.
+(This property of
+.B CAP_SETPCAP
+is not available when the kernel is configured to support
+file capabilities, since
.B CAP_SETPCAP
-Grant or remove any capability in the caller's
-permitted capability set to or from any other process.
+has entirely different semantics for such kernels.)
+
+If file capabilities are supported:
+add any capability from the calling thread's bounding set
+to its inheritable set;
+drop capabilities from the bounding set (via
+.BR prctl (2)
+.BR PR_CAPBSET_DROP );
+make changes to the
+.I securebits
+flags.
.TP
.B CAP_SETUID
-Allow arbitrary manipulations of process UIDs
+Make arbitrary manipulations of process UIDs
.RB ( setuid (2),
.BR setreuid (2),
.BR setresuid (2),
.BR setfsuid (2));
-allow forged UID when passing socket credentials via Unix domain sockets.
+forge UID when passing socket credentials via Unix domain sockets.
.\" FIXME CAP_SETUID also an effect in exec(); document this.
.TP
.B CAP_SYS_ADMIN
-Permit a range of system administration operations including:
+.PD 0
+.RS
+.IP * 2
+Perform a range of system administration operations including:
.BR quotactl (2),
.BR mount (2),
.BR umount (2),
.BR swapon (2),
.BR swapoff (2),
.BR sethostname (2),
-.BR setdomainname (2),
+.BR setdomainname (2);
+.IP *
+perform
.B IPC_SET
and
.B IPC_RMID
operations on arbitrary System V IPC objects;
+.IP *
perform operations on
.I trusted
and
.I security
Extended Attributes (see
.BR attr (5));
-call
+.IP *
+use
.BR lookup_dcookie (2);
+.IP *
use
.BR ioprio_set (2)
to assign
@@ -202,13 +262,16 @@ to assign
and (before Linux 2.6.25)
.B IOPRIO_CLASS_IDLE
I/O scheduling classes;
+.IP *
perform
.BR keyctl (2)
.B KEYCTL_CHOWN
and
.B KEYCTL_SETPERM
-operations.
-allow forged UID when passing socket credentials;
+operations;
+.IP *
+forge UID when passing socket credentials;
+.IP *
exceed
.IR /proc/sys/fs/file-max ,
the system-wide limit on the number of open files,
@@ -216,81 +279,98 @@ in system calls that open files (e.g.,
.BR accept (2),
.BR execve (2),
.BR open (2),
-.BR pipe (2);
-without this capability these system calls will fail with the error
+.BR pipe (2)
+(without this capability these system calls will fail with the error
.B ENFILE
if this limit is encountered);
+.IP *
employ
.B CLONE_NEWNS
flag with
.BR clone (2)
and
.BR unshare (2);
+.IP *
perform
.B KEYCTL_CHOWN
and
.B KEYCTL_SETPERM
.BR keyctl (2)
operations.
+.RE
+.PD
.TP
.B CAP_SYS_BOOT
-Permit calls to
+Use
.BR reboot (2)
and
.BR kexec_load (2).
.TP
.B CAP_SYS_CHROOT
-Permit calls to
+Use
.BR chroot (2).
.TP
.B CAP_SYS_MODULE
-Allow loading and unloading of kernel modules;
-allow modifications to capability bounding set (see
+Load and unload kernel modules
+(see
.BR init_module (2)
and
-.BR delete_module (2)).
+.BR delete_module (2));
+in kernels before 2.6.25:
+drop capabilities from the system-wide capability bounding set.
.TP
.B CAP_SYS_NICE
-Allow raising process nice value
+.PD 0
+.RS
+.IP * 2
+Raise process nice value
.RB ( nice (2),
.BR setpriority (2))
-and changing of the nice value for arbitrary processes;
-allow setting of real-time scheduling policies for calling process,
-and setting scheduling policies and priorities for arbitrary processes
+and change the nice value for arbitrary processes;
+.IP *
+set real-time scheduling policies for calling process,
+and set scheduling policies and priorities for arbitrary processes
.RB ( sched_setscheduler (2),
.BR sched_setparam (2));
+.IP *
set CPU affinity for arbitrary processes
.RB ( sched_setaffinity (2));
+.IP *
set I/O scheduling class and priority for arbitrary processes
.RB ( ioprio_set (2));
-allow
+.IP *
+apply
.BR migrate_pages (2)
-to be applied to arbitrary processes and allow processes
+to arbitrary processes and allow processes
to be migrated to arbitrary nodes;
.\" FIXME CAP_SYS_NICE also has the following effect for
.\" migrate_pages(2):
.\" do_migrate_pages(mm, &old, &new,
.\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
-allow
+.IP *
+apply
.BR move_pages (2)
-to be applied to arbitrary processes;
+to arbitrary processes;
+.IP *
use the
.B MPOL_MF_MOVE_ALL
flag with
.BR mbind (2)
and
.BR move_pages (2).
+.RE
+.PD
.TP
.B CAP_SYS_PACCT
-Permit calls to
+Use
.BR acct (2).
.TP
.B CAP_SYS_PTRACE
-Allow arbitrary processes to be traced using
+Trace arbitrary processes using
.BR ptrace (2)
.TP
.B CAP_SYS_RAWIO
-Permit I/O port operations
+Perform I/O port operations
.RB ( iopl (2)
and
.BR ioperm (2));
@@ -298,52 +378,91 @@ access
.IR /proc/kcore .
.TP
.B CAP_SYS_RESOURCE
-Permit: use of reserved space on ext2 file systems;
+.PD 0
+.RS
+.IP * 2
+Use reserved space on ext2 file systems;
+.IP *
+make
.BR ioctl (2)
calls controlling ext3 journaling;
-disk quota limits to be overridden;
-resource limits to be increased (see
+.IP *
+override disk quota limits;
+.IP *
+increase resource limits (see
.BR setrlimit (2));
+.IP *
+override
.B RLIMIT_NPROC
-resource limit to be overridden;
+resource limit;
+.IP *
+raise
.I msg_qbytes
-limit for a message queue to be
-raised above the limit in
+limit for a System V message queue above the limit in
.I /proc/sys/kernel/msgmnb
(see
.BR msgop (2)
and
.BR msgctl (2).
+.RE
+.PD
.TP
.B CAP_SYS_TIME
-Allow modification of system clock
+Set system clock
.RB ( settimeofday (2),
.BR stime (2),
.BR adjtimex (2));
-allow modification of real-time (hardware) clock
+set real-time (hardware) clock.
.TP
.B CAP_SYS_TTY_CONFIG
-Permit calls to
+Use
.BR vhangup (2).
-.SS Capability Sets
+.\"
+.SS Past and Current Implementation
+A full implementation of capabilities requires that:
+.IP 1. 3
+For all privileged operations,
+the kernel must check whether the thread has the required
+capability in its effective set.
+.IP 2.
+The kernel must provide
+system calls allowing a thread's capability sets to
+be changed and retrieved.
+.IP 3.
+The file system must support attaching capabilities to an executable file,
+so that a process gains those capabilities when the file is executed.
+.PP
+Before kernel 2.6.24, only the first two of these requirements are met;
+since kernel 2.6.24, all three requirements are met.
+.\"
+.SS Thread Capability Sets
Each thread has three capability sets containing zero or more
of the above capabilities:
.TP
-.IR Effective :
-the capabilities used by the kernel to
-perform permission checks for the thread.
-.TP
.IR Permitted :
-the capabilities that the thread may assume
-(i.e., a limiting superset for the effective and inheritable sets).
+This is a limiting superset for the effective
+capabilities that the thread may assume.
+It is also a limiting superset for the capabilities that
+may be added to the inheritable set by a thread that does not have the
+.B CAP_SETPCAP
+capability in its effective set.
+
If a thread drops a capability from its permitted set,
it can never re-acquire that capability (unless it
.BR execve (2)s
-a set-user-ID-root program).
+either a set-user-ID-root program, or
+a program whose associated file capabilities grant that capability).
.TP
-.IR inheritable :
-the capabilities preserved across an
+.IR Inheritable :
+This is a set of capabilities preserved across an
.BR execve (2).
+It provides a mechanism for a process to assign capabilities
+to the permitted set of the new program during an
+.BR execve (2).
+.TP
+.IR Effective :
+This is the set of capabilities used by the kernel to
+perform permission checks for the thread.
.PP
A child created via
.BR fork (2)
@@ -353,87 +472,62 @@ See below for a discussion of the treatment of capabilities during
.PP
Using
.BR capset (2),
-a thread may manipulate its own capability sets, or, if it has the
-.B CAP_SETPCAP
-capability, those of a thread in another process.
-.SS Capability bounding set
-When a program is execed, the permitted and effective capabilities
-are ANDed with the current value of the so-called
-.IR "capability bounding set" ,
-defined in the file
-.IR /proc/sys/kernel/cap-bound .
-This parameter can be used to place a system-wide limit on the
-capabilities granted to all subsequently executed programs.
-(Confusingly, this bit mask parameter is expressed as a
-signed decimal number in
-.IR /proc/sys/kernel/cap-bound .)
-
-Only the
-.B init
-process may set bits in the capability bounding set;
-other than that, the superuser may only clear bits in this set.
-
-On a standard system the capability bounding set always masks out the
-.B CAP_SETPCAP
+a thread may manipulate its own capability sets (see below).
+.\"
+.SS File Capabilities
+Since kernel 2.6.24, the kernel supports
+associating capability sets with an executable file using
+.BR setcap (8).
+The file capability sets are stored in an extended attribute (see
+.BR setxattr (2))
+named
+.IR "security.capability" .
+Writing to this extended attribute requires the
+.BR CAP_SETFCAP
capability.
-To remove this restriction (dangerous!), modify the definition of
-.B CAP_INIT_EFF_SET
-in
-.I include/linux/capability.h
-and rebuild the kernel.
-
-The capability bounding set feature was added to Linux starting with
-kernel version 2.2.11.
-.SS Current and Future Implementation
-A full implementation of capabilities requires:
-.IP 1. 4
-that for all privileged operations,
-the kernel check whether the thread has the required
-capability in its effective set.
-.IP 2. 4
-that the kernel provide
-system calls allowing a thread's capability sets to
-be changed and retrieved.
-.IP 3. 4
-file system support for attaching capabilities to an executable file,
-so that a process gains those capabilities when the file is execed.
-.PP
-As at Linux 2.6.14, only the first two of these requirements are met.
-
-Eventually, it should be possible to associate three
-capability sets with an executable file, which,
+The file capability sets,
in conjunction with the capability sets of the thread,
-will determine the capabilities of a thread after an
-.BR execve (2):
-.TP
-.IR Inheritable " (formerly known as " allowed ):
-this set is ANDed with the thread's inheritable set to determine which
-inheritable capabilities are permitted to the thread after the
+determine the capabilities of a thread after an
.BR execve (2).
+
+The three file capability sets are:
.TP
.IR Permitted " (formerly known as " forced ):
-the capabilities automatically permitted to the thread,
+These capabilities are automatically permitted to the thread,
regardless of the thread's inheritable capabilities.
.TP
+.IR Inheritable " (formerly known as " allowed ):
+This set is ANDed with the thread's inheritable set to determine which
+inheritable capabilities are enabled in the permitted set of
+the thread after the
+.BR execve (2).
+.TP
.IR Effective :
-those capabilities in the thread's new permitted set are
-also to be set in the new effective set.
-(F(effective) would normally be either all zeros or all ones.)
-.PP
-In the meantime, since the current implementation does not
-support file capability sets, during an
-.BR execve (2):
-.IP 1. 4
-All three file capability sets are initially assumed to be cleared.
-.IP 2. 4
-If a set-user-ID-root program is being execed,
-or the real user ID of the process is 0 (root)
-then the file inheritable and permitted sets are defined to be all ones
-(i.e., all capabilities enabled).
-.IP 3. 4
-If a set-user-ID-root program is being executed,
-then the file effective set is defined to be all ones.
-.SS Transformation of Capabilities During exec()
+This is not a set, but rather just a single bit.
+If this bit is set, then during an
+.BR execve (2)
+all of the new permitted capabilities for the thread are
+also raised in the effective set.
+If this bit is not set, then after an
+.BR execve (2),
+none of the new permitted capabilities is in the new effective set.
+
+Enabling the file effective capability bit implies
+that any file permitted or inheritable capability that causes a
+thread to acquire the corresponding permitted capability during an
+.BR execve (2)
+(see the transformation rules described below) will also cause the
+thread to acquire that capability in its effective set.
+Therefore, when assigning capabilities to a file
+.RB ( setcap (8),
+.BR cap_set_file (3),
+.BR cap_set_fd (3)),
+if we specify the effective flag as being enabled for any capability,
+then the effective flag must also be specified as enabled
+for all other capabilities for which the corresponding permitted or
+inheritable flag is enabled.
+.\"
+.SS Transformation of Capabilities During execve()
.PP
During an
.BR execve (2),
@@ -445,38 +539,163 @@ the process using the following algorithm:
P'(permitted) = (P(inheritable) & F(inheritable)) |
(F(permitted) & cap_bset)
-P'(effective) = P'(permitted) & F(effective)
+P'(effective) = F(effective) ? P'(permitted) : 0
P'(inheritable) = P(inheritable) [i.e., unchanged]
.fi
.in
where:
+.RS 4
.IP P 10
denotes the value of a thread capability set before the
.BR execve (2)
-.IP P' 10
+.IP P'
denotes the value of a capability set after the
.BR execve (2)
-.IP F 10
+.IP F
denotes a file capability set
-.IP cap_bset 10
-is the value of the capability bounding set.
+.IP cap_bset
+is the value of the capability bounding set (described below).
+.RE
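The transformation rules above can be sketched in C by modelling each capability set as a 64-bit mask. This is only an illustration of the arithmetic, not the kernel's code; the names capmask_t, thread_caps, file_caps and caps_after_execve are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per capability. */
typedef uint64_t capmask_t;

struct thread_caps {
    capmask_t permitted;
    capmask_t inheritable;
    capmask_t effective;
};

struct file_caps {
    capmask_t permitted;
    capmask_t inheritable;
    bool effective;             /* a single bit, not a set */
};

/* Compute P' from P, F and cap_bset, following the rules above. */
struct thread_caps
caps_after_execve(struct thread_caps p, struct file_caps f,
                  capmask_t cap_bset)
{
    struct thread_caps n;

    n.permitted = (p.inheritable & f.inheritable) |
                  (f.permitted & cap_bset);
    n.effective = f.effective ? n.permitted : 0;
    n.inheritable = p.inheritable;      /* unchanged */
    return n;
}
```

Note that the bounding set masks only the file permitted set; a capability present in both the thread and file inheritable sets passes through regardless of cap_bset.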
+.\"
+.SS Capabilities and execution of programs by root
+In order to provide an all-powerful
+.I root
+using capability sets, during an
+.BR execve (2):
+.IP 1. 3
+If a set-user-ID-root program is being executed,
+or the real user ID of the process is 0 (root)
+then the file inheritable and permitted sets are defined to be all ones
+(i.e., all capabilities enabled).
+.IP 2.
+If a set-user-ID-root program is being executed,
+then the file effective bit is defined to be one (enabled).
.PP
-In the current implementation, the upshot of this algorithm is that
-when a process
+The upshot of the above rules,
+combined with the capabilities transformations described above,
+is that when a process
.BR execve (2)s
a set-user-ID-root program, or when a process with an effective UID of 0
.BR execve (2)s
a program,
it gains all capabilities in its permitted and effective capability sets,
-except those masked out by the capability bounding set (i.e.,
-.BR CAP_SETPCAP ).
+except those masked out by the capability bounding set.
.\" If a process with real UID 0, and non-zero effective UID does an
-.\" exec(), then it gets all capabilities (less CAP_SETPCAP) in its
+.\" exec(), then it gets all capabilities in its
.\" permitted set, and no effective capabilities
This provides semantics that are the same as those provided by
traditional Unix systems.
+.SS Capability bounding set
+The capability bounding set is a security mechanism that can be used
+to limit the capabilities that can be gained during an
+.BR execve (2).
+The bounding set is used in the following ways:
+.IP * 2
+During an
+.BR execve (2),
+the capability bounding set is ANDed with the file permitted
+capability set, and the result of this operation is assigned to the
+thread's permitted capability set.
+The capability bounding set thus places a limit on the permitted
+capabilities that may be granted by an executable file.
+.IP *
+(Since Linux 2.6.25)
+The capability bounding set acts as a limiting superset for
+the capabilities that a thread can add to its inheritable set using
+.BR capset (2).
+This means that if a capability is not in the bounding set,
+then a thread can't add this capability to its
+inheritable set, even if it is in its permitted set,
+and therefore can't have this capability preserved in its
+permitted set when it
+.BR execve (2)s
+a file that has the capability in its inheritable set.
+.PP
+Note that the bounding set masks the file permitted capabilities,
+but not the inheritable capabilities.
+If a thread maintains a capability in its inheritable set
+that is not in its bounding set,
+then it can still gain that capability in its permitted set
+by executing a file that has the capability in its inheritable set.
+.PP
+Depending on the kernel version, the capability bounding set is either
+a system-wide attribute, or a per-process attribute.
+.PP
+.B "Capability bounding set prior to Linux 2.6.25"
+.PP
+In kernels before 2.6.25, the capability bounding set is a system-wide
+attribute that affects all threads on the system.
+The bounding set is accessible via the file
+.IR /proc/sys/kernel/cap-bound .
+(Confusingly, this bit mask parameter is expressed as a
+signed decimal number in
+.IR /proc/sys/kernel/cap-bound .)
+
+Only the
+.B init
+process may set capabilities in the capability bounding set;
+other than that, the superuser (more precisely: programs with the
+.B CAP_SYS_MODULE
+capability) may only clear capabilities from this set.
+
+On a standard system the capability bounding set always masks out the
+.B CAP_SETPCAP
+capability.
+To remove this restriction (dangerous!), modify the definition of
+.B CAP_INIT_EFF_SET
+in
+.I include/linux/capability.h
+and rebuild the kernel.
+
+The system-wide capability bounding set feature was added
+to Linux starting with kernel version 2.2.11.
+.\"
+.PP
+.B "Capability bounding set from Linux 2.6.25 onwards"
+.PP
+From Linux 2.6.25, the
+.I "capability bounding set"
+is a per-thread attribute.
+(There is no longer a system-wide capability bounding set.)
+
+The bounding set is inherited at
+.BR fork (2)
+from the thread's parent, and is preserved across an
+.BR execve (2).
+
+A thread may remove capabilities from its capability bounding set using the
+.BR prctl (2)
+.B PR_CAPBSET_DROP
+operation, provided it has the
+.B CAP_SETPCAP
+capability.
+Once a capability has been dropped from the bounding set,
+it cannot be restored to that set.
+A thread can determine if a capability is in its bounding set using the
+.BR prctl (2)
+.B PR_CAPBSET_READ
+operation.
+
+Removing capabilities from the bounding set is only supported if file
+capabilities are compiled into the kernel
+(CONFIG_SECURITY_FILE_CAPABILITIES).
+In that case, the
+.B init
+process (the ancestor of all processes) begins with a full bounding set.
+If file capabilities are not compiled into the kernel, then
+.B init
+begins with a full bounding set minus
+.BR CAP_SETPCAP ,
+because this capability has a different meaning when there are
+no file capabilities.
+
+Removing a capability from the bounding set does not remove it
+from the thread's inheritable set.
+However, it does prevent the capability from being added
+back into the thread's inheritable set in the future.
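As a minimal sketch of the two prctl(2) operations described above, the following C helpers wrap PR_CAPBSET_READ and PR_CAPBSET_DROP. The helper names are invented for this example, and the code assumes a kernel of 2.6.25 or later with headers that define both operations.

```c
#include <sys/prctl.h>
#include <linux/capability.h>

/*
 * Hypothetical helper: returns 1 if 'cap' is in the calling thread's
 * bounding set, 0 if it is not, or -1 (with errno set) on error.
 */
int
cap_in_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_READ, (unsigned long) cap, 0UL, 0UL, 0UL);
}

/*
 * Hypothetical helper: drops 'cap' from the bounding set.
 * Returns 0 on success, or -1 with errno set; the call is expected to
 * fail if the caller lacks CAP_SETPCAP.  The drop cannot be undone.
 */
int
drop_from_bounding_set(int cap)
{
    return prctl(PR_CAPBSET_DROP, (unsigned long) cap, 0UL, 0UL, 0UL);
}
```

A program might call cap_in_bounding_set(CAP_NET_RAW) before deciding whether drop_from_bounding_set(CAP_NET_RAW) is needed at all.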
+.\"
+.\"
.SS Effect of User ID Changes on Capabilities
To preserve the traditional semantics for transitions between
0 and non-zero user IDs,
@@ -486,19 +705,19 @@ and file system user IDs (using
.BR setuid (2),
.BR setresuid (2),
or similar):
-.IP 1. 4
+.IP 1. 3
If one or more of the real, effective or saved set user IDs
was previously 0, and as a result of the UID changes all of these IDs
have a non-zero value,
then all capabilities are cleared from the permitted and effective
capability sets.
-.IP 2. 4
+.IP 2.
If the effective user ID is changed from 0 to non-zero,
then all capabilities are cleared from the effective set.
-.IP 3. 4
+.IP 3.
If the effective user ID is changed from non-zero to 0,
then the permitted set is copied to the effective set.
-.IP 4. 4
+.IP 4.
If the file system user ID is changed from 0 to non-zero (see
.BR setfsuid (2))
then the following capabilities are cleared from the effective set:
@@ -506,8 +725,9 @@ then the following capabilities are cleared from the effective set:
.BR CAP_DAC_OVERRIDE ,
.BR CAP_DAC_READ_SEARCH ,
.BR CAP_FOWNER ,
+.BR CAP_FSETID ,
and
-.BR CAP_FSETID .
+.BR CAP_MAC_OVERRIDE .
If the file system UID is changed from non-zero to 0,
then any of these capabilities that are enabled in the permitted set
are enabled in the effective set.
@@ -518,10 +738,140 @@ all of its user IDs to non-zero values, it can do so using the
.BR prctl (2)
.B PR_SET_KEEPCAPS
operation.
+.\"
+.SS Programmatically adjusting capability sets
+A thread can retrieve and change its capability sets using the
+.BR capget (2)
+and
+.BR capset (2)
+system calls.
+However, the use of
+.BR cap_get_proc (3)
+and
+.BR cap_set_proc (3),
+both provided in the
+.I libcap
+package,
+is preferred for this purpose.
+The following rules govern changes to the thread capability sets:
+.IP 1. 3
+If the caller does not have the
+.B CAP_SETPCAP
+capability,
+the new inheritable set must be a subset of the combination
+of the existing inheritable and permitted sets.
+.IP 2.
+(Since kernel 2.6.25)
+The new inheritable set must be a subset of the combination of the
+existing inheritable set and the capability bounding set.
+.IP 3.
+The new permitted set must be a subset of the existing permitted set
+(i.e., it is not possible to acquire permitted capabilities
+that the thread does not currently have).
+.IP 4.
+The new effective set must be a subset of the new permitted set.
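The four rules above can be expressed as a small predicate over bit masks. This is a sketch for illustration only, not the kernel's implementation; the names cap_sets, is_subset and capset_allowed are invented here, and rule 2 is applied unconditionally, which assumes kernel 2.6.25 or later.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per capability. */
typedef uint64_t capmask_t;

struct cap_sets {
    capmask_t permitted;
    capmask_t inheritable;
    capmask_t effective;
};

/* True if every bit set in 'a' is also set in 'b'. */
static bool
is_subset(capmask_t a, capmask_t b)
{
    return (a & ~b) == 0;
}

/* Check a requested change from 'old' to 'want' against rules 1 to 4. */
bool
capset_allowed(struct cap_sets old, struct cap_sets want,
               capmask_t bounding, bool has_setpcap)
{
    /* Rule 1: without CAP_SETPCAP, new I must be within old I | old P. */
    if (!has_setpcap &&
        !is_subset(want.inheritable, old.inheritable | old.permitted))
        return false;

    /* Rule 2: new I must be within old I | bounding set. */
    if (!is_subset(want.inheritable, old.inheritable | bounding))
        return false;

    /* Rule 3: new P must be within old P. */
    if (!is_subset(want.permitted, old.permitted))
        return false;

    /* Rule 4: new E must be within new P. */
    return is_subset(want.effective, want.permitted);
}
```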
+.SS The """securebits"" flags: establishing a capabilities-only environment
+.\" For some background:
+.\" see http://lwn.net/Articles/280279/ and
+.\" http://article.gmane.org/gmane.linux.kernel.lsm/5476/
+Starting with kernel 2.6.26,
+and with a kernel in which file capabilities are enabled,
+Linux implements a set of per-thread
+.I securebits
+flags that can be used to disable special handling of capabilities for UID 0
+.RI ( root ).
+These flags are as follows:
+.TP
+.B SECURE_KEEP_CAPS
+Setting this flag allows a thread that has one or more 0 UIDs to retain
+its capabilities when it switches all of its UIDs to a non-zero value.
+If this flag is not set,
+then such a UID switch causes the thread to lose all capabilities.
+This flag is always cleared on an
+.BR execve (2).
+(This flag provides the same functionality as the older
+.BR prctl (2)
+.B PR_SET_KEEPCAPS
+operation.)
+.TP
+.B SECURE_NO_SETUID_FIXUP
+Setting this flag stops the kernel from adjusting capability sets when
+the thread's effective and file system UIDs are switched between
+zero and non-zero values.
+(See the subsection
+.IR "Effect of User ID Changes on Capabilities" .)
+.TP
+.B SECURE_NOROOT
+If this bit is set, then the kernel does not grant capabilities
+when a set-user-ID-root program is executed, or when a process with
+an effective or real UID of 0 calls
+.BR execve (2).
+(See the subsection
+.IR "Capabilities and execution of programs by root" .)
+.PP
+Each of the above "base" flags has a companion "locked" flag.
+Setting any of the "locked" flags is irreversible,
+and has the effect of preventing further changes to the
+corresponding "base" flag.
+The locked flags are:
+.BR SECURE_KEEP_CAPS_LOCKED ,
+.BR SECURE_NO_SETUID_FIXUP_LOCKED ,
+and
+.BR SECURE_NOROOT_LOCKED .
+.PP
+The
+.I securebits
+flags can be modified and retrieved using the
+.BR prctl (2)
+.B PR_SET_SECUREBITS
+and
+.B PR_GET_SECUREBITS
+operations.
+The
+.B CAP_SETPCAP
+capability is required to modify the flags.
+
+The
+.I securebits
+flags are inherited by child processes.
+During an
+.BR execve (2),
+all of the flags are preserved, except
+.B SECURE_KEEP_CAPS
+which is always cleared.
+
+An application can use the following call to lock itself,
+and all of its descendants,
+into an environment where the only way of gaining capabilities
+is by executing a program with associated file capabilities:
+.in +4n
+.nf
+
+prctl(PR_SET_SECUREBITS,
+ 1 << SECURE_KEEP_CAPS_LOCKED |
+ 1 << SECURE_NO_SETUID_FIXUP |
+ 1 << SECURE_NO_SETUID_FIXUP_LOCKED |
+ 1 << SECURE_NOROOT |
+ 1 << SECURE_NOROOT_LOCKED);
+.fi
+.in
.SH "CONFORMING TO"
+.PP
No standards govern capabilities, but the Linux capability implementation
-is based on the withdrawn POSIX.1e draft standard.
+is based on the withdrawn POSIX.1e draft standard; see
+.IR http://wt.xpilot.org/publications/posix.1e/ .
.SH NOTES
+Since kernel 2.5.27, capabilities are an optional kernel component,
+and can be enabled/disabled via the CONFIG_SECURITY_CAPABILITIES
+kernel configuration option.
+
+The
+.I /proc/PID/task/TID/status
+file can be used to view the capability sets of a thread.
+The
+.I /proc/PID/status
+file shows the capability sets of a process's main thread.
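As a rough sketch of reading these files from a program, the C function below looks up one capability field in /proc/self/status and parses it as a hexadecimal mask. The function name is invented for this example, and the field names it is meant for (CapInh, CapPrm, CapEff) are not spelled out in the text above; they are the labels the status file is assumed to use.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find a line such as "CapEff:\t<hex>" in
 * /proc/self/status and store the parsed mask in *mask.
 * Returns 0 on success, -1 if the file or the field is unavailable.
 */
int
read_status_caps(const char *field, uint64_t *mask)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    size_t len = strlen(field);
    int ret = -1;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            /* The value follows the colon, after a tab. */
            if (sscanf(line + len + 1, "%" SCNx64, mask) == 1)
                ret = 0;
            break;
        }
    }

    fclose(fp);
    return ret;
}
```

To inspect a thread other than the main thread, the same parsing could be applied to /proc/PID/task/TID/status instead.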
+
The
.I libcap
package provides a suite of routines for setting and
@@ -530,12 +880,50 @@ to change than the interface provided by
.BR capset (2)
and
.BR capget (2).
-.SH BUGS
-There is as yet no file system support allowing capabilities to be
-associated with executable files.
+This package also provides the
+.BR setcap (8)
+and
+.BR getcap (8)
+programs.
+It can be found at
+.br
+.IR http://www.kernel.org/pub/linux/libs/security/linux-privs .
+
+Before kernel 2.6.24, and since kernel 2.6.24 if
+file capabilities are not enabled, a thread with the
+.B CAP_SETPCAP
+capability can manipulate the capabilities of threads other than itself.
+However, this is only theoretically possible,
+since no thread ever has
+.BR CAP_SETPCAP
+in either of these cases:
+.IP * 2
+In the pre-2.6.25 implementation the system-wide capability bounding set,
+.IR /proc/sys/kernel/cap-bound ,
+always masks out this capability, and this can not be changed
+without modifying the kernel source and rebuilding.
+.IP *
+If file capabilities are disabled in the current implementation, then
+.B init
+starts out with this capability removed from its per-process bounding
+set, and that bounding set is inherited by all other processes
+created on the system.
.SH "SEE ALSO"
.BR capget (2),
.BR prctl (2),
.BR setfsuid (2),
+.BR cap_from_text (3),
+.BR cap_clear (3),
+.BR cap_get_file (3),
+.BR cap_get_proc (3),
+.BR cap_init (3),
+.BR cap_copy_ext (3),
+.BR capgetp (3),
+.BR capsetp (3),
.BR credentials (7),
-.BR pthreads (7)
+.BR pthreads (7),
+.BR getcap (8),
+.BR setcap (8)
+.PP
+.I include/linux/capability.h
+in the kernel source