author Michael Kerrisk <mtk.manpages@gmail.com> 2018-01-09 00:19:02 +0100
committer Michael Kerrisk <mtk.manpages@gmail.com> 2018-01-10 00:35:47 +0100
commit ed3f4f34fc3a80db97256f9f283e274078bbfc31
tree 95cb4f747ec6f088d43cdc4269170e154c343780
parent 148e0800eb0a7dc77078ef94306f16af55a40e40
cgroups.7: Document cgroup v2 delegation via the 'nsdelegate' mount option
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
-rw-r--r-- man7/cgroups.7 100
1 files changed, 92 insertions, 8 deletions
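Testing note (not part of the patch): when trying out the behavior documented below, it helps to first confirm that the cgroup v2 filesystem really is mounted with nsdelegate. The following is a minimal sketch, assuming the option is listed among the mount options in /proc/self/mounts. The helper name has_nsdelegate and its optional file argument are illustrative only and do not come from the commit.

```shell
#!/bin/sh
# has_nsdelegate [MOUNTS_FILE]
#
# Hypothetical helper: exit status 0 if any cgroup2 mount listed in
# MOUNTS_FILE (default: /proc/self/mounts) carries the nsdelegate
# option, and 1 otherwise.
has_nsdelegate() {
    mounts_file=${1:-/proc/self/mounts}

    # Each line has the fields: device mountpoint fstype options dump pass.
    # Split the comma-separated options and look for an exact match.
    awk '$3 == "cgroup2" {
             n = split($4, opts, ",")
             for (i = 1; i <= n; i++)
                 if (opts[i] == "nsdelegate")
                     found = 1
         }
         END { exit(found ? 0 : 1) }' "$mounts_file"
}
```

Calling has_nsdelegate with no argument inspects the mounts visible to the current process. Passing a file name allows the check to be run against a saved copy of a mounts table.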
diff --git a/man7/cgroups.7 b/man7/cgroups.7
index 0ed62a2fea..ccf9251cd2 100644
--- a/man7/cgroups.7
+++ b/man7/cgroups.7
@@ -493,14 +493,6 @@ the value in this file is inherited from the corresponding file
in the parent cgroup.
.\"
.SH CGROUPS VERSION 2
-.\" FIXME
-.\" Document the 'nsdelegate' mount option added in Linux 4.13
-.\" To test this, it can be useful to boot the kernel with the options:
-.\"
-.\" cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller
-.\"
-.\" The effect of th latter option is to prevent systemd from employing
-.\" its "hybrid" cgroup mode, where it tries to make use of cgroups v2.
In cgroups v2,
all mounted controllers reside in a single unified hierarchy.
While (different) controllers may be simultaneously
@@ -919,6 +911,93 @@ or the ownership of that file was passed to the delegatee,
the delegatee can also control the further redistribution
of the corresponding resources into the delegated subtree.
.\"
+.SS Cgroups v2 delegation: nsdelegate and cgroup namespaces
+.\"
+.\" To test this, it can be useful to boot the kernel with the options:
+.\"
+.\" cgroup_no_v1=all systemd.legacy_systemd_cgroup_controller
+.\"
+.\" The effect of the latter option is to prevent systemd from employing
+.\" its "hybrid" cgroup mode, where it tries to make use of cgroups v2.
+.\"
+Starting with Linux 4.13,
+.\" commit 5136f6365ce3eace5a926e10f16ed2a233db5ba9
+there is a second way to perform cgroup delegation.
+This is done by mounting the cgroup v2 filesystem with the
+.I nsdelegate
+mount option:
+.PP
+.in +4n
+.EX
+$ mount -t cgroup2 -o nsdelegate none /sys/fs/cgroup/unified
+.EE
+.in
+.PP
+The effect of this option is to cause cgroup namespaces
+to automatically become delegation boundaries.
+More specifically,
+the following restrictions apply for processes inside the cgroup namespace:
+.IP * 3
+Writes to controller interface files in the root directory
+will fail with the error
+.BR EPERM .
+Processes inside the cgroup namespace can still write to delegatable
+files such as
+.IR cgroup.procs
+and
+.IR cgroup.subtree_control ,
+and can create a subhierarchy underneath the root directory of
+the cgroup namespace.
+.IP *
+Attempts to migrate processes across the namespace boundary are denied
+(with the error
+.BR ENOENT ).
+Processes inside the cgroup namespace can still
+(subject to the containment rules described below)
+move processes between cgroups
+.I within
+the subhierarchy under the namespace root.
+.PP
+The ability to define cgroup namespaces as delegation boundaries
+makes cgroup namespaces more useful.
+To understand why, suppose that we already have one cgroup hierarchy
+that has been delegated to a nonprivileged user,
+.IR cecilia ,
+using the older delegation technique described above.
+Suppose further that
+.I cecilia
+wanted to further delegate a subhierarchy
+under the existing delegated hierarchy.
+(For example, the delegated hierarchy might be associated with
+an unprivileged container run by
+.IR cecilia .)
+Even if a cgroup namespace was employed,
+because both hierarchies are owned by the unprivileged user
+.IR cecilia ,
+the following illegitimate actions could be performed:
+.IP * 3
+A process in the inferior hierarchy could change the
+resource controller settings in the root directory of that hierarchy.
+(These resource controller settings are intended to allow control to
+be exercised from the
+.I parent
+cgroup;
+a process inside the child cgroup should not be allowed to modify them.)
+.IP *
+A process inside the inferior hierarchy could move processes
+into and out of the inferior hierarchy if the cgroups in the
+superior hierarchy were somehow visible.
+.PP
+Employing the
+.I nsdelegate
+mount option prevents both of these possibilities.
+.PP
+The
+.I nsdelegate
+mount option only has an effect when performed in
+the initial mount namespace;
+in other mount namespaces, the option is silently ignored.
+.\"
.SS Cgroup v2 delegation containment rules
Some delegation
.IR "containment rules"
@@ -941,6 +1020,11 @@ file in the common ancestor of the source and destination cgroups.
(In some cases,
the common ancestor may be the source or destination cgroup itself.)
.IP *
+If the cgroup v2 filesystem was mounted with the
+.I nsdelegate
+option, the writer must be able to see the source and destination cgroup
+from its cgroup namespace.
+.IP *
Before Linux 4.11:
.\" commit 576dd464505fc53d501bb94569db76f220104d28
the effective UID of the writer (i.e., the delegatee) matches the