-rw-r--r-- man2/ioctl_userfaultfd.2 | 142
-rw-r--r-- man2/userfaultfd.2       |  95
2 files changed, 215 insertions, 22 deletions
diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 15a681164c..d0cb0c9c8e 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -197,6 +197,16 @@ memory accesses to the regions registered with userfaultfd.
If this feature bit is set,
.I uffd_msg.pagefault.feat.ptid
will be set to the faulted thread ID for each page-fault message.
+.TP
+.BR UFFD_FEATURE_MINOR_HUGETLBFS " (since Linux 5.13)"
+If this feature bit is set,
+the kernel supports registering userfaultfd ranges
+in minor mode on hugetlbfs-backed memory areas.
+.TP
+.BR UFFD_FEATURE_MINOR_SHMEM " (since Linux 5.14)"
+If this feature bit is set,
+the kernel supports registering userfaultfd ranges
+in minor mode on shmem-backed memory areas.
.PP
The returned
.I ioctls
@@ -256,14 +266,8 @@ by the current kernel version.
(Since Linux 4.3.)
Register a memory address range with the userfaultfd object.
The pages in the range must be "compatible".
-.PP
-Up to Linux kernel 4.11,
-only private anonymous ranges are compatible for registering with
-.BR UFFDIO_REGISTER .
-.PP
-Since Linux 4.11,
-hugetlbfs and shared memory ranges are also compatible with
-.BR UFFDIO_REGISTER .
+See the list of registration modes below
+for the memory types compatible with each mode.
.PP
The
.I argp
@@ -302,9 +306,22 @@ the specified range:
.TP
.B UFFDIO_REGISTER_MODE_MISSING
Track page faults on missing pages.
+Since Linux 4.3,
+only private anonymous ranges are compatible.
+Since Linux 4.11,
+hugetlbfs and shared memory ranges are also compatible.
.TP
.B UFFDIO_REGISTER_MODE_WP
Track page faults on write-protected pages.
+Since Linux 5.7,
+only private anonymous ranges are compatible.
+.TP
+.B UFFDIO_REGISTER_MODE_MINOR
+Track minor page faults.
+Since Linux 5.13,
+only hugetlbfs ranges are compatible.
+Since Linux 5.14,
+shmem ranges are also compatible.
.PP
If the operation is successful, the kernel modifies the
.I ioctls
@@ -331,6 +348,11 @@ The
The
.B UFFDIO_ZEROPAGE
operation is supported.
+.TP
+.B 1 << _UFFDIO_CONTINUE
+The
+.B UFFDIO_CONTINUE
+operation is supported.
.PP
This
.BR ioctl (2)
@@ -731,6 +753,110 @@ or not registered with userfaultfd write-protect mode.
.TP
.B EFAULT
Encountered a generic fault during processing.
+.\"
+.SS UFFDIO_CONTINUE
+(Since Linux 5.13.)
+Resolve a minor page fault
+by installing page table entries
+for existing pages in the page cache.
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_continue
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_continue {
+ struct uffdio_range range; /* Range to install PTEs for and continue */
+ __u64 mode; /* Flags controlling the behavior of continue */
+ __s64 mapped; /* Number of bytes mapped, or negated error */
+};
+.EE
+.in
+.PP
+The following value may be bitwise ORed in
+.I mode
+to change the behavior of the
+.B UFFDIO_CONTINUE
+operation:
+.TP
+.B UFFDIO_CONTINUE_MODE_DONTWAKE
+Do not wake up the thread that waits for page-fault resolution.
+.PP
+The
+.I mapped
+field is used by the kernel
+to return the number of bytes that were actually mapped,
+or an error in the same manner as
+.BR UFFDIO_COPY .
+If the value returned in the
+.I mapped
+field does not match the value that was specified in
+.IR range.len ,
+the operation fails with the error
+.BR EAGAIN .
+The
+.I mapped
+field is output-only;
+it is not read by the
+.B UFFDIO_CONTINUE
+operation.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+In this case,
+the entire area was mapped.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EAGAIN
+The number of bytes mapped
+(i.e., the value returned in the
+.I mapped
+field)
+does not equal the value that was specified in the
+.I range.len
+field.
+.TP
+.B EINVAL
+Either
+.I range.start
+or
+.I range.len
+was not a multiple of the system page size; or
+.I range.len
+was zero; or the range specified was invalid.
+.TP
+.B EINVAL
+An invalid bit was specified in the
+.I mode
+field.
+.TP
+.B EEXIST
+One or more pages were already mapped in the given range.
+.TP
+.B ENOENT
+The faulting process has changed its virtual memory layout simultaneously with
+an outstanding
+.B UFFDIO_CONTINUE
+operation.
+.TP
+.B ENOMEM
+Allocating memory needed to set up the page table mappings failed.
+.TP
+.B EFAULT
+No existing page could be found in the page cache for the given range.
+.TP
+.B ESRCH
+The faulting process has exited at the time of a
+.B UFFDIO_CONTINUE
+operation.
+.\"
.SH RETURN VALUE
See descriptions of the individual operations, above.
.SH ERRORS
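The UFFDIO_CONTINUE semantics documented in the hunk above (a page-aligned range, the output-only mapped field, and EAGAIN on a partial mapping) can be illustrated with a short C sketch. It assumes Linux 5.13 or later and a matching <linux/userfaultfd.h>. The helper names prepare_continue() and continue_range() are hypothetical, and retrying the unmapped tail after EAGAIN is one possible caller policy, not something the manual page prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: fill in a UFFDIO_CONTINUE request covering npages
 * pages starting at the page that contains fault_addr.  page_size must be
 * a power of two (for hugetlbfs, the huge page size of the mapping). */
void prepare_continue(struct uffdio_continue *req, uint64_t fault_addr,
                      uint64_t page_size, uint64_t npages, bool dontwake)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    memset(req, 0, sizeof(*req));
    req->range.start = fault_addr & ~(page_size - 1); /* round down */
    req->range.len = npages * page_size;
    req->mode = dontwake ? UFFDIO_CONTINUE_MODE_DONTWAKE : 0;
    /* req->mapped is output-only; the kernel does not read it. */
}

/* Hypothetical helper: issue UFFDIO_CONTINUE, retrying the unmapped tail
 * after a partial (EAGAIN) result.  Returns 0 once the whole range is
 * mapped, or -1 with errno set. */
int continue_range(int uffd, struct uffdio_continue *req)
{
    for (;;) {
        if (ioctl(uffd, UFFDIO_CONTINUE, req) == 0)
            return 0;                /* entire range was mapped */
        if (errno != EAGAIN || req->mapped <= 0)
            return -1;               /* hard error, or no progress made */
        /* Partial success: skip the bytes the kernel already mapped. */
        req->range.start += (uint64_t)req->mapped;
        req->range.len -= (uint64_t)req->mapped;
    }
}
```

A handler would typically call prepare_continue() with the address from the fault message and then continue_range(). If UFFDIO_CONTINUE_MODE_DONTWAKE is set, the waiting thread still has to be woken separately, for example with UFFDIO_WAKE.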
diff --git a/man2/userfaultfd.2 b/man2/userfaultfd.2
index 41741b4d88..f8dc4766b1 100644
--- a/man2/userfaultfd.2
+++ b/man2/userfaultfd.2
@@ -65,7 +65,7 @@ all memory ranges that were registered with the object are unregistered
and unread events are flushed.
.\"
.PP
-Userfaultfd supports two modes of registration:
+Userfaultfd supports three modes of registration:
.TP
.BR UFFDIO_REGISTER_MODE_MISSING " (since 4.10)"
When registered with
@@ -79,6 +79,18 @@ or an
.B UFFDIO_ZEROPAGE
ioctl.
.TP
+.BR UFFDIO_REGISTER_MODE_MINOR " (since 5.13)"
+When registered with
+.B UFFDIO_REGISTER_MODE_MINOR
+mode, user-space will receive a page-fault notification
+when a minor page fault occurs.
+A minor fault is one where the backing page is in the page cache
+but the page table entries mapping it do not yet exist.
+The faulted thread will be stopped from execution
+until the page fault is resolved from user-space by an
+.B UFFDIO_CONTINUE
+ioctl.
+.TP
.BR UFFDIO_REGISTER_MODE_WP " (since 5.7)"
When registered with
.B UFFDIO_REGISTER_MODE_WP
@@ -199,9 +211,10 @@ a page fault occurring in the requested memory range, and satisfying
the mode defined at the registration time, will be forwarded by the kernel to
the user-space application.
The application can then use the
-.B UFFDIO_COPY
+.BR UFFDIO_COPY ,
+.BR UFFDIO_ZEROPAGE ,
or
-.B UFFDIO_ZEROPAGE
+.B UFFDIO_CONTINUE
.BR ioctl (2)
operations to resolve the page fault.
.PP
@@ -305,6 +318,59 @@ should have the flag
cleared upon the faulted page or range.
.PP
Write-protect mode supports only private anonymous memory.
+.\"
+.SS Userfaultfd minor fault mode (since 5.13)
+Since Linux 5.13,
+userfaultfd supports minor fault mode.
+In this mode,
+fault messages are produced not for major faults
+(where the page was missing),
+but rather for minor faults,
+where a page exists in the page cache,
+but the page table entries are not yet present.
+Before using this feature, the user needs to check its availability with the
+.B UFFDIO_API
+ioctl, requesting the appropriate feature bit:
+.B UFFD_FEATURE_MINOR_HUGETLBFS
+since Linux 5.13,
+or
+.B UFFD_FEATURE_MINOR_SHMEM
+since Linux 5.14.
+.PP
+To register with userfaultfd minor fault mode,
+the user needs to invoke the
+.B UFFDIO_REGISTER
+ioctl with mode
+.B UFFDIO_REGISTER_MODE_MINOR
+set.
+.PP
+When a minor fault occurs,
+user-space will receive a page-fault notification
+whose
+.I uffd_msg.pagefault.flags
+will have the
+.B UFFD_PAGEFAULT_FLAG_MINOR
+flag set.
+.PP
+To resolve a minor page fault,
+the handler should decide whether or not
+the existing page contents need to be modified first.
+If so,
+this should be done in-place via a second,
+non-userfaultfd-registered mapping
+to the same backing page
+(e.g., by mapping the shmem or hugetlbfs file twice).
+Once the page is considered "up to date",
+the fault can be resolved by initiating an
+.B UFFDIO_CONTINUE
+ioctl,
+which installs the page table entries and
+(by default)
+wakes up the faulting thread(s).
+.PP
+Minor fault mode supports only hugetlbfs-backed (since Linux 5.13)
+and shmem-backed (since Linux 5.14) memory.
+.\"
.SS Reading from the userfaultfd structure
Each
.BR read (2)
@@ -443,19 +509,20 @@ For
the following flag may appear:
.RS
.TP
-.B UFFD_PAGEFAULT_FLAG_WRITE
-If the address is in a range that was registered with the
-.B UFFDIO_REGISTER_MODE_MISSING
-flag (see
-.BR ioctl_userfaultfd (2))
-and this flag is set, this a write fault;
-otherwise it is a read fault.
+.B UFFD_PAGEFAULT_FLAG_WP
+If this flag is set, then the fault was a write-protect fault.
+.TP
+.B UFFD_PAGEFAULT_FLAG_MINOR
+If this flag is set, then the fault was a minor fault.
.TP
+.B UFFD_PAGEFAULT_FLAG_WRITE
+If this flag is set, then the fault was a write fault.
+.PP
+If neither
.B UFFD_PAGEFAULT_FLAG_WP
-If the address is in a range that was registered with the
-.B UFFDIO_REGISTER_MODE_WP
-flag, when this bit is set, it means it is a write-protect fault.
-Otherwise it is a page-missing fault.
+nor
+.B UFFD_PAGEFAULT_FLAG_MINOR
+is set, then the fault was a missing fault.
.RE
.TP
.I pagefault.feat.pid
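The minor-fault workflow described in the userfaultfd.2 hunks above (probe the feature with UFFDIO_API, register with UFFDIO_REGISTER_MODE_MINOR, then classify fault messages by their flags) might look like the following C sketch. It assumes Linux 5.14 or later headers (for UFFD_FEATURE_MINOR_SHMEM) and that the UFFDIO_API handshake has not yet been performed on this descriptor. The names classify_fault(), minor_feature_for(), register_minor(), and enum fault_kind are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical classification of a UFFD_EVENT_PAGEFAULT message. */
enum fault_kind { FAULT_MISSING, FAULT_WP, FAULT_MINOR };

/* Apply the rule from userfaultfd(2): a WP or MINOR fault if that flag is
 * set, otherwise a missing-page fault.  UFFD_PAGEFAULT_FLAG_WRITE only
 * says whether the access was a write, so it does not change the kind. */
enum fault_kind classify_fault(uint64_t flags)
{
    /* Sanity check: a single message should not report both kinds. */
    assert(!((flags & UFFD_PAGEFAULT_FLAG_WP) &&
             (flags & UFFD_PAGEFAULT_FLAG_MINOR)));
    if (flags & UFFD_PAGEFAULT_FLAG_WP)
        return FAULT_WP;
    if (flags & UFFD_PAGEFAULT_FLAG_MINOR)
        return FAULT_MINOR;
    return FAULT_MISSING;
}

/* Feature bit required for minor mode on the given backing store. */
uint64_t minor_feature_for(bool hugetlbfs)
{
    return hugetlbfs ? UFFD_FEATURE_MINOR_HUGETLBFS : UFFD_FEATURE_MINOR_SHMEM;
}

/* Request minor-fault support via UFFDIO_API, then register
 * [addr, addr + len) in minor mode.  Returns 0 or -1 with errno set. */
int register_minor(int uffd, void *addr, uint64_t len, bool hugetlbfs)
{
    struct uffdio_api api = {
        .api = UFFD_API,
        .features = minor_feature_for(hugetlbfs),
    };
    /* Fails with EINVAL if the kernel lacks the requested feature bit. */
    if (ioctl(uffd, UFFDIO_API, &api) == -1)
        return -1;

    struct uffdio_register reg = {
        .range = { .start = (uint64_t)(uintptr_t)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MINOR,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;

    /* The kernel reports which ioctls are usable on the range. */
    if (!(reg.ioctls & ((uint64_t)1 << _UFFDIO_CONTINUE))) {
        errno = EOPNOTSUPP;
        return -1;
    }
    return 0;
}
```

In the read loop, a message with msg.event == UFFD_EVENT_PAGEFAULT for which classify_fault(msg.arg.pagefault.flags) returns FAULT_MINOR would be handled by updating the page through a second, unregistered mapping of the same file and then issuing UFFDIO_CONTINUE for that page.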