aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorMichael Kerrisk <mtk.manpages@gmail.com>2014-04-28 12:25:55 +0200
committerMichael Kerrisk <mtk.manpages@gmail.com>2014-04-29 15:00:07 +0200
commit59c06be3f64fdeac239bb823428cb440406113d9 (patch)
tree948b0fc490d8bfbbe90dfadedac82a88fb16c440
parent17e5494f9bde745e98be7f00b123b7d7a9674e6c (diff)
downloadman-pages-59c06be3f64fdeac239bb823428cb440406113d9.tar.gz
sched.7: New page providing an overview of the scheduling APIs
Most of this content derives from sched_setscheduler(2). In preparation for adding a sched_setattr(2) page, it makes sense to isolate out this general content to a separate page that is referred to by the other scheduling pages. Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
-rw-r--r--man7/sched.7486
1 files changed, 486 insertions, 0 deletions
diff --git a/man7/sched.7 b/man7/sched.7
new file mode 100644
index 0000000000..db2a1fa14d
--- /dev/null
+++ b/man7/sched.7
@@ -0,0 +1,486 @@
+.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
+.\" Various pieces from the old sched_setscheduler(2) page
+.\" Copyright (C) Tom Bjorkholm, Markus Kuhn & David A. Wheeler 1996-1999
+.\" and Copyright (C) 2007 Carsten Emde <Carsten.Emde@osadl.org>
+.\" and Copyright (C) 2008 Michael Kerrisk <mtk.manpages@gmail.com>
+.\"
+.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
+.\" This is free documentation; you can redistribute it and/or
+.\" modify it under the terms of the GNU General Public License as
+.\" published by the Free Software Foundation; either version 2 of
+.\" the License, or (at your option) any later version.
+.\"
+.\" The GNU General Public License's references to "object code"
+.\" and "executables" are to be interpreted as the output of any
+.\" document formatting or typesetting system, including
+.\" intermediate and printed output.
+.\"
+.\" This manual is distributed in the hope that it will be useful,
+.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
+.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+.\" GNU General Public License for more details.
+.\"
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
+.\"
+.\" Worth looking at: http://rt.wiki.kernel.org/index.php
+.\"
+.TH SCHED 7 2014-04-28 "Linux" "Linux Programmer's Manual"
+.SH NAME
+sched \- overview of scheduling APIs
+.SH DESCRIPTION
+.SS Scheduling policies
+The scheduler is the kernel component that decides which runnable thread
+will be executed by the CPU next.
+Each thread has an associated scheduling policy and a \fIstatic\fP
+scheduling priority, \fIsched_priority\fP; these are the settings
+that are modified by
+.BR sched_setscheduler ().
+The scheduler makes it decisions based on knowledge of the scheduling
+policy and static priority of all threads on the system.
+
+For threads scheduled under one of the normal scheduling policies
+(\fBSCHED_OTHER\fP, \fBSCHED_IDLE\fP, \fBSCHED_BATCH\fP),
+\fIsched_priority\fP is not used in scheduling
+decisions (it must be specified as 0).
+
+Processes scheduled under one of the real-time policies
+(\fBSCHED_FIFO\fP, \fBSCHED_RR\fP) have a
+\fIsched_priority\fP value in the range 1 (low) to 99 (high).
+(As the numbers imply, real-time threads always have higher priority
+than normal threads.)
+Note well: POSIX.1-2001 requires an implementation to support only a
+minimum 32 distinct priority levels for the real-time policies,
+and some systems supply just this minimum.
+Portable programs should use
+.BR sched_get_priority_min (2)
+and
+.BR sched_get_priority_max (2)
+to find the range of priorities supported for a particular policy.
+
+Conceptually, the scheduler maintains a list of runnable
+threads for each possible \fIsched_priority\fP value.
+In order to determine which thread runs next, the scheduler looks for
+the nonempty list with the highest static priority and selects the
+thread at the head of this list.
+
+A thread's scheduling policy determines
+where it will be inserted into the list of threads
+with equal static priority and how it will move inside this list.
+
+All scheduling is preemptive: if a thread with a higher static
+priority becomes ready to run, the currently running thread
+will be preempted and
+returned to the wait list for its static priority level.
+The scheduling policy determines the
+ordering only within the list of runnable threads with equal static
+priority.
+.SS SCHED_FIFO: First in-first out scheduling
+\fBSCHED_FIFO\fP can be used only with static priorities higher than
+0, which means that when a \fBSCHED_FIFO\fP threads becomes runnable,
+it will always immediately preempt any currently running
+\fBSCHED_OTHER\fP, \fBSCHED_BATCH\fP, or \fBSCHED_IDLE\fP thread.
+\fBSCHED_FIFO\fP is a simple scheduling
+algorithm without time slicing.
+For threads scheduled under the
+\fBSCHED_FIFO\fP policy, the following rules apply:
+.IP * 3
+A \fBSCHED_FIFO\fP thread that has been preempted by another thread of
+higher priority will stay at the head of the list for its priority and
+will resume execution as soon as all threads of higher priority are
+blocked again.
+.IP *
+When a \fBSCHED_FIFO\fP thread becomes runnable, it
+will be inserted at the end of the list for its priority.
+.IP *
+A call to
+.BR sched_setscheduler ()
+or
+.BR sched_setparam (2)
+will put the
+\fBSCHED_FIFO\fP (or \fBSCHED_RR\fP) thread identified by
+\fIpid\fP at the start of the list if it was runnable.
+As a consequence, it may preempt the currently running thread if
+it has the same priority.
+(POSIX.1-2001 specifies that the thread should go to the end
+of the list.)
+.\" In 2.2.x and 2.4.x, the thread is placed at the front of the queue
+.\" In 2.0.x, the Right Thing happened: the thread went to the back -- MTK
+.IP *
+A thread calling
+.BR sched_yield (2)
+will be put at the end of the list.
+.PP
+No other events will move a thread
+scheduled under the \fBSCHED_FIFO\fP policy in the wait list of
+runnable threads with equal static priority.
+
+A \fBSCHED_FIFO\fP
+thread runs until either it is blocked by an I/O request, it is
+preempted by a higher priority thread, or it calls
+.BR sched_yield (2).
+.SS SCHED_RR: Round-robin scheduling
+\fBSCHED_RR\fP is a simple enhancement of \fBSCHED_FIFO\fP.
+Everything
+described above for \fBSCHED_FIFO\fP also applies to \fBSCHED_RR\fP,
+except that each thread is allowed to run only for a maximum time
+quantum.
+If a \fBSCHED_RR\fP thread has been running for a time
+period equal to or longer than the time quantum, it will be put at the
+end of the list for its priority.
+A \fBSCHED_RR\fP thread that has
+been preempted by a higher priority thread and subsequently resumes
+execution as a running thread will complete the unexpired portion of
+its round-robin time quantum.
+The length of the time quantum can be
+retrieved using
+.BR sched_rr_get_interval (2).
+.\" On Linux 2.4, the length of the RR interval is influenced
+.\" by the process nice value -- MTK
+.\"
+.SS SCHED_OTHER: Default Linux time-sharing scheduling
+\fBSCHED_OTHER\fP can be used at only static priority 0.
+\fBSCHED_OTHER\fP is the standard Linux time-sharing scheduler that is
+intended for all threads that do not require the special
+real-time mechanisms.
+The thread to run is chosen from the static
+priority 0 list based on a \fIdynamic\fP priority that is determined only
+inside this list.
+The dynamic priority is based on the nice value (set by
+.BR nice (2)
+or
+.BR setpriority (2))
+and increased for each time quantum the thread is ready to run,
+but denied to run by the scheduler.
+This ensures fair progress among all \fBSCHED_OTHER\fP threads.
+.\"
+.SS SCHED_BATCH: Scheduling batch processes
+(Since Linux 2.6.16.)
+\fBSCHED_BATCH\fP can be used only at static priority 0.
+This policy is similar to \fBSCHED_OTHER\fP in that it schedules
+the thread according to its dynamic priority
+(based on the nice value).
+The difference is that this policy
+will cause the scheduler to always assume
+that the thread is CPU-intensive.
+Consequently, the scheduler will apply a small scheduling
+penalty with respect to wakeup behaviour,
+so that this thread is mildly disfavored in scheduling decisions.
+
+.\" The following paragraph is drawn largely from the text that
+.\" accompanied Ingo Molnar's patch for the implementation of
+.\" SCHED_BATCH.
+.\" commit b0a9499c3dd50d333e2aedb7e894873c58da3785
+This policy is useful for workloads that are noninteractive,
+but do not want to lower their nice value,
+and for workloads that want a deterministic scheduling policy without
+interactivity causing extra preemptions (between the workload's tasks).
+.\"
+.SS SCHED_IDLE: Scheduling very low priority jobs
+(Since Linux 2.6.23.)
+\fBSCHED_IDLE\fP can be used only at static priority 0;
+the process nice value has no influence for this policy.
+
+This policy is intended for running jobs at extremely low
+priority (lower even than a +19 nice value with the
+.B SCHED_OTHER
+or
+.B SCHED_BATCH
+policies).
+.\"
+.SS Resetting scheduling policy for child processes
+Since Linux 2.6.32, the
+.B SCHED_RESET_ON_FORK
+flag can be ORed in
+.I policy
+when calling
+.BR sched_setscheduler ().
+As a result of including this flag, children created by
+.BR fork (2)
+do not inherit privileged scheduling policies.
+This feature is intended for media-playback applications,
+and can be used to prevent applications evading the
+.BR RLIMIT_RTTIME
+resource limit (see
+.BR getrlimit (2))
+by creating multiple child processes.
+
+More precisely, if the
+.BR SCHED_RESET_ON_FORK
+flag is specified,
+the following rules apply for subsequently created children:
+.IP * 3
+If the calling thread has a scheduling policy of
+.B SCHED_FIFO
+or
+.BR SCHED_RR ,
+the policy is reset to
+.BR SCHED_OTHER
+in child processes.
+.IP *
+If the calling process has a negative nice value,
+the nice value is reset to zero in child processes.
+.PP
+After the
+.BR SCHED_RESET_ON_FORK
+flag has been enabled,
+it can be reset only if the thread has the
+.BR CAP_SYS_NICE
+capability.
+This flag is disabled in child processes created by
+.BR fork (2).
+
+The
+.B SCHED_RESET_ON_FORK
+flag is visible in the policy value returned by
+.BR sched_getscheduler ()
+.\"
+.SS Privileges and resource limits
+In Linux kernels before 2.6.12, only privileged
+.RB ( CAP_SYS_NICE )
+threads can set a nonzero static priority (i.e., set a real-time
+scheduling policy).
+The only change that an unprivileged thread can make is to set the
+.B SCHED_OTHER
+policy, and this can be done only if the effective user ID of the caller of
+.BR sched_setscheduler ()
+matches the real or effective user ID of the target thread
+(i.e., the thread specified by
+.IR pid )
+whose policy is being changed.
+
+Since Linux 2.6.12, the
+.B RLIMIT_RTPRIO
+resource limit defines a ceiling on an unprivileged thread's
+static priority for the
+.B SCHED_RR
+and
+.B SCHED_FIFO
+policies.
+The rules for changing scheduling policy and priority are as follows:
+.IP * 3
+If an unprivileged thread has a nonzero
+.B RLIMIT_RTPRIO
+soft limit, then it can change its scheduling policy and priority,
+subject to the restriction that the priority cannot be set to a
+value higher than the maximum of its current priority and its
+.B RLIMIT_RTPRIO
+soft limit.
+.IP *
+If the
+.B RLIMIT_RTPRIO
+soft limit is 0, then the only permitted changes are to lower the priority,
+or to switch to a non-real-time policy.
+.IP *
+Subject to the same rules,
+another unprivileged thread can also make these changes,
+as long as the effective user ID of the thread making the change
+matches the real or effective user ID of the target thread.
+.IP *
+Special rules apply for the
+.BR SCHED_IDLE .
+In Linux kernels before 2.6.39,
+an unprivileged thread operating under this policy cannot
+change its policy, regardless of the value of its
+.BR RLIMIT_RTPRIO
+resource limit.
+In Linux kernels since 2.6.39,
+.\" commit c02aa73b1d18e43cfd79c2f193b225e84ca497c8
+an unprivileged thread can switch to either the
+.BR SCHED_BATCH
+or the
+.BR SCHED_NORMAL
+policy so long as its nice value falls within the range permitted by its
+.BR RLIMIT_NICE
+resource limit (see
+.BR getrlimit (2)).
+.PP
+Privileged
+.RB ( CAP_SYS_NICE )
+threads ignore the
+.B RLIMIT_RTPRIO
+limit; as with older kernels,
+they can make arbitrary changes to scheduling policy and priority.
+See
+.BR getrlimit (2)
+for further information on
+.BR RLIMIT_RTPRIO .
+.SS Response time
+A blocked high priority thread waiting for the I/O has a certain
+response time before it is scheduled again.
+The device driver writer
+can greatly reduce this response time by using a "slow interrupt"
+interrupt handler.
+.\" as described in
+.\" .BR request_irq (9).
+.SS Miscellaneous
+Child processes inherit the scheduling policy and parameters across a
+.BR fork (2).
+The scheduling policy and parameters are preserved across
+.BR execve (2).
+
+Memory locking is usually needed for real-time processes to avoid
+paging delays; this can be done with
+.BR mlock (2)
+or
+.BR mlockall (2).
+
+Since a nonblocking infinite loop in a thread scheduled under
+\fBSCHED_FIFO\fP or \fBSCHED_RR\fP will block all threads with lower
+priority forever, a software developer should always keep available on
+the console a shell scheduled under a higher static priority than the
+tested application.
+This will allow an emergency kill of tested
+real-time applications that do not block or terminate as expected.
+See also the description of the
+.BR RLIMIT_RTTIME
+resource limit in
+.BR getrlimit (2).
+
+POSIX systems on which
+.BR sched_setscheduler ()
+and
+.BR sched_getscheduler ()
+are available define
+.B _POSIX_PRIORITY_SCHEDULING
+in \fI<unistd.h>\fP.
+.SH RETURN VALUE
+On success,
+.BR sched_setscheduler ()
+returns zero.
+On success,
+.BR sched_getscheduler ()
+returns the policy for the thread (a nonnegative integer).
+On error, \-1 is returned, and
+.I errno
+is set appropriately.
+.SH ERRORS
+.TP
+.B EINVAL
+The scheduling \fIpolicy\fP is not one of the recognized policies,
+\fIparam\fP is NULL,
+or \fIparam\fP does not make sense for the \fIpolicy\fP.
+.TP
+.B EPERM
+The calling thread does not have appropriate privileges.
+.TP
+.B ESRCH
+The thread whose ID is \fIpid\fP could not be found.
+.SH CONFORMING TO
+POSIX.1-2001 (but see BUGS below).
+The \fBSCHED_BATCH\fP and \fBSCHED_IDLE\fP policies are Linux-specific.
+.SH NOTES
+POSIX.1 does not detail the permissions that an unprivileged
+thread requires in order to call
+.BR sched_setscheduler (),
+and details vary across systems.
+For example, the Solaris 7 manual page says that
+the real or effective user ID of the caller must
+match the real user ID or the save set-user-ID of the target.
+.PP
+The scheduling policy and parameters are in fact per-thread
+attributes on Linux.
+The value returned from a call to
+.BR gettid (2)
+can be passed in the argument
+.IR pid .
+Specifying
+.I pid
+as 0 will operate on the attribute for the calling thread,
+and passing the value returned from a call to
+.BR getpid (2)
+will operate on the attribute for the main thread of the thread group.
+(If you are using the POSIX threads API, then use
+.BR pthread_setschedparam (3),
+.BR pthread_getschedparam (3),
+and
+.BR pthread_setschedprio (3),
+instead of the
+.BR sched_* (2)
+system calls.)
+.PP
+Originally, Standard Linux was intended as a general-purpose operating
+system being able to handle background processes, interactive
+applications, and less demanding real-time applications (applications that
+need to usually meet timing deadlines).
+Although the Linux kernel 2.6
+allowed for kernel preemption and the newly introduced O(1) scheduler
+ensures that the time needed to schedule is fixed and deterministic
+irrespective of the number of active tasks, true real-time computing
+was not possible up to kernel version 2.6.17.
+.SS Real-time features in the mainline Linux kernel
+.\" FIXME . Probably this text will need some minor tweaking
+.\" by about the time of 2.6.30; ask Carsten Emde about this then.
+From kernel version 2.6.18 onward, however, Linux is gradually
+becoming equipped with real-time capabilities,
+most of which are derived from the former
+.I realtime-preempt
+patches developed by Ingo Molnar, Thomas Gleixner,
+Steven Rostedt, and others.
+Until the patches have been completely merged into the
+mainline kernel
+(this is expected to be around kernel version 2.6.30),
+they must be installed to achieve the best real-time performance.
+These patches are named:
+.in +4n
+.nf
+
+patch-\fIkernelversion\fP-rt\fIpatchversion\fP
+.fi
+.in
+.PP
+and can be downloaded from
+.UR http://www.kernel.org\:/pub\:/linux\:/kernel\:/projects\:/rt/
+.UE .
+
+Without the patches and prior to their full inclusion into the mainline
+kernel, the kernel configuration offers only the three preemption classes
+.BR CONFIG_PREEMPT_NONE ,
+.BR CONFIG_PREEMPT_VOLUNTARY ,
+and
+.B CONFIG_PREEMPT_DESKTOP
+which respectively provide no, some, and considerable
+reduction of the worst-case scheduling latency.
+
+With the patches applied or after their full inclusion into the mainline
+kernel, the additional configuration item
+.B CONFIG_PREEMPT_RT
+becomes available.
+If this is selected, Linux is transformed into a regular
+real-time operating system.
+The FIFO and RR scheduling policies that can be selected using
+.BR sched_setscheduler ()
+are then used to run a thread
+with true real-time priority and a minimum worst-case scheduling latency.
+.SH BUGS
+POSIX says that on success,
+.SH SEE ALSO
+.ad l
+.nh
+.BR chrt (1),
+.BR getpriority (2),
+.BR mlock (2),
+.BR mlockall (2),
+.BR munlock (2),
+.BR munlockall (2),
+.BR nice (2),
+.BR sched_get_priority_max (2),
+.BR sched_get_priority_min (2),
+.BR sched_getaffinity (2),
+.BR sched_getparam (2),
+.BR sched_rr_get_interval (2),
+.BR sched_setaffinity (2),
+.BR sched_setparam (2),
+.BR sched_yield (2),
+.BR setpriority (2),
+.BR capabilities (7),
+.BR cpuset (7)
+.ad
+.PP
+.I Programming for the real world \- POSIX.4
+by Bill O. Gallmeister, O'Reilly & Associates, Inc., ISBN 1-56592-074-0.
+.PP
+The Linux kernel source file
+.I Documentation/scheduler/sched-rt-group.txt