| author | Michael Kerrisk <mtk.manpages@gmail.com> | 2008-06-09 21:03:52 +0000 |
|---|---|---|
| committer | Michael Kerrisk <mtk.manpages@gmail.com> | 2008-06-09 21:03:52 +0000 |
| commit | 333a424b0ed691be51cc82a781822f8ae8b6fe16 (patch) | |
| tree | 66134ce751230c21ec001822ed927c5ab1e1a688 /man7/regex.7 | |
| parent | ce17c5d39108098ca881c499d6cef1138555001b (diff) | |
| download | man-pages-333a424b0ed691be51cc82a781822f8ae8b6fe16.tar.gz | |
Try and bring some consistency to quotes.
Diffstat (limited to 'man7/regex.7')
| -rw-r--r-- | man7/regex.7 | 126 |
1 files changed, 63 insertions, 63 deletions
diff --git a/man7/regex.7 b/man7/regex.7
index 644a0c8a93..440cbde92b 100644
--- a/man7/regex.7
+++ b/man7/regex.7
@@ -47,26 +47,26 @@ POSIX.2 "basic" REs).
 Obsolete REs mostly exist for backward compatibility in some old programs;
 they will be discussed at the end.
 POSIX.2 leaves some aspects of RE syntax and semantics open;
-`\*(dg' marks decisions on these aspects that
+"\*(dg" marks decisions on these aspects that
 may not be fully portable to other POSIX.2 implementations.
 .PP
 A (modern) RE is one\*(dg or more non-empty\*(dg \fIbranches\fR,
-separated by `|'.
+separated by \(aq|\(aq.
 It matches anything that matches one of the branches.
 .PP
 A branch is one\*(dg or more \fIpieces\fR, concatenated.
 It matches a match for the first, followed by a match for the second, etc.
 .PP
 A piece is an \fIatom\fR possibly followed
-by a single\*(dg `*', `+', `?', or \fIbound\fR.
-An atom followed by `*' matches a sequence of 0 or more matches of the atom.
-An atom followed by `+' matches a sequence of 1 or more matches of the atom.
-An atom followed by `?' matches a sequence of 0 or 1 matches of the atom.
+by a single\*(dg \(aq*\(aq, \(aq+\(aq, \(aq?\(aq, or \fIbound\fR.
+An atom followed by \(aq*\(aq matches a sequence of 0 or more matches of the atom.
+An atom followed by \(aq+\(aq matches a sequence of 1 or more matches of the atom.
+An atom followed by \(aq?\(aq matches a sequence of 0 or 1 matches of the atom.
 .PP
-A \fIbound\fR is `{' followed by an unsigned decimal integer,
-possibly followed by `,'
+A \fIbound\fR is \(aq{\(aq followed by an unsigned decimal integer,
+possibly followed by \(aq,\(aq
 possibly followed by another unsigned decimal integer,
-always followed by `}'.
+always followed by \(aq}\(aq.
 The integers must lie between 0 and
 .B RE_DUP_MAX
 (255\*(dg) inclusive,
@@ -81,71 +81,71 @@ An atom followed by a bound
 containing two integers \fIi\fR and \fIj\fR matches
 a sequence of \fIi\fR through \fIj\fR (inclusive) matches of the atom.
 .PP
-An atom is a regular expression enclosed in `()' (matching a match for the
+An atom is a regular expression enclosed in "\fI()\fP" (matching a match for the
 regular expression),
-an empty set of `()' (matching the null string)\*(dg,
-a \fIbracket expression\fR (see below), `.'
-(matching any single character), `^' (matching the null string at the
-beginning of a line), `$' (matching the null string at the
-end of a line), a `\e' followed by one of the characters
-`^.[$()|*+?{\e'
+an empty set of "\fI()\fP" (matching the null string)\*(dg,
+a \fIbracket expression\fR (see below), \(aq.\(aq
+(matching any single character), \(aq^\(aq (matching the null string at the
+beginning of a line), \(aq$\(aq (matching the null string at the
+end of a line), a \(aq\e\(aq followed by one of the characters
+"\fI^.[$()|*+?{\e\fP"
 (matching that character taken as an ordinary character),
-a `\e' followed by any other character\*(dg
+a \(aq\e\(aq followed by any other character\*(dg
 (matching that character taken as an ordinary character,
-as if the `\e' had not been present\*(dg),
+as if the \(aq\e\(aq had not been present\*(dg),
 or a single character with no other significance (matching that character).
-A `{' followed by a character other than a digit is an ordinary
+A \(aq{\(aq followed by a character other than a digit is an ordinary
 character, not the beginning of a bound\*(dg.
-It is illegal to end an RE with `\e'.
+It is illegal to end an RE with \(aq\e\(aq.
 .PP
-A \fIbracket expression\fR is a list of characters enclosed in `[]'.
+A \fIbracket expression\fR is a list of characters enclosed in "\fI[]\fP".
 It normally matches any single character from the list (but see below).
-If the list begins with `^',
+If the list begins with \(aq^\(aq,
 it matches any single character
 (but see below) \fInot\fR from the rest of the list.
-If two characters in the list are separated by `\-', this is shorthand
+If two characters in the list are separated by \(aq\-\(aq, this is shorthand
 for the full \fIrange\fR of characters between those two (inclusive) in the
 collating sequence,
-for example, `[0\-9]' in ASCII matches any decimal digit.
+for example, "\fI[0\-9]\fP" in ASCII matches any decimal digit.
 It is illegal\*(dg for two ranges to share an
-endpoint, for example, `a-c-e'.
+endpoint, for example, "\fIa-c-e\fP".
 Ranges are very collating-sequence-dependent,
 and portable programs should avoid relying on them.
 .PP
-To include a literal `]' in the list, make it the first character
-(following a possible `^').
-To include a literal `\-', make it the first or last character,
+To include a literal \(aq]\(aq in the list, make it the first character
+(following a possible \(aq^\(aq).
+To include a literal \(aq\-\(aq, make it the first or last character,
 or the second endpoint of a range.
-To use a literal `\-' as the first endpoint of a range,
-enclose it in `[.' and `.]' to make it a collating element (see below).
-With the exception of these and some combinations using `[' (see next
-paragraphs), all other special characters, including `\e', lose their
+To use a literal \(aq\-\(aq as the first endpoint of a range,
+enclose it in "\fI[.\fP" and "\fI.]\fP" to make it a collating element (see below).
+With the exception of these and some combinations using \(aq[\(aq (see next
+paragraphs), all other special characters, including \(aq\e\(aq, lose their
 special significance within a bracket expression.
 .PP
 Within a bracket expression, a collating element (a character,
 a multi-character sequence that collates as if it were a single character,
 or a collating-sequence name for either)
-enclosed in `[.' and `.]' stands for the
+enclosed in "\fI[.\fP" and "\fI.]\fP" stands for the
 sequence of characters of that collating element.
 The sequence is a single element of the bracket expression's list.
 A bracket expression containing a multi-character collating element
 can thus match more than one character,
-for example, if the collating sequence includes a `ch' collating element,
-then the RE `[[.ch.]]*c' matches the first five characters
-of `chchcc'.
+for example, if the collating sequence includes a "ch" collating element,
+then the RE "\fI[[.ch.]]*c\fP" matches the first five characters
+of "chchcc".
 .PP
-Within a bracket expression, a collating element enclosed in `[=' and
-`=]' is an equivalence class, standing for the sequences of characters
+Within a bracket expression, a collating element enclosed in "\fI[=\fP" and
+"\fI=]\fP" is an equivalence class, standing for the sequences of characters
 of all collating elements equivalent to that one, including itself.
 (If there are no other equivalent collating elements,
-the treatment is as if the enclosing delimiters were `[.' and `.]'.)
+the treatment is as if the enclosing delimiters were "\fI[.\fP" and "\fI.]\fP".)
 For example, if o and \o'o^' are the members of an equivalence class,
-then `[[=o=]]', `[[=\o'o^'=]]', and `[o\o'o^']' are all synonymous.
+then "\fI[[=o=]]\fP", "\fI[[=\o'o^'=]]\fP", and "\fI[o\o'o^']\fP" are all synonymous.
 An equivalence class may not\*(dg be an endpoint
 of a range.
 .PP
 Within a bracket expression, the name of a \fIcharacter class\fR enclosed
-in `[:' and `:]' stands for the list of all characters belonging to that
+in "\fI[:\fP" and "\fI:]\fP" stands for the list of all characters belonging to that
 class.
 Standard character class names are:
 .PP
@@ -167,7 +167,7 @@ A character class may not be used as an endpoint of a range.
 .\" The following does not seem to apply in the glibc implementation
 .\" .PP
 .\" There are two special cases\*(dg of bracket expressions:
-.\" the bracket expressions `[[:<:]]' and `[[:>:]]' match the null string at
+.\" the bracket expressions "\fI[[:<:]]\fP" and "\fI[[:>:]]\fP" match the null string at
 .\" the beginning and end of a word respectively.
 .\" A word is defined as a sequence of
 .\" word characters
@@ -198,11 +198,11 @@ their lower-level component subexpressions.
 Match lengths are measured in characters, not collating elements.
 A null string is considered longer than no match at all.
 For example,
-`bb*' matches the three middle characters of `abbbc',
-`(wee|week)(knights|nights)' matches all ten characters of `weeknights',
-when `(.*).*' is matched against `abc' the parenthesized subexpression
+"\fIbb*\fP" matches the three middle characters of "abbbc",
+"\fI(wee|week)(knights|nights)\fP" matches all ten characters of "weeknights",
+when "\fI(.*).*\fP" is matched against "abc" the parenthesized subexpression
 matches all three characters, and
-when `(a*)*' is matched against `bc' both the whole RE and the parenthesized
+when "\fI(a*)*\fP" is matched against "bc" both the whole RE and the parenthesized
 subexpression match the null string.
 .PP
 If case-independent matching is specified,
@@ -211,10 +211,10 @@ alphabet.
 When an alphabetic that exists in multiple cases appears as an
 ordinary character outside a bracket expression,
 it is effectively transformed into a bracket expression containing both cases,
-for example, `x' becomes `[xX]'.
+for example, \(aqx\(aq becomes "\fI[xX]\fP".
 When it appears inside a bracket expression, all case counterparts
-of it are added to the bracket expression, so that, for example, `[x]'
-becomes `[xX]' and `[^x]' becomes `[^xX]'.
+of it are added to the bracket expression, so that, for example, "\fI[x]\fP"
+becomes "\fI[xX]\fP" and "\fI[^x]\fP" becomes "\fI[^xX]\fP".
 .PP
 No particular limit is imposed on the length of REs\*(dg.
 Programs intended to be portable should not employ REs longer
@@ -223,32 +223,32 @@ as an implementation can refuse to accept such REs and remain
 POSIX-compliant.
 .PP
 Obsolete ("basic") regular expressions differ in several respects.
-`|', `+', and `?' are ordinary characters and there is no equivalent
+\(aq|\(aq, \(aq+\(aq, and \(aq?\(aq are ordinary characters and there is no equivalent
 for their functionality.
-The delimiters for bounds are `\e{' and `\e}',
-with `{' and `}' by themselves ordinary characters.
-The parentheses for nested subexpressions are `\e(' and `\e)',
-with `(' and `)' by themselves ordinary characters.
-`^' is an ordinary character except at the beginning of the
+The delimiters for bounds are "\fI\e{\fP" and "\fI\e}\fP",
+with \(aq{\(aq and \(aq}\(aq by themselves ordinary characters.
+The parentheses for nested subexpressions are "\fI\e(\fP" and "\fI\e)\fP",
+with \(aq(\(aq and \(aq)\(aq by themselves ordinary characters.
+\(aq^\(aq is an ordinary character except at the beginning of the
 RE or\*(dg the beginning of a parenthesized subexpression,
-`$' is an ordinary character except at the end of the
+\(aq$\(aq is an ordinary character except at the end of the
 RE or\*(dg the end of a parenthesized subexpression,
-and `*' is an ordinary character if it appears at the beginning of the
+and \(aq*\(aq is an ordinary character if it appears at the beginning of the
 RE or the beginning of a parenthesized subexpression
-(after a possible leading `^').
+(after a possible leading \(aq^\(aq).
 .PP
 Finally, there is one new type of atom, a \fIback reference\fR:
-`\e' followed by a non-zero decimal digit \fId\fR
+\(aq\e\(aq followed by a non-zero decimal digit \fId\fR
 matches the same sequence of characters matched by the
 \fId\fRth parenthesized subexpression
 (numbering subexpressions by the positions of their opening parentheses,
 left to right),
-so that, for example, `\e([bc]\e)\e1' matches `bb' or `cc' but not `bc'.
+so that, for example, "\fI\e([bc]\e)\e1\fP" matches "bb" or "cc" but not "bc".
 .SH BUGS
 Having two kinds of REs is a botch.
 .PP
-The current POSIX.2 spec says that `)' is an ordinary character in
-the absence of an unmatched `(';
+The current POSIX.2 spec says that \(aq)\(aq is an ordinary character in
+the absence of an unmatched \(aq(\(aq;
 this was an unintentional result of a wording error,
 and change is likely.
 Avoid relying on it.
@@ -257,7 +257,7 @@ Back references are a dreadful botch,
 posing major problems for efficient implementations.
 They are also somewhat vaguely defined
 (does
-`a\e(\e(b\e)*\e2\e)*d' match `abbbd'?).
+"\fIa\e(\e(b\e)*\e2\e)*d\fP" match "abbbd"?).
 Avoid using them.
 .PP
 POSIX.2's specification of case-independent matching is vague.
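
The match examples requoted in the hunks above ("bb*", "(wee|week)(knights|nights)", and the back reference) can be checked with a short sketch. It assumes Python's `re` module, which is a backtracking engine and not a POSIX.2 implementation, so it only approximates the page's semantics. The helper `first_match` is a hypothetical name introduced for illustration. Only the overall match is compared, because submatch boundaries may differ from the POSIX rules, and the back reference is written with unescaped parentheses because Python has no "basic" RE syntax.

```python
import re


def first_match(pattern, text):
    """Hypothetical helper: text of the first match of pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# "bb*" matches the three middle characters of "abbbc".
middle = first_match(r"bb*", "abbbc")

# "(wee|week)(knights|nights)" matches all ten characters of "weeknights".
whole = first_match(r"(wee|week)(knights|nights)", "weeknights")

# Back reference: the page's basic RE \([bc]\)\1 corresponds to ([bc])\1 in
# Python syntax; it should accept "bb" and "cc" but reject "bc".
accepted = [s for s in ("bb", "cc", "bc") if re.fullmatch(r"([bc])\1", s)]

print(middle, whole, accepted)
```

For behaviour that depends on the POSIX leftmost-longest rule or on basic RE syntax, a program calling the C library's `regcomp()` and `regexec()` would be the more faithful check.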
