Diffstat (limited to 'man7/regex.7')
| -rw-r--r-- | man7/regex.7 | 70 |
1 files changed, 35 insertions, 35 deletions
diff --git a/man7/regex.7 b/man7/regex.7
index 4c130954e1..f313f7e024 100644
--- a/man7/regex.7
+++ b/man7/regex.7
@@ -54,7 +54,7 @@ POSIX.2 leaves some aspects of RE syntax and semantics open;
 may not be fully portable to other POSIX.2 implementations.
 .PP
 A (modern) RE is one\*(dg or more nonempty\*(dg \fIbranches\fR,
-separated by \(aq|\(aq.
+separated by \[aq]|\[aq].
 It matches anything that matches one of the branches.
 .PP
 A branch is one\*(dg or more \fIpieces\fR, concatenated.
@@ -62,18 +62,18 @@ It matches a match for the first, followed by a match for the second,
 and so on.
 .PP
 A piece is an \fIatom\fR possibly followed
-by a single\*(dg \(aq*\(aq, \(aq+\(aq, \(aq?\(aq, or \fIbound\fR.
-An atom followed by \(aq*\(aq
+by a single\*(dg \[aq]*\[aq], \[aq]+\[aq], \[aq]?\[aq], or \fIbound\fR.
+An atom followed by \[aq]*\[aq]
 matches a sequence of 0 or more matches of the atom.
-An atom followed by \(aq+\(aq
+An atom followed by \[aq]+\[aq]
 matches a sequence of 1 or more matches of the atom.
-An atom followed by \(aq?\(aq
+An atom followed by \[aq]?\[aq]
 matches a sequence of 0 or 1 matches of the atom.
 .PP
-A \fIbound\fR is \(aq{\(aq followed by an unsigned decimal integer,
-possibly followed by \(aq,\(aq
+A \fIbound\fR is \[aq]{\[aq] followed by an unsigned decimal integer,
+possibly followed by \[aq],\[aq]
 possibly followed by another unsigned decimal integer,
-always followed by \(aq}\(aq.
+always followed by \[aq]}\[aq].
 The integers must lie between 0 and
 .B RE_DUP_MAX
 (255\*(dg) inclusive,
@@ -91,26 +91,26 @@ a sequence of \fIi\fR through \fIj\fR (inclusive) matches of the atom.
 An atom is a regular expression enclosed in "\fI()\fP"
 (matching a match for the regular expression),
 an empty set of "\fI()\fP" (matching the null string)\*(dg,
-a \fIbracket expression\fR (see below), \(aq.\(aq
-(matching any single character), \(aq\(ha\(aq (matching the null string at the
-beginning of a line), \(aq$\(aq (matching the null string at the
-end of a line), a \(aq\e\(aq followed by one of the characters
+a \fIbracket expression\fR (see below), \[aq].\[aq]
+(matching any single character), \[aq]\(ha\[aq] (matching the null string at the
+beginning of a line), \[aq]$\[aq] (matching the null string at the
+end of a line), a \[aq]\e\[aq] followed by one of the characters
 "\fI\(ha.[$()|*+?{\e\fP"
 (matching that character taken as an ordinary character),
-a \(aq\e\(aq followed by any other character\*(dg
+a \[aq]\e\[aq] followed by any other character\*(dg
 (matching that character taken as an ordinary character,
-as if the \(aq\e\(aq had not been present\*(dg),
+as if the \[aq]\e\[aq] had not been present\*(dg),
 or a single character with no other significance (matching that character).
-A \(aq{\(aq followed by a character other than a digit is an ordinary
+A \[aq]{\[aq] followed by a character other than a digit is an ordinary
 character, not the beginning of a bound\*(dg.
-It is illegal to end an RE with \(aq\e\(aq.
+It is illegal to end an RE with \[aq]\e\[aq].
 .PP
 A \fIbracket expression\fR is a list of characters enclosed in "\fI[]\fP".
 It normally matches any single character from the list (but see below).
-If the list begins with \(aq\(ha\(aq,
+If the list begins with \[aq]\(ha\[aq],
 it matches any single character
 (but see below) \fInot\fR from the rest of the list.
-If two characters in the list are separated by \(aq\-\(aq, this is shorthand
+If two characters in the list are separated by \[aq]\-\[aq], this is shorthand
 for the full \fIrange\fR of characters between those two (inclusive) in the
 collating sequence,
 for example, "\fI[0\-9]\fP" in ASCII matches any decimal digit.
@@ -119,15 +119,15 @@ endpoint, for example, "\fIa\-c\-e\fP".
 Ranges are very collating-sequence-dependent,
 and portable programs should avoid relying on them.
 .PP
-To include a literal \(aq]\(aq in the list, make it the first character
-(following a possible \(aq\(ha\(aq).
-To include a literal \(aq\-\(aq, make it the first or last character,
+To include a literal \[aq]]\[aq] in the list, make it the first character
+(following a possible \[aq]\(ha\[aq]).
+To include a literal \[aq]\-\[aq], make it the first or last character,
 or the second endpoint of a range.
-To use a literal \(aq\-\(aq as the first endpoint of a range,
+To use a literal \[aq]\-\[aq] as the first endpoint of a range,
 enclose it in "\fI[.\fP" and "\fI.]\fP"
 to make it a collating element (see below).
-With the exception of these and some combinations using \(aq[\(aq (see next
-paragraphs), all other special characters, including \(aq\e\(aq, lose their
+With the exception of these and some combinations using \[aq][\[aq] (see next
+paragraphs), all other special characters, including \[aq]\e\[aq], lose their
 special significance within a bracket expression.
 .PP
 Within a bracket expression, a collating element (a character,
@@ -224,7 +224,7 @@ alphabet.
 When an alphabetic that exists in multiple cases appears as an
 ordinary character outside a bracket expression, it is effectively
 transformed into a bracket expression containing both cases,
-for example, \(aqx\(aq becomes "\fI[xX]\fP".
+for example, \[aq]x\[aq] becomes "\fI[xX]\fP".
 When it appears inside a bracket expression, all case counterparts
 of it are added to the bracket expression, so that, for example, "\fI[x]\fP"
 becomes "\fI[xX]\fP" and "\fI[\(hax]\fP" becomes "\fI[\(haxX]\fP".
@@ -236,23 +236,23 @@ as an implementation can refuse to accept such REs and remain
 POSIX-compliant.
 .PP
 Obsolete ("basic") regular expressions differ in several respects.
-\(aq|\(aq, \(aq+\(aq, and \(aq?\(aq are
+\[aq]|\[aq], \[aq]+\[aq], and \[aq]?\[aq] are
 ordinary characters and there is no equivalent
 for their functionality.
 The delimiters for bounds are "\fI\e{\fP" and "\fI\e}\fP",
-with \(aq{\(aq and \(aq}\(aq by themselves ordinary characters.
+with \[aq]{\[aq] and \[aq]}\[aq] by themselves ordinary characters.
 The parentheses for nested subexpressions are "\fI\e(\fP" and "\fI\e)\fP",
-with \(aq(\(aq and \(aq)\(aq by themselves ordinary characters.
-\(aq\(ha\(aq is an ordinary character except at the beginning of the
+with \[aq](\[aq] and \[aq])\[aq] by themselves ordinary characters.
+\[aq]\(ha\[aq] is an ordinary character except at the beginning of the
 RE or\*(dg the beginning of a parenthesized subexpression,
-\(aq$\(aq is an ordinary character except at the end of the
+\[aq]$\[aq] is an ordinary character except at the end of the
 RE or\*(dg the end of a parenthesized subexpression,
-and \(aq*\(aq is an ordinary character if it appears at the beginning of the
+and \[aq]*\[aq] is an ordinary character if it appears at the beginning of the
 RE or the beginning of a parenthesized subexpression
-(after a possible leading \(aq\(ha\(aq).
+(after a possible leading \[aq]\(ha\[aq]).
 .PP
 Finally, there is one new type of atom, a \fIback reference\fR:
-\(aq\e\(aq followed by a nonzero decimal digit \fId\fR
+\[aq]\e\[aq] followed by a nonzero decimal digit \fId\fR
 matches the same sequence of characters
 matched by the \fId\fRth parenthesized subexpression
 (numbering subexpressions by the positions of their opening parentheses,
@@ -261,8 +261,8 @@ so that, for example, "\fI\e([bc]\e)\e1\fP" matches "bb" or "cc" but not "bc".
 .SH BUGS
 Having two kinds of REs is a botch.
 .PP
-The current POSIX.2 spec says that \(aq)\(aq is an ordinary character in
-the absence of an unmatched \(aq(\(aq;
+The current POSIX.2 spec says that \[aq])\[aq] is an ordinary character in
+the absence of an unmatched \[aq](\[aq];
 this was an unintentional result of a wording error,
 and change is likely.
 Avoid relying on it.
