
Where can I find the documentation on the modifiers for gsub? \a \b \c \1 \2 \3 %a %b %c $1 $2 $3 etc.?

Specifically, I'm looking at this code: something.gsub(/%u/, unit). What's the %u?


First off, %u has no special meaning in a Ruby regex; it simply matches the two characters % and u:

mixonic@pandora ~ $ irb
irb(main):001:0> '%u'.gsub(/%u/,'heyhey')
=> "heyhey"

The definitive documentation for Ruby 1.8 regex is in the Ruby Doc Bundle:

Strings delimited by slashes are regular expressions. The characters right after the closing slash denote options to the regular expression. Option i means that the regular expression is case-insensitive. Option o means that the regular expression does expression substitution only once, the first time it is evaluated. Option x means extended regular expression, in which whitespace and comments are allowed in the expression. Option p denotes POSIX mode, in which newlines are treated as normal characters (matched by dots).
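A short sketch of these options, assuming a modern Ruby rather than 1.8. For "dot matches newline" it uses the m option, which is what current Ruby provides for that behaviour; the p option from the quoted text is not exercised here.

```ruby
# i: case-insensitive matching
ci_pos = ("HeLLo" =~ /hello/i)             # => 0

# x: extended mode; whitespace and comments inside the pattern are ignored
phone = /
  \d{3}   # prefix
  -       # literal hyphen
  \d{4}   # line number
/x
phone_pos = ("555-1234" =~ phone)          # => 0

# m: in current Ruby, this is the option that lets . match a newline
no_m   = ("a\nb" =~ /a.b/)                 # => nil
with_m = ("a\nb" =~ /a.b/m)                # => 0

# o: interpolation happens only the first time this literal is evaluated,
# so the second iteration reuses the regexp built from "a"
once = %w[a b].map { |x| /#{x}/o.source }  # => ["a", "a"]
```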

%r/STRING/ is another form of the regular expression literal.
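A small sketch of why the %r form is handy: with a different delimiter, slashes in the pattern need no escaping. The path below is just an illustrative string.

```ruby
# With %r, slashes inside the pattern need no escaping
with_r  = %r{/usr/local/bin}
escaped = /\/usr\/local\/bin/     # the same pattern written with slashes

path = "/usr/local/bin"
r_pos = (path =~ with_r)          # => 0
e_pos = (path =~ escaped)         # => 0

# Options follow the closing delimiter, just as with /.../
ci_r = %r{ruby}i
ci_r_pos = ("RUBY" =~ ci_r)       # => 0
```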

^
    beginning of a line or string 
$
    end of a line or string 
.
    any character except newline 
\w
    word character [0-9A-Za-z_]
\W
    non-word character 
\s
    whitespace character [ \t\n\r\f]
\S
    non-whitespace character 
\d
    digit, same as [0-9]
\D
    non-digit 
\A
    beginning of a string 
\Z
    end of a string, or before newline at the end 
\z
    end of a string 
\b
    word boundary (outside [] only)
\B
    non-word boundary 
\b
    backspace (0x08) (inside [] only)
[ ]
    any single character of set 
*
    0 or more of the previous regular expression
*?
    0 or more of the previous regular expression (non-greedy)
+
    1 or more of the previous regular expression
+?
    1 or more of the previous regular expression (non-greedy)
{m,n}
    at least m but at most n of the previous regular expression
{m,n}?
    at least m but at most n of the previous regular expression (non-greedy)
?
    0 or 1 of the previous regular expression
|
    alternation 
( )
    grouping regular expressions 
(?# )
    comment 
(?: )
    grouping without backreferences 
(?= )
    zero-width positive look-ahead assertion 
(?! )
    zero-width negative look-ahead assertion 
(?ix-ix)
    turns on (or off) the i and x options within the regular expression.
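A few of the entries above, tried out in a sketch. The sample strings are made up for illustration; String#[] with a regexp returns the matched text (or nil).

```ruby
html = "<b>one</b><b>two</b>"
greedy = html[/<b>.*<\/b>/]        # => "<b>one</b><b>two</b>"
lazy   = html[/<b>.*?<\/b>/]       # => "<b>one</b>"

# \Z also matches just before a trailing newline; \z only at the very end
big_z   = ("foo\n" =~ /foo\Z/)     # => 0
small_z = ("foo\n" =~ /foo\z/)     # => nil

# Zero-width look-ahead: digits followed by "USD", without consuming "USD"
amount = "cost: 100USD"[/\d+(?=USD)/]       # => "100"

# (?: ) groups without creating a backreference, so (\d+) below is group 1
month_only = "2009-06"[/(?:\d+)-(\d+)/, 1]  # => "06"

# \b is a word boundary outside [], but a backspace (0x08) inside []
word_pos   = ("concat cat" =~ /\bcat\b/)    # => 7
bspace_pos = ("a\bb" =~ /[\b]/)             # => 1
```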

These modifiers are localized inside an enclosing group (if any). (?ix-ix: ) turns on (or off) the i and x options within this non-capturing group.
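A sketch of how the inline modifiers are scoped, with illustrative patterns: the colon form affects only its own group, and the bare form affects the rest of the enclosing group.

```ruby
# (?i: ) applies i only inside this non-capturing group
group_re = /(?i:hello) World/
group_ok   = (group_re =~ "HELLO World")   # => 0
group_fail = (group_re =~ "HELLO WORLD")   # => nil, "World" is still case-sensitive

# (?i) without a colon switches i on from that point to the end of the enclosing group
inline_re = /a(?i)b/
inline_ok   = (inline_re =~ "aB")          # => 0
inline_fail = (inline_re =~ "AB")          # => nil, "a" precedes (?i)
```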

Backslash notation and expression substitution (#{...}) are available in regular expressions.
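A sketch of both features, reusing the %u placeholder from the question. It also shows, via Regexp.escape, that % and u are not metacharacters.

```ruby
placeholder = "%u"

# Backslash notation: \t in the pattern matches a tab character
tab_pos = ("a\tb" =~ /a\tb/)                      # => 0

# Expression substitution: #{} inserts a Ruby expression into the pattern
interp_pos = ("10 %u" =~ /#{placeholder}/)        # => 3

# Regexp.escape leaves "%u" unchanged: neither % nor u needs escaping
esc_u = Regexp.escape(placeholder)                # => "%u"

# Interpolated text is read as regexp syntax, so escape it when it must be literal
raw_pos  = ("105" =~ /#{"1.5"}/)                  # => 0 (the . matches "0")
safe_pos = ("105" =~ /#{Regexp.escape("1.5")}/)   # => nil
```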

Good luck!


I'm looking at this code... .gsub(/%u/, unit). What's the %u? I can't seem to find %u in your docs, and the explanation for % is not clear to me. Thanks!
%u is nothing special; try the code in IRB: '%u'.gsub(/%u/,'heyhey') #=> "heyhey". gsub finds the string '%u' and replaces it with the second argument, 'heyhey' in this example.

Zenspider's Quickref contains a section explaining which escape sequences can be used in regexen and one listing the pseudo-variables that get set by a regexp match. In the second argument to gsub, you write the name of the variable with a backslash instead of a $ (for example \1 instead of $1), and it will be replaced with the value of that variable after the regexp is applied. If you use a double-quoted string, you need two backslashes ("\\1").
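A sketch of the pseudo-variables and their backslash forms in gsub's replacement string. The sample name is illustrative, and the \k<name> line assumes Ruby 1.9 or later.

```ruby
name = "John Smith"

# Pseudo-variables set by a match; copy them at once, since the next match resets them
matched = ("hello world" =~ /o (w)/)
vars = [$`, $&, $', $1]       # => ["hell", "o w", "orld", "w"]

# In a single-quoted replacement, \2 and \1 stand for $2 and $1
swap_single = name.gsub(/(\w+) (\w+)/, '\2, \1')      # => "Smith, John"
# In a double-quoted string the backslash itself must be doubled
swap_double = name.gsub(/(\w+) (\w+)/, "\\2, \\1")    # => "Smith, John"
# \0 stands for the whole match, like $&
wrapped = name.gsub(/\w+/, '[\0]')                    # => "[John] [Smith]"
# Named groups (Ruby 1.9+) are referenced as \k<name>
swap_named = name.sub(/(?<first>\w+) (?<last>\w+)/, '\k<last>, \k<first>')  # => "Smith, John"
```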

When using the block form of gsub, you can use the variables directly. If you return a string containing, e.g., \1 from the block, it will not be replaced with $1; that only happens in the two-argument form.
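A sketch contrasting the two behaviours described above, on an illustrative string: $1 works inside the block, while a '\1' returned from the block comes out literally.

```ruby
s = "a1b2"

# Inside the block, $1 holds the current match's first group
doubled = s.gsub(/(\d)/) { ($1.to_i * 2).to_s }   # => "a2b4"

# The block's return value is used verbatim: '\1' is NOT expanded here
literal = s.gsub(/(\d)/) { '\1' }                 # => 'a\1b\1'

# The whole match is also passed to the block as its argument
upper = s.gsub(/[a-z]/) { |m| m.upcase }          # => "A1B2"
```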


The %u is a percent sign followed by the letter u. It has no special meaning in regexen, so I assume that the text literally contains that sequence.
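A sketch of what the code in the question most likely does, assuming (as this answer does) that the text is a template containing a literal %u placeholder. The template and unit values are hypothetical. It also shows one caveat: backslash sequences in a string replacement are still interpreted, while a block result is not.

```ruby
# Hypothetical template with a literal "%u" placeholder
template = "Distance: 42 %u"
unit = "km"

r1 = template.gsub(/%u/, unit)     # => "Distance: 42 km"
r2 = template.gsub("%u", unit)     # a plain String pattern matches literally too

# Caveat: backslash sequences in a replacement string are still interpreted
odd_unit = 'a\0b'
r3 = template.gsub(/%u/, odd_unit)    # => "Distance: 42 a%ub" (\0 became the match)
r4 = template.gsub(/%u/) { odd_unit } # block result is used as-is
```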

If you use a block with sub/gsub, you can access the groups like this:

>> rx = /(ab(cd)ef)/
>> s = "-abcdef-abcdef"
>> s.gsub(rx) { $2 }
=> "-cd-cd"


For Ruby 1.9's Oniguruma engine, there is good documentation of the regular expression syntax here.
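A sketch of a few features that arrived with the Oniguruma engine, assuming Ruby 1.9 or later. The sample strings are illustrative.

```ruby
# Named groups, read back from the MatchData by name
md = /(?<year>\d{4})-(?<month>\d{2})/.match("2009-06")
year  = md[:year]                       # => "2009"
month = md[:month]                      # => "06"

# Look-behind: digits preceded by "$", without including the "$"
price = 'price: $100'[/(?<=\$)\d+/]     # => "100"

# Possessive quantifier: \d++ never gives digits back, so "3" can't match afterwards
possessive = ("123" =~ /\d++3/)         # => nil
backtrack  = ("123" =~ /\d+3/)          # => 0
```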


gsub is also a string substitution function in Lua (string.gsub).

In Lua patterns, %u represents the uppercase character class, i.e. it matches any uppercase letter. Similarly, %l matches lowercase letters.

LUA Regex Class Patterns
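For comparison, a sketch of rough Ruby counterparts to Lua's %u and %l, assuming ASCII input (the POSIX bracket classes are our choice of analogue, not something Lua defines). In a Ruby regexp, %u stays two literal characters.

```ruby
text = "Hello World"

# Roughly Lua's %u: POSIX bracket class for uppercase letters
marked = text.gsub(/[[:upper:]]/, "_")        # => "_ello _orld"

# Roughly Lua's %l: lowercase letters
lower_count = text.scan(/[[:lower:]]/).size   # => 8

# In a Ruby regexp, %u is just % followed by u
literal_hits = "50%u and %U".scan(/%u/)       # => ["%u"]
```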
