Ruby gsub / regex modifiers?

Question

Where can I find the documentation on the modifiers for gsub? \a \b \c \1 \2 \3 %a %b %c $1 $2 %3 etc.?

Specifically, I'm looking at this code... something.gsub(/%u/, unit) what's the %u?

mixonic · Accepted Answer · 2009-08-06 12:51:56Z

11

First off, %u is nothing special in a Ruby regex:

mixonic@pandora ~ $ irb
irb(main):001:0> '%u'.gsub(/%u/,'heyhey')
=> "heyhey"

The definitive documentation for Ruby 1.8 regex is in the Ruby Doc Bundle:

http://ruby-doc.org/docs/ruby-doc-bundle/Manual/man-1.4/syntax.html#regexp

Strings delimited by slashes are regular expressions. The characters right after the closing slash denote the options of the regular expression. Option i means that the regular expression is case insensitive. Option o means that the regular expression performs expression substitution only once, the first time it is evaluated. Option x means extended regular expression, which means whitespace and comments are allowed in the expression. Option p denotes POSIX mode, in which newlines are treated as normal characters (matched by dots).

%r/STRING/ is another form of the regular expression.
^
    beginning of a line or string 
$
    end of a line or string 
.
    any character except newline 
\w
    word character[0-9A-Za-z_] 
\W
    non-word character 
\s
    whitespace character[ \t\n\r\f] 
\S
    non-whitespace character 
\d
    digit, same as[0-9] 
\D
    non-digit 
\A
    beginning of a string 
\Z
    end of a string, or before newline at the end 
\z
    end of a string 
\b
    word boundary(outside[]only) 
\B
    non-word boundary 
\b
    backspace(0x08)(inside[]only) 
[ ]
    any single character of set 
*
    0 or more previous regular expression 
*?
    0 or more previous regular expression(non greedy) 
+
    1 or more previous regular expression 
+?
    1 or more previous regular expression(non greedy) 
{m,n}
    at least m but at most n previous regular expression 
{m,n}?
    at least m but at most n previous regular expression(non greedy) 
?
    0 or 1 previous regular expression 
|
    alternation 
( )
    grouping regular expressions 
(?# )
    comment 
(?: )
    grouping without backreferences 
(?= )
    zero-width positive look-ahead assertion 
(?! )
    zero-width negative look-ahead assertion 
(?ix-ix)
    turns on (or off) `i' and `x' options within regular expression. These modifiers are localized inside an enclosing group (if any).
(?ix-ix: )
    turns on (or off) `i' and `x' options within this non-capturing group.

Backslash notation and expression substitution are available in regular expressions.
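A few of the options and constructs listed above can be tried out directly. This is only a sketch for a current Ruby; the strings and variable names are made up for illustration, and the p option from the old manual is left out because later Ruby versions use /m to make the dot match newlines.

```ruby
# i: case-insensitive matching. =~ returns the index of the match.
ci = "Hello" =~ /hello/i

# x: extended mode, so whitespace and comments inside the pattern are ignored.
date_rx = /
  (\d{4})  # year
  -
  (\d{2})  # month
/x
year = "2009-08"[date_rx, 1]

# Expression substitution: #{...} is interpolated into the pattern.
word = "unit"
interpolated = "a unit test".gsub(/#{word}/, "UNIT")

# (?i: ) turns on the i option only inside this non-capturing group.
scoped = "ABc" =~ /(?i:ab)c/

puts ci, year, interpolated, scoped
```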

Good luck!

edited Aug 6, 2009 at 12:51

answered Aug 5, 2009 at 18:06

mixonic

2,7011 gold badge18 silver badges23 bronze badges


2 Comments

Blaine Over a year ago

I'm looking at this code... .gsub(/%u/, unit) what's the %u? I can't seem to find %u in your docs, and The explanation for % is not clear to me. Thanks!

mixonic Over a year ago

%u is nothing special- try the code in IRB: '%u'.gsub(/%u/,'heyhey') #=> "heyhey". gsub is finding a string '%u' and replacing it with the second argument, 'heyhey' in this example.

sepp2k · Accepted Answer · 2009-08-05 18:03:40Z

9

Zenspider's Quickref contains a section explaining which escape sequences can be used in regexen and one listing the pseudo variables that get set by a regexp match. In the second argument to gsub you simply write the name of the variable with a backslash instead of a $ and it will be replaced with the value of that variable after applying the regexp. If you use a double quoted string, you need to use two backslashes.
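To illustrate the backslash forms in the replacement string, here is a small sketch. The date string and variable names are invented for the example.

```ruby
date = "2009-08-05"
pattern = /(\d+)-(\d+)-(\d+)/

# Single-quoted replacement: \1, \2, \3 refer to the capture groups.
single = date.gsub(pattern, '\3/\2/\1')

# Double-quoted replacement: each backslash has to be doubled.
double = date.gsub(pattern, "\\3/\\2/\\1")

# \0 refers to the whole match (what $& holds after a match).
wrapped = date.gsub(/\d+/, '<\0>')

puts single, double, wrapped
```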

When using the block-form of gsub you can simply use the variables directly. If you return a string containing e.g. \1 from the block, that will not be replaced with $1. That only happens when using the two-argument form.
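A short sketch of the difference between the two forms, using a made-up input string:

```ruby
text = "john smith"

# Block form: $1 is set for each match and can be used directly.
capitalized = text.gsub(/(\w+)/) { $1.capitalize }

# A '\1' returned from the block is kept literally, not expanded.
literal = text.gsub(/(\w+)/) { '\1' }

# Two-argument form: here '\1' is expanded to the first group.
expanded = text.gsub(/(\w+)/, '[\1]')

puts capitalized, literal, expanded
```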

answered Aug 5, 2009 at 18:03

sepp2k

372k56 gold badges687 silver badges687 bronze badges

2 Comments

Blaine Over a year ago

I'm looking at this code... .gsub(/%u/, unit) what's the %u? I can't seem to find %u in your docs, and The explanation for % is not clear to me. Thanks!

sepp2k Over a year ago

The %u is a percent sign followed by the letter u. It has no special meaning in regexen, so I assume that the text literally contains that sequence.
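One plausible reading is that %u is a home-made placeholder in a template string. A minimal sketch under that assumption; the template text is hypothetical, and only the name unit comes from the question:

```ruby
# Hypothetical template in which "%u" is just a literal placeholder.
template = "Length: 10 %u, width: 4 %u"
unit = "cm"

# gsub replaces every literal occurrence of "%u" with the value of unit.
result = template.gsub(/%u/, unit)
puts result
```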

Stan · Accepted Answer · 2009-12-02 09:01:04Z

5

If you use a block with sub/gsub, you can access the groups like this:

>> rx = /(ab(cd)ef)/
>> s = "-abcdef-abcdef"
>> s.gsub(rx) { $2 }
=> "-cd-cd"

answered Dec 2, 2009 at 9:01

Stan

9,0162 gold badges31 silver badges31 bronze badges

Comments

Robert Klemme · Accepted Answer · 2009-08-05 22:17:50Z

1

For Ruby 1.9's Oniguruma, there is good documentation of the regular expression syntax here.

answered Aug 5, 2009 at 22:17

Robert Klemme

2,18918 silver badges23 bronze badges

Comments

SimplSam · Accepted Answer · 2014-03-31 07:12:20Z

0

gsub is also a string substitution function in the Lua language.

In Lua's pattern language, %u represents the upper-case character class, i.e. it matches any upper-case letter. Similarly, %l matches lower-case letters.

Lua Regex Class Patterns
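For comparison, a sketch of how this looks on the Ruby side, where %u stays literal and upper-case letters are matched with a POSIX bracket class instead. The sample string is made up.

```ruby
s = "Hello World %u"

# In Ruby, upper-case letters are matched with [[:upper:]], not %u.
masked = s.gsub(/[[:upper:]]/, "*")

# /%u/ only matches the two literal characters "%" and "u".
literal = s.gsub(/%u/, "UNIT")

puts masked, literal
```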

answered Mar 31, 2014 at 7:12

SimplSam

1011 silver badge1 bronze badge
