REGEX replace function in JavaScript Swapping Words

Question

I am following the Eloquent JavaScript website's page on regular expressions and getting a little frustrated.

The page shows this example which swaps first and last names whilst removing commas:

var names = "Picasso, Pablo\nGauguin, Paul\nVan Gogh, Vincent\n";
document.writeln(names.replace(/([\w ]+), ([\w ]+)/g, "$2 $1"));

The author briefly explains what it does, but expends very little effort explaining why it works, and what the key features of this example are.

Can anyone help me fathom what $1 and $2 are, what they are referencing, and why?

Read on "regex capture groups"

– elclanrs, Commented Feb 24, 2014 at 2:21
Thanks, I wish the author would have said that.

– Andrew S, Commented Feb 24, 2014 at 2:56

Gorkk · Accepted Answer · 2014-02-24 02:23:54Z

2

$1 and $2 refer to the text matched by the first and second capturing groups (the sub-patterns between ( and )).

The given command finds every match of the regex /([\w ]+), ([\w ]+)/ in the string (the g flag makes the replacement apply to all matches, not just the first one). For each match, it replaces the matched text with $2 $1, that is, the second captured value followed by a space and then the first captured value.
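To see the effect of the g flag and of "$2 $1" concretely, here is a small sketch using the same input and pattern as the question. The variable names pattern, swapped and firstOnly are only illustrative, and console.log is used in place of document.writeln so it runs outside a browser too.

```javascript
// Same input string as in the question.
const names = "Picasso, Pablo\nGauguin, Paul\nVan Gogh, Vincent\n";

// Two capturing groups separated by a literal ", " that is matched but not captured.
const pattern = /([\w ]+), ([\w ]+)/g;

// "$2 $1" inserts group 2, a space, then group 1 in place of each match.
const swapped = names.replace(pattern, "$2 $1");
console.log(swapped);
// Pablo Picasso
// Paul Gauguin
// Vincent Van Gogh

// Without the g flag, only the first match is replaced.
const firstOnly = names.replace(/([\w ]+), ([\w ]+)/, "$2 $1");
console.log(firstOnly);
// Pablo Picasso
// Gauguin, Paul
// Van Gogh, Vincent
```

Because [\w ] does not match a newline, each match stops at the end of its line, which is why every name is swapped separately.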

For more information and a good starting point for regular expressions, you can head over to regular-expressions.info, which is quite a complete reference.


5 Comments

Andrew S Over a year ago

Could you tell me why the comma does not end up in the final result?

Gorkk Over a year ago

If you look carefully at the regex, it is composed of two capturing groups (each capturing a sequence of one or more word characters, \w, or spaces), with a ", " sequence (a comma followed by a space) in between, which is not captured. In your example, for the first match, the first captured group will be "Picasso" and the second will be "Pablo", so when computing $2 $1 the ", " sequence is nowhere to be seen.
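A quick way to check this is to look at what each group captured for a single line. This is only a sketch: the names line, match and keepComma are made up for the illustration. It also shows that a comma in the output just has to be written literally in the replacement string.

```javascript
const line = "Picasso, Pablo";

// exec returns the whole match at index 0 and the captured groups after it.
const match = /([\w ]+), ([\w ]+)/.exec(line);

console.log(match[0]); // "Picasso, Pablo" (whole match, includes the ", ")
console.log(match[1]); // "Picasso"        (first group, no comma)
console.log(match[2]); // "Pablo"          (second group, no comma)

// The replacement is built only from what you write in it. To keep a comma,
// put it in the replacement string yourself.
const keepComma = line.replace(/([\w ]+), ([\w ]+)/, "$2, $1");
console.log(keepComma); // "Pablo, Picasso"
```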

Andrew S Over a year ago

Makes sense, so I gather that if a comma were required somewhere, it needs to be escaped?

Gorkk Over a year ago

Not sure I understood that question. The comma is needed in the pattern in this case: without it the string won't match. However, the comma is not part of the captured substrings, so it is deliberately left out after replacement. If, say, you wanted to match a string like word1, word2, word3 and consider word1, word2 to be a single sequence and word3 another, it would be tricky (and would depend on what the global pattern might be); in that specific case it would probably give something like /([\w ]+, [\w ]+), ([\w ]+)/ (i.e. the first captured substring would be of the form word1, word2).
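As a rough illustration of that last pattern, assuming the input really is exactly three comma-separated words (the names text, groups and reordered are made up for the example):

```javascript
const text = "word1, word2, word3";

// The first group now contains a comma of its own, because the ", " between
// word1 and word2 sits inside the parentheses.
const groups = /([\w ]+, [\w ]+), ([\w ]+)/.exec(text);

console.log(groups[1]); // "word1, word2"
console.log(groups[2]); // "word3"

// Swapping the two groups keeps the captured comma and drops the uncaptured one.
const reordered = text.replace(/([\w ]+, [\w ]+), ([\w ]+)/, "$2 $1");
console.log(reordered); // "word3 word1, word2"
```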

Gorkk Over a year ago

More generally, the first difficulty with regex is to identify precisely the pattern you want to match, and the construction of the substrings you want to capture: these are operations we do on a daily basis when looking for things in text, but which might be incredibly hard to theorize as a pattern. Then comes the actual regex syntax with all its mysteries and complexities.
