I am following the Eloquent JavaScript website's page on regular expressions and getting a little frustrated.

The page shows this example which swaps first and last names whilst removing commas:

var names = "Picasso, Pablo\nGauguin, Paul\nVan Gogh, Vincent\n";
document.writeln(names.replace(/([\w ]+), ([\w ]+)/g, "$2 $1"));

The author briefly explains what it does, but expends very little effort explaining why it works, and what the key features of this example are.

Can anyone help me fathom what "$1" and "$2" are, what they are referencing, and why?

  • Read up on "regex capture groups". Commented Feb 24, 2014 at 2:21
  • Thanks, I wish the author would have said that. Commented Feb 24, 2014 at 2:56

1 Answer

$1 and $2 are referencing the first and second capturing group matches (the patterns between ( and )).

The given command finds every match of the regex /([\w ]+), ([\w ]+)/ in the string (the g flag makes the replacement apply to all matches rather than only the first one). Each matched substring is replaced with $2 $1, that is, the second captured value, followed by a space, then the first captured value.
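To make the effect of the g flag and of $1/$2 concrete, here is a minimal sketch that reuses the string and pattern from the question. The variable names (swapped, firstOnly, viaFunction) are only illustrative, and console.log stands in for document.writeln so it can run outside a browser.

```javascript
// Same input and pattern as in the question.
const names = "Picasso, Pablo\nGauguin, Paul\nVan Gogh, Vincent\n";

// With the g flag, every match is replaced.
// $1 is the text captured by the first (...) group, $2 by the second.
const swapped = names.replace(/([\w ]+), ([\w ]+)/g, "$2 $1");
console.log(swapped);
// Pablo Picasso
// Paul Gauguin
// Vincent Van Gogh

// Without the g flag, only the first match is replaced.
const firstOnly = names.replace(/([\w ]+), ([\w ]+)/, "$2 $1");
console.log(firstOnly);
// Pablo Picasso
// Gauguin, Paul
// Van Gogh, Vincent

// A replacer function receives the same captures as arguments,
// which can be easier to read than $1 and $2.
const viaFunction = names.replace(
  /([\w ]+), ([\w ]+)/g,
  (whole, last, first) => `${first} ${last}`
);
console.log(viaFunction === swapped); // true
```

The replacer-function form is equivalent here; it is shown only because named parameters make it clearer which capture is which.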

For more information and a good starting point for regular expressions, you can head over to regular-expressions.info, which is quite a complete reference.

5 Comments

Could you tell me why the comma does not end up in the final result?
If you look carefully at the regex, it is composed of two capturing groups (each captures a sequence of one or more word characters, \w, meaning letters, digits, or underscore, or spaces), and in between them a ", " sequence (comma followed by a space), which is not captured. In your example, for the first match, the first captured group will be "Picasso" and the second will be "Pablo", so when $2 $1 is computed, the ", " sequence is nowhere to be seen.
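As a rough illustration of that point, the sketch below lists what each match and each group contains for the question's input. It assumes an environment that supports String.prototype.matchAll (ES2020); the names input and captures are made up for this example.

```javascript
const input = "Picasso, Pablo\nGauguin, Paul\nVan Gogh, Vincent\n";

// matchAll requires the g flag and yields one result per match.
// Index 0 is the whole match; indexes 1 and 2 are the captured groups.
const captures = [...input.matchAll(/([\w ]+), ([\w ]+)/g)].map((m) => ({
  whole: m[0],
  group1: m[1],
  group2: m[2],
}));

for (const c of captures) {
  console.log(JSON.stringify(c));
}
// {"whole":"Picasso, Pablo","group1":"Picasso","group2":"Pablo"}
// {"whole":"Gauguin, Paul","group1":"Gauguin","group2":"Paul"}
// {"whole":"Van Gogh, Vincent","group1":"Van Gogh","group2":"Vincent"}
```

The comma appears in the whole match, which is what gets replaced, but in neither group, so a replacement built only from $1 and $2 cannot contain it.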
Makes sense, so I gather that if a comma were required somewhere, it needs to be escaped?
Not sure I understood that question. The comma is needed in this case: without it the string won't match; however, the comma is not part of the captured substrings, and it is deliberately omitted after replacement. If, say, you wanted to match a string like word1, word2, word3 and treat word1, word2 as a single sequence and word3 as another, it would be tricky (and would depend on what the overall pattern might be); in that specific case it would probably give something like /([\w ]+, [\w ]+), ([\w ]+)/ (i.e. the first captured substring would be of the form word1, word2).
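A small sketch of both cases, under the assumption that the input is literally "word1, word2, word3" (the variable names are hypothetical). It also shows that a comma needs no escaping, either in the pattern or in the replacement string.

```javascript
// A comma inside a capturing group becomes part of that capture.
const threeWords = "word1, word2, word3";
const regrouped = threeWords.replace(/([\w ]+, [\w ]+), ([\w ]+)/, "$2 $1");
console.log(regrouped); // word3 word1, word2

// A comma in the replacement string is a plain literal character,
// so it can be put back wherever it is wanted without escaping.
const withComma = "Picasso, Pablo".replace(/([\w ]+), ([\w ]+)/, "$2, $1");
console.log(withComma); // Pablo, Picasso
```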
More generally, the first difficulty with regex is to identify precisely the pattern you want to match and how to construct the substrings you want to capture: these are operations we perform daily when looking for things in text, but they can be incredibly hard to formalize as a pattern. Then comes the actual regex syntax, with all its mysteries and complexities.