I'm trying to learn Regular Expressions and at the moment I've gathered a very basic understanding from all of the overviews from W3, Mozilla or http://www.regular-expressions.info/, but when I was exploring this wikibook http://en.wikibooks.org/wiki/JavaScript/Regular_Expressions it gave this example:

"abbc".replace(/(.)\1/g, "$1") => "abc"

I don't understand why this works (the wikibook didn't really explain it), but I tried it myself and it does drop the second b. I know \1 is a backreference to the captured group (.), but . matches any character except a newline... Wouldn't that still pick up the second b? Trying a few variations didn't clear things up either...

"abbc".replace(/(.)/g, "$1") => "abbc"
"aabc".replace(/(.)*/g, "$1") => "c"

Does anybody know of a good in-depth tutorial on JavaScript regular expressions? (I've looked at a couple of books, but they are generalized across about 15 languages with no real emphasis on JavaScript.)

2 Answers

First One

  • (.) matches and captures a single character to Group 1, so (.)\1 matches two of the same characters, for instance AA.
  • In the string, the only match for this pattern is bb.
  • Replacing these two characters bb with the Group 1 capture $1, i.e. b, swaps two characters for one, effectively removing one b.
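
If you want to watch this happen, here is a small sketch that passes a function to replace instead of the "$1" string, so each match and its Group 1 capture can be recorded. The names seen and result are only for illustration.

```javascript
// Record every match of /(.)\1/g in "abbc" together with its Group 1 capture.
const seen = [];
const result = "abbc".replace(/(.)\1/g, (match, group1, offset) => {
  seen.push({ match, group1, offset });
  return group1; // same effect as using the replacement string "$1"
});

console.log(seen);   // one entry: match "bb", group1 "b", offset 1
console.log(result); // "abc"
```

Only one match is recorded, because a and c are not followed by a copy of themselves.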

Second One

  • Again (.) matches and captures a single character, capturing it to Group 1.
  • The pattern matches each character in the string in turn.
  • The replacement is the Group 1 capture buffer $1, so we replace each character with itself. Therefore the string is unchanged.
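
Because replacing each character with itself leaves nothing to see, a quick way to make the four separate matches visible is to wrap $1 in brackets. This is just a sketch; marked and unchanged are made-up variable names.

```javascript
// Each character is its own match, so each one gets its own pair of brackets.
const marked = "abbc".replace(/(.)/g, "[$1]");
console.log(marked); // "[a][b][b][c]"

// With a bare "$1" every character is replaced by itself.
const unchanged = "abbc".replace(/(.)/g, "$1");
console.log(unchanged); // "abbc"
```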

Third One

  • Here, forgetting the parentheses for a moment, .* matches the whole string: this is the match.
  • The quantifier * means that Group 1 is overwritten every time a single character is matched (no new groups are created, because groups are numbered by their opening parentheses from left to right).
  • For every character that is matched, that character is therefore captured to Group 1, until the next capture overwrites Group 1.
  • The end value of Group 1 is the last capture, which is the last character c.
  • We replace the match (i.e., the whole string) with Group 1 (i.e. c), so the replacement string is c.
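
To inspect what the engine actually captured, the sketch below lists the matches with matchAll (available in ES2020 and later; the names matches and bracketed are illustrative). One detail it brings out: because of the g flag and because (.)* can match nothing, there is also a second, empty match at the very end of the string, where Group 1 did not participate. That empty match is replaced by an empty string, so the final result is still c.

```javascript
// List every match of /(.)*/g in "aabc" with its position and Group 1 value.
const matches = [..."aabc".matchAll(/(.)*/g)].map((m) => ({
  index: m.index,
  match: m[0],
  group1: m[1],
}));
console.log(matches);
// [ { index: 0, match: "aabc", group1: "c" },
//   { index: 4, match: "",     group1: undefined } ]

// Brackets make both replacements visible; an unset group expands to "".
const bracketed = "aabc".replace(/(.)*/g, "[$1]");
console.log(bracketed); // "[c][]"
```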

The details of group numbering are important to grasp, and I highly recommend you read the linked article about "the gory details".

Reference

7 Comments

To be precise, in the third one (.)* matches each single character in turn and so every character gets to be $1 for a short moment.
@Jongware Yes, that's right. I added a great article about group numbering.
FYI you asked about other resources, and two of the linked articles come from a site that should complement what you've already seen.
Thank you so much, this in-depth explanation really helps. For some reason I thought that the second expression, in order to give the output that it does, would need to be written as "abbc".replace(/(.*)/g, "$1") => "abbc". The way it was written, it seemed like it should have returned only "a", not the whole string...
+1 for so many details in the answer. @user3334776: Can you mark the answer as accepted by clicking on the tick mark at the top left of my answer?

This is quite simple when broken down:

With "abbc".replace(/(.)\1/g, "$1"), the result is "abc" because:
(.) matches and captures any one character (other than a line break).
\1 is a backreference to whatever Group 1 has just captured.

So the pattern says "find the same character twice in a row", and the replacement puts back a single copy of it via $1. Any doubled character matches and is reduced to one.
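
A few more inputs make the "doubled character" rule concrete. This is only an illustrative sketch (inputs and collapsed are made-up names). Note that the pattern consumes characters in pairs, so a run of three identical characters is reduced to two, not one.

```javascript
// Apply the same replacement to several strings to see which pairs match.
const inputs = ["abbc", "aabbcc", "aaa", "abc"];
const collapsed = inputs.map((s) => s.replace(/(.)\1/g, "$1"));

console.log(collapsed); // ["abc", "abc", "aa", "abc"]
```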
