2

var a = 'a\na'
console.log(a.match(/.*/g)) // ['a', '', 'a', '']

Why are there two empty strings in the result?

Let's say if there are empty strings, why isn't there one at beginning and at the end of each line as well, hence 4 empty strings?

I am not looking for how to select 'a's but just want to understand the presence of the empty strings.

8
  • consider that * means "zero or more of the preceding", which includes empty strings Commented Mar 13, 2018 at 10:52
  • @FedericoklezCulloca yes updated question a little bit Commented Mar 13, 2018 at 10:53
  • @user202729 where are tags in the page title? Commented Mar 13, 2018 at 10:58
  • Something like this. "the system automatically prepends the most commonly used tag to the question title when generating the page title (unless it's already in the question title somewhere)" Commented Mar 13, 2018 at 11:00
  • @user202729 Yes, I guess it might suggest a tag if you do not include one, but what is the big deal here? Commented Mar 13, 2018 at 11:01

4 Answers

1

The star operator * means there can be any number of occurrences (even 0 occurrences). With the expression used, an empty string can be a match. I'm not sure what you are looking for, but maybe the + operator (1 or more occurrences) would be better?

To add some more info: regex quantifiers are greedy by default (in some languages you can override this behaviour), so .* will pick as much of the text as it can. In this case, it first picks the a, because that matches the regex, which leaves "\na" still to be processed. "\n" does not match ".", so the only available match at that position is the empty string. Matching then continues on the next line, where it can again match an "a". After this, only the empty string matches the regex.
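To make those steps visible, here is a minimal sketch that lists each match together with the index where it starts. The helper name matchesWithIndex is made up for illustration, and String.prototype.matchAll assumes a reasonably recent engine (ES2020).

```javascript
// Hypothetical helper (not from the original answer): return every match
// of a global regex together with the index at which it starts.
function matchesWithIndex(text, regex) {
  // matchAll requires the g flag and yields one result object per match.
  return [...text.matchAll(regex)].map(m => ({ index: m.index, match: m[0] }));
}

const steps = matchesWithIndex('a\na', /.*/g);
console.log(steps);
// index 0: 'a'  (greedy match of the first line)
// index 1: ''   (empty match just before the "\n")
// index 2: 'a'  (greedy match of the second line)
// index 3: ''   (empty match at the end of the string)
```

The two empty matches sit at indexes 1 and 3, that is, right after each greedy match has consumed its line.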


7 Comments

If this is the reason, then why does 'ab\na'.match(/.*/g) give ["ab", "", "a", ""]?
@Nit because of greedy matching.
@Nit If you know the API well enough, then consider answering (I don't).
@TimBiegeleisen Shouldn't it be "if you don't know the API well enough, then consider not answering"?
I believe that JS's match does not use dot in DOTALL mode. When it hits a newline, that generates an empty match. Not sure about the empty match at the end though, maybe the $ anchor somehow generates that.
1

* Matches the preceding expression 0 or more times.

. matches any single character except the newline character.

That is what the official documentation says about . and *. So I guess the array you received is something like this:

[ the first "any character" of the first line, followed by "nothing", the first "any character" of the second line, followed by "nothing" ]

And the newline character is just ignored.


1

The best explanation I can offer for the following:

'ab\na'.match(/.*/g)
["ab", "", "a", ""]

Is that JavaScript's match function does not use the dot in DOTALL mode, meaning that dot does not match newline characters. When the .* pattern is applied to ab\na, it first matches ab, then stops at the newline. The newline generates an empty match. Then, a is matched, and then for some reason the end of the string produces another empty match.

If you just want to extract the non-empty content of each line, then you may try the following:

console.log('ab\na'.match(/.+/g)) // ['ab', 'a']
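For comparison, here is a small sketch that puts the three variants side by side. The s (dotAll) flag is not part of the original answer and assumes an ES2018-capable engine; it is only there to show what changes when the dot is allowed to match newlines.

```javascript
const text = 'ab\na';

// Default mode: the dot stops at "\n", and * also allows empty matches.
const star = text.match(/.*/g);
console.log(star); // ['ab', '', 'a', '']

// + requires at least one character, so the empty matches disappear.
const plus = text.match(/.+/g);
console.log(plus); // ['ab', 'a']

// With the s flag the dot also matches "\n", so the first match swallows
// the whole string; one empty match at the very end remains.
const dotAll = text.match(/.*/gs);
console.log(dotAll); // ['ab\na', '']
```

Even in dotAll mode there is still one empty match at the end of the string, which suggests the empty matches come from the zero-or-more quantifier rather than from the newline itself.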

6 Comments

@user202729 Maybe my brain thinks ahead too fast. I added a description per your request.
"The newline generates an empty match": this is not the reason. This empty match is the same kind as the second empty match, which has no following newline character.
@revo I should have perhaps said the pause at the newline. I think the newline in this case basically acts like a zero width delimiter.
My whole point is that it's not the newline, as the default behavior of . doesn't have anything to do with newlines. It's the end of line, which is a zero-length position, that produces an empty string.
@revo OK...but the end of the line is only there because it is immediately followed by a newline character (or the $ at the very end).
1

Let's say if there are empty strings, why isn't there one at beginning and at the end...

.* is greedy: it swallows as much of a line as it can in a single match. By a line I mean everything before a line break. When it then reaches the end of the line, it matches again, because the star quantifier also allows zero characters.

If you want 4 empty strings, you can add ? to the star quantifier to make it lazy (.*?), but this regex gives different results in different flavors because of the way they handle zero-length matches.

You can try .*? with both PCRE and JS engines in regex101 and see the differences.
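As a quick check of the JavaScript side only, this sketch runs the lazy version against the string from the question. It says nothing about PCRE, which you would have to try separately.

```javascript
// Lazy quantifier: .*? takes as few characters as possible, so at every
// position it is satisfied with an empty match.
const lazyMatches = 'a\na'.match(/.*?/g);
console.log(lazyMatches); // ['', '', '', '']

// After an empty match the JS engine moves one position forward before
// trying again, so there is one empty match per position 0, 1, 2 and 3.
const lazyPositions = [...'a\na'.matchAll(/.*?/g)].map(m => m.index);
console.log(lazyPositions); // [0, 1, 2, 3]
```

Note that in JS the lazy pattern never returns the a characters at all, only the four zero-length matches.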

Question:

You may ask: why does the engine try to find a match at the end of the line when the whole thing has already been matched?

Answer:

It's because end of line and end of string are defined positions of their own, so not everything has been matched yet. There is one position left that can still be matched, and the star quantifier matches it.

The position that is left here is the end of the line, which is a true match for $ when the m flag is on. A . doesn't match at this position, but .* and .*? do, because they also match zero-length positions, like any pattern that can match nothing, such as \d*, \D*, a* or b?
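A rough way to check this in JS is to compare where $ matches in multiline mode with where .* produces its empty matches. The variable names below are only for illustration.

```javascript
const sample = 'a\na';

// Positions where $ matches when the m flag is on (ends of lines).
const endOfLinePositions = [...sample.matchAll(/$/gm)].map(m => m.index);
console.log(endOfLinePositions); // [1, 3]

// Positions where .* produces an empty match.
const emptyStarPositions = [...sample.matchAll(/.*/g)]
  .filter(m => m[0] === '')
  .map(m => m.index);
console.log(emptyStarPositions); // [1, 3]

// Any pattern that can match nothing behaves the same way: \d* finds no
// digits in 'ab', so it yields one empty match per position.
const digitStar = 'ab'.match(/\d*/g);
console.log(digitStar); // ['', '', '']
```

The empty matches of .* land on the same two positions that $ reports, which fits the end-of-line explanation above.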

