Consider the following (somewhat complex) regular expression in JavaScript:

\{\{\s*(?:(?:\:)([\w\$]+))?\#(?:([\w\$\/]+@?)?([\s\S]*?))?(\.([\w\$\/]*))?\s*\}\}

I am wondering why it matches the whole string here:

{{:control#}}x{{*>*}}

but not in the following case (where a space is added after #):

{{:control# }}x{{*>*}}

In PHP or Python, it matches only the first part, {{: ... }}, in both cases.
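For reference, a minimal reproduction of the two cases in JavaScript (Node.js or a browser console). The names PATTERN and firstMatch are illustrative and not from the question; the expected outputs in the comments are the ones described above:

```javascript
// The question's regex, written as a regex literal and otherwise unchanged.
const PATTERN = /\{\{\s*(?:(?:\:)([\w\$]+))?\#(?:([\w\$\/]+@?)?([\s\S]*?))?(\.([\w\$\/]*))?\s*\}\}/;

// Illustrative helper: returns the matched text, or null if nothing matches.
function firstMatch(input) {
  const m = PATTERN.exec(input);
  return m === null ? null : m[0];
}

console.log(firstMatch("{{:control#}}x{{*>*}}"));  // {{:control#}}x{{*>*}}
console.log(firstMatch("{{:control# }}x{{*>*}}")); // {{:control# }}
```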

I want JavaScript to match only the first part as well. Is that possible without hacking a (?!}}) lookahead in front of [\s\S]?

Moreover, is performance the reason for this different behavior in JavaScript, or is it just a bug in the specification?

2 Answers

You can use a lazy ?? quantifier to achieve the same behavior in JavaScript:

\{\{\s*(?:(?::)([\w$]+))?#(?:([\w$\/]+@?)?([\s\S]*?))??(\.([\w$\/]*))?\s*}}
                                                     ^^  
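A quick check of this pattern in JavaScript. It is the question's regex with some redundant escapes dropped and ?? in place of ? on the group after #; LAZY is just an illustrative name:

```javascript
// The answer's pattern: the optional group after # uses the lazy ?? quantifier.
const LAZY = /\{\{\s*(?:(?::)([\w$]+))?#(?:([\w$\/]+@?)?([\s\S]*?))??(\.([\w$\/]*))?\s*}}/;

console.log(LAZY.exec("{{:control#}}x{{*>*}}")[0]);  // {{:control#}}
console.log(LAZY.exec("{{:control# }}x{{*>*}}")[0]); // {{:control# }}

// The group is still entered when the tag has content after #.
console.log(LAZY.exec("{{:control#foo}}")[2]); // foo
```

Because ?? tries zero iterations first, the group is only entered when the rest of the pattern cannot match without it, so content after # is still captured.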

From rexegg.com:

A??     Zero or one A, zero if that still allows the overall pattern to match (lazy)
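A tiny illustration of the two quantifiers on a plain string (not from the original answer):

```javascript
// Greedy ? takes the optional "b" whenever it can.
const greedy = "ab".match(/ab?/)[0]; // "ab"
// Lazy ?? skips the "b" because the rest of the pattern (nothing) already matches.
const lazy = "ab".match(/ab??/)[0]; // "a"
// Lazy ?? still takes the "b" when the rest of the pattern needs it.
const forced = "abc".match(/ab??c/)[0]; // "abc"
console.log(greedy, lazy, forced);
```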

This is not a bug; it is correct according to the ECMAScript specification that JavaScript follows.

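Here, in (?:([\w$\/]+@?)?([\s\S]*?))?, we have an optional non-capturing group that can match an empty text. In the ECMAScript RepeatMatcher algorithm, once a quantifier's minimum count has been reached, an iteration that matches the empty string is treated as a failure (this is what keeps patterns like (a*)* from looping forever). So right after #, JavaScript does not accept an empty match of that group: it backtracks into the group and lets the lazy [\s\S]*? grow until the rest of the pattern can match. In the first string that only happens at the final }}, so the whole string is consumed; in the second string the single space is already enough. PHP (PCRE) and Python accept the empty iteration, which is why they stop at the first }}.

A related ECMAScript peculiarity concerns Backreferences to Failed Groups. E.g. ((q)?b\2) will match b in JavaScript, but it won't match in Python and PCRE.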

According to the official ECMA standard, a backreference to a non-participating capturing group must successfully match nothing just like a backreference to a participating group that captured nothing does.
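Both rules can be observed in a few lines. A minimal sketch: the reduced pattern /#(?:[\s\S]*?)?\}/ and the input "#}x}" are made up for illustration and only mimic the shape of the question's regex, while ((q)?b\2) is the example from the answer:

```javascript
// Reduced pattern: an optional group whose body can match the empty string,
// followed by a terminator, like the question's group before \s*\}\}.
const reduced = /#(?:[\s\S]*?)?\}/;
// The empty iteration of the greedy ? group is rejected, so [\s\S]*? has to
// consume at least one character and the match extends to the last }.
console.log(reduced.exec("#}x}")[0]); // #}x}

// With ?? the engine tries zero iterations first and stops at the first }.
const reducedLazy = /#(?:[\s\S]*?)??\}/;
console.log(reducedLazy.exec("#}x}")[0]); // #}

// Backreference to a group that did not participate: group 2, (q), never
// matches, yet \2 succeeds by matching the empty string.
const backrefMatch = /((q)?b\2)/.exec("b");
console.log(backrefMatch[0]); // b
console.log(backrefMatch[2]); // undefined
```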

2 Comments

Why does it work without ?? in PHP, Python or .NET?
No, ? is a greedy quantifier in all languages; I added more details to the answer.

This subpattern is responsible for the behaviour:

([\w\$\/]+@?)?  // P1

As it matches greedily, your whole test string (without the space) gets consumed.

As @stribizhev suggests, making the designated part of your regex non-greedy results in a conservative match.

Both test strings will match up to and including #, since # is a mandatory (unquantified) part of the pattern.

The second test string (including the space after #) matches non-greedily, since P1 does not match whitespace. Instead, this whitespace is matched by the subsequent subexpression ([\s\S]*?), which finishes the match.
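To see which subexpression takes the space in the second test string, one can inspect the capture groups of the original pattern; QUESTION_RE and spaceMatch are illustrative names:

```javascript
// The question's original pattern.
const QUESTION_RE = /\{\{\s*(?:(?:\:)([\w\$]+))?\#(?:([\w\$\/]+@?)?([\s\S]*?))?(\.([\w\$\/]*))?\s*\}\}/;

const spaceMatch = QUESTION_RE.exec("{{:control# }}x{{*>*}}");
console.log(spaceMatch[0]);                 // {{:control# }}
console.log(spaceMatch[2]);                 // undefined: P1 did not participate
console.log(JSON.stringify(spaceMatch[3])); // " ": the space was taken by [\s\S]*?
```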
