19

So I know in JavaScript an implementation is allowed to extend the regular expressions grammar:

An implementation may extend the ECMAScript Regular Expression grammar defined in 21.2.1, but it must not extend the RegularExpressionBody and RegularExpressionFlags productions defined below or the productions used by these productions.

Was this ability ever used? Do any existing JavaScript implementations extend the regular expressions grammar?

4
  • 1
    "Looking for concrete examples of current extensions": what else do you want to know on top of what is already mentioned in the existing answers? Commented Jun 30, 2015 at 10:54
  • 1
    @nhahtdh what I'm asking: Whether implementations actually take advantage of this feature and how (current answers speak of old versions doing this). What I want to know: If we can safely drop this allowance from the ECMAScript specification now that we have subclassable RegExp. Commented Jun 30, 2015 at 11:08
  • 1
    I think you might want to clarify that with an edit to your question. Also, I'm not sure what you are asking when you say "If we can safely drop this allowance from the ECMAScript specification now that we have subclassable RegExp". The clause allows browser implementations to extend the grammar of RegExp (not just by script) and limits such extensions so that they won't cause confusion with comment syntax, so I guess it should not be dropped. Commented Jun 30, 2015 at 11:22
  • 1
    @nhahtdh Yes, I realize that browsers currently can do this, but if none actually do, we can drop this allowance in a future version of the ECMAScript spec, which would make our lives a lot easier :D Commented Jun 30, 2015 at 11:23

2 Answers 2

9

Yes, Mozilla's Gecko engine supported the sticky y flag, which was not part of ES5. It eventually became part of ES6.
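For illustration, here is a minimal sketch of what the sticky flag does, assuming an engine that supports y (any ES6-capable one should). The variable names are made up for this example.

```javascript
// With the y flag, a match must start exactly at lastIndex;
// the engine does not scan ahead for a later occurrence.
const stickyFoo = /foo/y;
const stickyInput = "foofoo bar foo";

stickyFoo.lastIndex = 0;
const firstHit = stickyFoo.test(stickyInput);  // "foo" at index 0
const secondHit = stickyFoo.test(stickyInput); // "foo" at index 3
const thirdHit = stickyFoo.test(stickyInput);  // index 6 is a space, so this fails
const indexAfterMiss = stickyFoo.lastIndex;    // a failed match resets lastIndex
```

Without the y flag (and without g), the same pattern would find the "foo" at the end of the string regardless of lastIndex.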

This ability may be utilised again when engines start implementing look-behind (I hope they start experimenting before it gets specced).

This is not an exhaustive list, just what first came to my mind. There may be other examples.


5 Comments

I hope they do it with a subclass now that RegExp is subclassable.
"it must not extend the RegularExpressionBody and RegularExpressionFlags productions defined below or the productions used by these productions." - this makes it sound as if their extension was invalid anyway, or am I understanding it wrong?
@BenjaminGruenbaum: No, the …Body and …Flags productions just define the form of literals, so to speak, not the grammar of the regexp and the allowed flags. Any implementation must be able to parse all literals, even if the grammar is extended and the implementation does not understand the contents of the regexp.
So it is allowed to extend new RegExp(pattern, flags) but not /pattern/flags? (/y worked on literals too afaik). So new RegExp(pattern) and /pattern/ can behave differently?
No, RegularExpressionLiteral basically allows totally arbitrary characters (and flags must consist only of IdentifierParts). It's RegExpInitialize that throws on invalid flag characters.
7
+50

Octal escape sequence in RegExp

A widespread application of that clause (which is also present in Section 7.8.5 of the ECMAScript 5.1 specification) is to support octal escape sequences in RegExp.

/a\1b/.test("a\u0001b");
/a\11b/.test("a\tb");

The default grammar of RegExp (as described in Section 15.10.1 of ES5.1, or Section 21.2.1 of ES6) doesn't support octal escape sequences, and any decimal escape sequence whose value is larger than the number of capturing groups triggers a SyntaxError. However, many browsers (even old versions) extend the RegExp grammar to support octal escape sequences and evaluate the 2 lines of code above to true.

Starting from ES6, Annex B, which used to be an informative annex in the ES3 to ES5.1 specs, is turned into a normative annex, which requires web browsers to support octal escape sequences for compatibility reasons (non-web-browser hosts can choose to stick to the default implementation).

While previous versions of ECMAScript did address support for octal escape sequences, it was only for Numeric and String literals. Backward-compatible RegExp is described for the first time in Section B.1.4 of ES6, which changes the semantics and syntax of RegExp for BMP patterns to include support for octal escape sequences, among other features.
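One way to observe the difference between the extended and the strict grammar is to compile the same source with and without the u flag, since the backward-compatible syntax only applies to non-Unicode (BMP) patterns. This is a rough sketch that assumes an ES6-capable engine implementing Annex B (browsers and Node.js do); compilesPattern is a hypothetical helper, not a built-in.

```javascript
// Hypothetical helper: reports whether the engine accepts a pattern
// under the given flags, instead of letting the SyntaxError escape.
function compilesPattern(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

// No capturing groups, so \1 cannot be a valid backreference.
const legacyAcceptsOctal = compilesPattern("a\\1b", "");   // extended grammar: octal escape
const unicodeAcceptsOctal = compilesPattern("a\\1b", "u"); // strict grammar: rejected

// \11 is octal for code point 9, the tab character.
const octalMatchesTab = new RegExp("a\\11b").test("a\tb");
```

On a host that does not implement Annex B, legacyAcceptsOctal could come out false as well.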

Unmatched closing brackets ] and non-range-quantifier with {}

Another common extension (as tested on Firefox 38, Chrome 43 and IE9) is to relax the grammar to allow unmatched closing brackets ] and sequences that don't constitute a numbered quantifier and interpret them as literal strings.

/^][[]]$/.test("][]"); // Tokens: ^  ]  [[]  ]  $
/^{56, 67}$/.test("{56, 67}"); // Extra space

Similar to octal escape sequences, the default grammar of RegExp (section 15.10.1 of ES5.1, or section 21.2.1 of ES6) doesn't allow {, }, ] to be an Atom, as those characters are excluded from the production of PatternCharacter.

The grammar in Annex B section B.1.4 of ES6 is also extended to interpret non-range-quantifier sequences (sequences which don't match the grammar of QuantifierPrefix) as literal strings, via the Atom[U] :: PatternCharacter production.

However, the extended grammar doesn't allow an unmatched closing ], as both the PatternCharacter and PatternCharacterNoBrace productions still disallow ].
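To check how a particular engine treats these two patterns, something along these lines can be used. It is only a sketch: acceptsSource is a hypothetical helper, and the u flag is used here as a way to switch the engine back to the strict grammar, assuming an ES6-capable engine.

```javascript
// Hypothetical helper: true if the engine compiles the pattern, false on SyntaxError.
function acceptsSource(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const bracketSource = "^][[]]$";  // unmatched ] before and after the class [[]
const braceSource = "^{56, 67}$"; // the space keeps {56, 67} from being a quantifier

// Relaxed (legacy) grammar: both are read as literal text.
const bracketLegacy = new RegExp(bracketSource).test("][]");
const braceLegacy = new RegExp(braceSource).test("{56, 67}");

// Strict grammar under the u flag: both sources are rejected.
const bracketStrict = acceptsSource(bracketSource, "u");
const braceStrict = acceptsSource(braceSource, "u");
```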
