Expression

var regex = new Regex(@"{([A-z]*)(([^]|:)((\\:)|[^:])*?)(([^]|:)((\\:)|[^:])*?)}");

Breakdown

The expression is [crudely] designed to find tokens within an input, using the format: {name[:pattern[:format]]}, where the pattern and format are optional.

{
  ([A-z]*) // name
  (([^]|:)((\\:)|[^:])*?) // regex pattern
  (([^]|:)((\\:)|[^:])*?) // format
}

Additionally, the expression attempts to ignore escaped colons, allowing strings such as {Time:\d+\:\d+\:\d+:hh\:mm\:ss}.

Question

When tested on RegExr.com, the pattern works well enough, but when I try the same pattern in C#, the input fails to match. Why?

(Any advice on general improvements to the expression is very welcome too.)

  • Hi, it is because the two flavors differ a bit. According to the docs: "In .NET, regular expression patterns are defined by a special syntax or language, which is compatible with Perl 5." So, try debugging your regex here; there are Perl and JS regex implementations. Commented Aug 22, 2017 at 16:53
  • [A-z] matches more than letters: it also matches every character between Z and a in the ASCII table ([, \, ], ^, _ and `). Commented Aug 22, 2017 at 16:53
  • Can you maybe give two examples: one that works in both and one that fails in C#? Commented Aug 22, 2017 at 16:53
  • [^] isn’t valid in .NET regular expressions. (It should throw an exception, though, I think?) Commented Aug 22, 2017 at 16:55
  • See also the ECMAScript regex option in C#. Commented Aug 22, 2017 at 16:56

1 Answer

The [^] pattern is only valid in JavaScript, where it means "not nothing", i.e. it matches any character (although in ES5 it matches a single UTF-16 code unit, so a character outside the BMP is not matched as one unit). In C#, you can match any character with . by passing the RegexOptions.Singleline option. JS had no equivalent modifier at the time of writing (ES2018 later added the s flag), but the [\s\S] workaround matches any character in both flavors.
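To see the JavaScript side of this, here is a small sketch. The sample string and variable names are only for illustration, and it covers JS alone: since [^] is not valid in .NET, the first check has no C# counterpart.

```javascript
// Illustrative input: two letters separated by a newline.
const text = "a\nb";

// [^] is an empty negated class; in JavaScript it matches any character, newline included.
const emptyNegationMatches = /a[^]b/.test(text);

// [\s\S] (whitespace or non-whitespace) also matches any character and is portable to .NET.
const workaroundMatches = /a[\s\S]b/.test(text);

// A plain dot does not match a newline unless a dot-all/single-line mode is enabled.
const plainDotMatches = /a.b/.test(text);

console.log(emptyNegationMatches, workaroundMatches, plainDotMatches); // true true false
```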

So, the minimum change needed to make the pattern work in both regex flavors is to replace ([^]|:) with [\s\S]. The : alternative is unnecessary because [\s\S] already matches a colon.

Also, do not use [A-z] as a shortcut to match ASCII letters, because the range also covers the characters between Z and a ([, \, ], ^, _ and `). Use either [a-zA-Z], or [a-z] with a case-insensitive modifier.
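A quick way to convince yourself is to run both classes over a few strings. This is a minimal sketch in JavaScript; the sample inputs and the names looseLetters and strictLetters are made up for the demonstration.

```javascript
// [A-z] spans ASCII 65..122, which includes [ \ ] ^ _ ` between "Z" and "a".
const looseLetters = /^[A-z]+$/;
// [A-Za-z] covers only the 52 ASCII letters.
const strictLetters = /^[A-Za-z]+$/;

const samples = ["Time", "snake_case", "a^b"];

// For each sample, record whether it consists solely of characters from each class.
const letterChecks = samples.map((s) => ({
  input: s,
  loose: looseLetters.test(s),
  strict: strictLetters.test(s),
}));

console.log(letterChecks);
// "Time" passes both; "snake_case" and "a^b" pass only the loose [A-z] check.
```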

So, you might consider writing the expression as

{([A-Za-z]*)([\s\S]((\\:)|[^:])*?)([\s\S]((\\:)|[^:])*?)}

See a .NET regex test and a JS regex test.
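For a local check, the pattern above can be run against the example token from the question. This is only a sketch on the JavaScript side (the variable names are illustrative); the same pattern text is what you would pass to new Regex(@"...") in C#. Note that with this grouping the name is group 1, the pattern part is group 2 and the format part is group 5, and the last two keep their leading colon.

```javascript
// The corrected expression, written as a JS regex literal.
const fixedRegex = /{([A-Za-z]*)([\s\S]((\\:)|[^:])*?)([\s\S]((\\:)|[^:])*?)}/;

// String.raw keeps the backslashes exactly as typed.
const sampleToken = String.raw`{Time:\d+\:\d+\:\d+:hh\:mm\:ss}`;

const match = fixedRegex.exec(sampleToken);

if (match !== null) {
  console.log(match[1]); // Time
  console.log(match[2]); // :\d+\:\d+\:\d+
  console.log(match[5]); // :hh\:mm\:ss
}
```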

There are certainly other possible enhancements here (removing redundant groups, supporting any escape sequence rather than just escaped colons, etc.), but they are outside the scope of the question.
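As a rough idea of what removing the redundant groups could look like, here is one possible variant. It is not the pattern from the question or from this answer: the regex, the parseToken helper and the sample tokens are all assumptions made for illustration. It uses non-capturing groups, which both flavors support, and makes the pattern and format segments genuinely optional. It still only understands escaped colons.

```javascript
// Hypothetical simplified pattern: group 1 = name, group 2 = pattern, group 3 = format.
// (?: ... ) groups do not capture, and each ":segment" part is optional.
const tokenRegex = /\{([A-Za-z]*)(?::((?:\\:|[^:])*?))?(?::((?:\\:|[^:])*?))?\}/;

// Hypothetical helper: returns the token parts, or null when nothing matches.
function parseToken(input) {
  const m = tokenRegex.exec(input);
  if (m === null) return null;
  // Segments that are absent from the token come back as undefined.
  return { name: m[1], pattern: m[2], format: m[3] };
}

const fullToken = parseToken(String.raw`{Time:\d+\:\d+\:\d+:hh\:mm\:ss}`);
const bareToken = parseToken("{Name}");

console.log(fullToken); // name "Time", pattern "\d+\:\d+\:\d+", format "hh\:mm\:ss"
console.log(bareToken); // name "Name", pattern and format undefined
```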

1 Comment

Clearly explained, and very informative; much appreciated. As a follow up to your recommendation, I've removed redundant groups.
