I've been stuck on this strange std::wregex behavior:

^(?:(?:[^\\u0000-\\u001f<>:\\\\\"/\\\\\\|\\?\\*]*\\w+[^\\u0000-\\u001f<>:\\\\\"/\\\\\\|\\?\\*]*:/)|(?:\\./))(?:(?:[^\\u0000-\\u001f<>:\\\\\"/\\\\\\|\\?\\*]*\\w+[^\\u0000-\\u001f<>:\\\\\"/\\\\\\|\\?\\*]*/?)|(?:\\./)|(?:\\.\\./))*$

raises an exception with

e.code() == regex_constants::error_brack

The weird thing is that I've been testing it with an online ECMAScript regex validator without any trouble. On top of that, removing the first bracket expression, as follows:

^(?:(?:\\w+[^\\u0000-\\u001f<>:\\\\\"/\\\\\\|\\?\\*]*:/)|(?:\\./))(?:(?:[^\\u0000-\\u001f<>:\\\\\"/\\\\\\|\\?\\*]*\\w+[^\\u0000-\\u001f<>:\\\\\"/\\\\\\|\\?\\*]*/?)|(?:\\./)|(?:\\.\\./))*$

actually solves the problem, even though that change has nothing to do with mismatched brackets.

Does anyone have an explanation for this kind of behavior?

EDIT:

It seems that even L"[^\u0000-\u001f]" doesn't work.

EDIT:

I'm running the sample on compile and execute and didn't notice that it uses GCC. On top of that, MSVC seems to be fine, while GCC gives me a runtime error (an exception).

  • Not that this would break the pattern, but why does your character class contain two literal backslashes? (\\\\ ... before " and after /) Commented Aug 3, 2013 at 10:56
  • The regex is formatted so that it's suitable for C code. So \\\\ is in fact \\ as a plain regex. The 1st and 3rd are escape sequences for C and the 2nd is the escape sequence for the regex. Commented Aug 3, 2013 at 11:56
  • I know that four backslashes make one literal backslash in the pattern, but you've got four backslashes twice in the same character class. Replacing the backslashes with b, that's like writing [^...<>:b\"/b\\|...]. Commented Aug 3, 2013 at 13:41
  • Are you, by any chance, attempting to use GNU libstdc++? It does not yet support regular expressions (even though some of them compile). I am able to reproduce your error with it, but not with LLVM libc++ or with boost.regex. Commented Aug 3, 2013 at 15:13
  • What do you mean by the last edit? You mention an exception when using gcc. <regex> will not work on gcc, the library is not fully implemented. Commented Aug 3, 2013 at 19:05

1 Answer

I know this thread is really old, but someone may still benefit from an answer.

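The issue is the null character that \u0000 produces in the regex string. When the compiler processes the escape in an ordinary literal such as L"[^\u0000-\u001f]", it stores a real null there, and that null is then treated as the end of the string, so the regex engine only sees a truncated pattern.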

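Regex strings should be declared raw, for example R"(some regex string)", or LR"(some regex string)" for wide strings. In a raw literal the compiler leaves \u0000 as plain text for the regex engine to interpret, which avoids any null issues in your regex.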
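
A minimal sketch of the difference, using the short pattern from the question's second edit. The names kEscapedPattern, kRawPattern, visible_length and compiles are made up for illustration, and the sketch assumes a standard library with a working <regex> (for libstdc++ that means GCC 4.9 or later) and a compiler in C++11 mode or newer, where \u0000 is accepted inside a string literal.

```cpp
#include <cstddef>
#include <cwchar>
#include <regex>

// Ordinary wide literal: the compiler replaces \u0000 with a real null
// character, so any code that reads this as a null-terminated string
// stops right after "[^".
const wchar_t* const kEscapedPattern = L"[^\u0000-\u001f]";

// Raw wide literal: backslash sequences are left untouched, so the regex
// engine receives the text \u0000 and \u001f and interprets it itself.
const wchar_t* const kRawPattern = LR"([^\u0000-\u001f])";

// Hypothetical helper: number of characters before the first null.
std::size_t visible_length(const wchar_t* pattern) {
    return std::wcslen(pattern);
}

// Hypothetical helper: true if std::wregex accepts the pattern.
bool compiles(const wchar_t* pattern) {
    try {
        std::wregex re(pattern);  // ECMAScript grammar by default
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

Calling visible_length on the two literals shows that the escaped one is cut off after its first two characters, while the raw one keeps the whole pattern. That truncated "[^" is an unterminated bracket expression, which is consistent with the error_brack reported in the question. The same idea applies to the long path pattern: written as a raw literal, each \\ in the C-escaped version becomes a single \.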
