
Could anyone please explain the following behavior of the std::regex library?

string a{"ERROR"};
regex r1{"errOR",0};
cout<<regex_search(a,r1)<<endl;
regex r2{"errOR"};
cout<<regex_search(a,r2)<<endl;
regex r3{"errOR",regex::ECMAScript};
cout<<regex_search(a,r3)<<endl;

cout<<r1.flags()<<endl;
cout<<r2.flags()<<endl;
cout<<r3.flags()<<endl;

gives the output

1
0
0
16
16
16

So the first example implicitly became case-insensitive, even though its flags() value is 16, the same as for a default-constructed regex. By the way, std::regex has no constant that equals 0, but regex::icase == 1.

Is this the intended behavior of the standard library, or should I simply not pass the constructor values that are not explicitly supported?

  • Flag values have names. Use them. Commented Feb 26, 2019 at 21:07
  • I'm not sure if it is documented what happens if you don't use a proper flag but you should assume nothing good will happen. Commented Feb 26, 2019 at 21:08
  • The values of the flags are unspecified (see e.g. here), but I doubt there is undefined behaviour. Anyhow, of course you should not pass parameters when they are "explicitly not supported". Commented Feb 26, 2019 at 21:09
  • "by the way in std::regex there is no constant that match 0 value, but there is regex::icase == 1;" Where did you take this from? Commented Feb 26, 2019 at 21:11
  • cout<<regex::icase; Commented Feb 26, 2019 at 21:16

1 Answer

std::regex has several constructors. Two of them are:

  • explicit basic_regex (const charT* str, flag_type flags = ECMAScript);

  • basic_regex (const charT* str, size_t len, flag_type flags = ECMAScript);

The first constructor builds the regex from a null-terminated string. The second constructor builds the regex from an array of characters and the array length.

The line regex r1{"errOR",0}; uses the second one, so the zero is not the flags but the string length. The pattern is therefore empty, and because an empty regex matches any string, regex_search returns true.

You can modify your experiment to force the first constructor by passing a value of the flag type explicitly:

regex r1{ "errOR", regex_constants::syntax_option_type(0) };

The standard states that:

A valid value of type syntax_option_type shall have at most one of the grammar elements ECMAScript, basic, extended, awk, grep, egrep, set. If no grammar element is set, the default grammar is ECMAScript.

So now it uses ECMAScript, and the value returned by regex_search is false.

Regarding your last question, I would avoid passing unsupported arguments, because the standard also states in [defns.undefined]:

Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data.


1 Comment

Thank you very much for such a detailed answer!
