I have the following code snippet that reads lines from std::cin and prints them to std::cout.

#include <iostream>
#include <string>
#include <regex>
int main() {

  //std::regex e2("([^[:blank:]]+)|(\"[^\"]+\")|(\\([^\\)]+\\))");
  const size_t BUFSIZE = (1<<10);
  std::string buffer;
  buffer.reserve( BUFSIZE );

  while (std::getline( std::cin, buffer )) {
    std::cout << buffer << std::endl;
    //std::regex e1("([^[:blank:]]+)|(\"[^\"]+\")|(\\([^\\)]+\\))");
  }
  return 0;
}

The execution time is quite fast for an input of 9,800 lines:

real    0m0.116s
user    0m0.056s
sys     0m0.024s

However, if I uncomment the std::regex e1 object in the while loop, the execution time is slowed down considerably:

real    0m2.859s
user    0m2.800s
sys     0m0.032s

On the other hand, if I uncomment the std::regex e2 object outside the loop, the execution time is not affected at all. Why does this happen, considering that I am not performing any regex matches, only constructing an object?

NB: I've seen this thread, but it didn't shed any light.

2 Answers

For matching to be fast, the pattern must first be compiled into an internal form that allows fast matching, and that compilation is expensive. It is normally done during construction of the regex object; in fact, that's the entire point of constructing the regex object! If no extra work were done during construction, there would be no point in having a separate regex object at all -- the match function would just take the pattern as a raw string and process it on every call. In your loop, the construction cost is paid once per input line (about 9,800 times); outside the loop it is paid only once, which is why e2 makes no measurable difference.
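To illustrate the point, here is a minimal sketch of the usual fix: build the regex once and reuse it for every line. The helper names count_lines_with_token and count_token_lines_in_text are hypothetical and not part of the question; the pattern is the one from the question, and the use of std::regex_search is an assumption about what the loop would eventually do with it.

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not from the question): counts the lines that contain
// at least one token. The regex is compiled by the caller and passed in by
// const reference, so the loop pays only for matching.
std::size_t count_lines_with_token(std::istream& in, const std::regex& token) {
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, token)) {
            ++count;
        }
    }
    return count;
}

// Hypothetical convenience wrapper: compiles the question's pattern exactly
// once (function-local static), then scans the given text line by line.
std::size_t count_token_lines_in_text(const std::string& text) {
    static const std::regex token(
        "([^[:blank:]]+)|(\"[^\"]+\")|(\\([^\\)]+\\))");
    std::istringstream in(text);
    return count_lines_with_token(in, token);
}
```

In the question's program, the equivalent change is to keep the std::regex declaration above the while loop (where e2 is) and only call the matching functions inside the loop.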

Regex matching is usually implemented with a finite state machine. The implementation has to build this state machine from the regular expression you provide, and some regular expressions produce very complex machines. The complexity grows with the number of possible branches in the regex. The more complex the state machine, the more work is required to set up the regex object before it can start matching input strings.

As @Mehrdad correctly pointed out, the sole reason the regex interface exists as an object rather than as a helper function is to separate the heavy work of setting up the state machine from the search operations, each of which is comparatively lightweight.
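If you want to see the split between setup cost and search cost on your own machine, a rough sketch like the following can help. ScanResult, scan_rebuilding and scan_reusing are hypothetical names introduced for this illustration; the pattern is copied from the question, and the timings will vary with the standard library implementation and compiler flags.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Pattern copied from the question.
static const char* const kTokenPattern =
    "([^[:blank:]]+)|(\"[^\"]+\")|(\\([^\\)]+\\))";

// Hypothetical result type: number of matching lines and elapsed wall time.
struct ScanResult {
    std::size_t matches;
    double seconds;
};

// Rebuilds the regex for every line, like the slow variant in the question.
ScanResult scan_rebuilding(const std::vector<std::string>& lines) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t matches = 0;
    for (const std::string& line : lines) {
        const std::regex token(kTokenPattern);  // compiled on every iteration
        if (std::regex_search(line, token)) ++matches;
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {matches, elapsed.count()};
}

// Builds the regex once and reuses it for every line.
ScanResult scan_reusing(const std::vector<std::string>& lines) {
    const auto start = std::chrono::steady_clock::now();
    const std::regex token(kTokenPattern);  // compiled exactly once
    std::size_t matches = 0;
    for (const std::string& line : lines) {
        if (std::regex_search(line, token)) ++matches;
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {matches, elapsed.count()};
}
```

Both functions return the same match count for the same input; only the seconds field should differ. Calling them on a std::vector<std::string> holding your 9,800 input lines and printing both seconds values gives a direct comparison of the two approaches.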

Here is the proposal for std::regex, which discusses these design nits in detail.

1 Comment

Interesting blend of theory and practice. :)
