
I just started using Boost.Regex today and am quite a novice with regular expressions too. I have been using "The Regulator" and Expresso to test my regex and am satisfied with what I see there, but the same regex does not seem to do what I want when I transfer it to Boost. Any pointers to help me find a solution would be most welcome. As a side question, are there any tools that would help me test my regex against Boost.Regex?

using namespace boost;
using namespace std;

vector<string> tokenizer::to_vector_int(const string s)
{
    regex re("\\d*");
    vector<string> vs;
    cmatch matches;
    if( regex_match(s.c_str(), matches, re) ) {
        MessageBox(NULL, L"Hmmm", L"", MB_OK); // it never gets here
        for( unsigned int i = 1 ; i < matches.size() ; ++i ) {
            string match(matches[i].first, matches[i].second);
            vs.push_back(match);
        }
    }
    return vs;
}

void _uttokenizer::test_to_vector_int() 
{
    vector<string> __vi = tokenizer::to_vector_int("0<br/>1");
    for( int i = 0 ; i < __vi.size() ; ++i ) INFO(__vi[i]);
    CPPUNIT_ASSERT_EQUAL(2, (int)__vi.size());//always fails
}

Update (thanks to Dav for helping me clarify my question): I was hoping to get a vector with 2 strings in it: "0" and "1". Instead, regex_match() always returns false, so the vector is always empty.

Thanks '1800 INFORMATION' for your suggestions. The to_vector_int() method now looks like this (I took the code you gave and modified it to make it compile), but it goes into a never-ending loop and finds "0", "", "", "" and so on. It never finds the "1".

vector<string> tokenizer::to_vector_int(const string s)
{
    regex re("(\\d*)");
    vector<string> vs;

    cmatch matches;

    char * loc = const_cast<char *>(s.c_str());
    while( regex_search(loc, matches, re) ) {
        vs.push_back(string(matches[0].first, matches[0].second));
        loc = const_cast<char *>(matches.suffix().str().c_str());
    }

    return vs;
}

In all honesty, I don't think I have understood the basics of searching for a pattern and getting the matches yet. Are there any tutorials with examples that explain this?

  • It'd help if you explained exactly what wasn't working as intended - what results do you get, what do you expect to get? Commented Aug 12, 2009 at 23:49
  • Thanks Dav. Hope I have added enough information with my question. Commented Aug 12, 2009 at 23:57

1 Answer

The basic problem is that you are using regex_match when you should be using regex_search:

The algorithms regex_search and regex_match make use of match_results to report what matched; the difference between these algorithms is that regex_match will only find matches that consume all of the input text, where as regex_search will search for a match anywhere within the text being matched.

From the boost documentation. Change it to use regex_search and it will work.
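The difference can be seen in a minimal sketch. It uses std::regex rather than Boost so that it compiles without extra libraries; the standard library's regex component was modelled on Boost.Regex, so the same calls should work with the boost:: namespace and <boost/regex.hpp>. The helper names whole_input_is_digits and contains_digits are hypothetical and only serve the illustration.

```cpp
#include <regex>
#include <string>

// Returns true only if the WHOLE of `text` is a single run of digits.
// regex_match must consume all of the input to succeed.
bool whole_input_is_digits(const std::string& text) {
    static const std::regex digits("\\d+");
    return std::regex_match(text, digits);
}

// Returns true if a run of digits occurs ANYWHERE in `text`.
// regex_search is satisfied by a match on any substring.
bool contains_digits(const std::string& text) {
    static const std::regex digits("\\d+");
    return std::regex_search(text, digits);
}
```

With the input from the question, "0<br/>1", whole_input_is_digits returns false because "<br/>" is not made of digits, while contains_digits returns true.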

Also, it looks like you are not capturing the matches. Try changing the regex to this:

regex re("(\\d*)");

Or, maybe you need to be calling regex_search repeatedly:

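const char *where = s.c_str();
// Note: the pattern must not be able to match the empty string
// (use \\d+ rather than \\d*), or `where` never advances.
while (regex_search(where, matches, re))
{
  vs.push_back(matches[1].str());  // text captured by the (...) group
  where = matches.suffix().first;  // resume just after this match
}

This is necessary because the regex has only one capture group, so each call to regex_search reports a single number.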

Alternatively, change your regex, if you know the basic structure of the data:

regex re("(\\d+).*?(\\d+)");

This would match two numbers within the search string.
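As a rough sketch of how the two capture groups would be read back, again using std::regex as a stand-in for Boost.Regex (the calls have the same names there). The function first_two_numbers is a hypothetical helper, not part of the original tokenizer class.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: extracts the first two numbers found in `text`.
// Returns a pair of empty strings if the pattern does not match.
std::pair<std::string, std::string> first_two_numbers(const std::string& text) {
    static const std::regex re("(\\d+).*?(\\d+)");
    std::smatch matches;
    if (std::regex_search(text, matches, re)) {
        // matches[0] is the whole match; [1] and [2] are the capture groups.
        return {matches[1].str(), matches[2].str()};
    }
    return {"", ""};
}
```

This only fits input with a known, fixed number of integers; for an arbitrary count, a repeated search is the better fit.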

Note that the regular expression \d* will match zero or more digits - this includes the empty string "" since this is exactly zero digits. I would change the expression to \d+ which will match 1 or more.
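Putting the pieces together, one way to collect every integer from the input is to iterate over all non-empty matches of \d+. The sketch below is an assumption about what to_vector_int is meant to do, written as a free function with std::regex; Boost.Regex offers the equivalent boost::sregex_iterator.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical free-function version of tokenizer::to_vector_int.
// Collects every maximal run of digits in `s`, in order of appearance.
std::vector<std::string> to_vector_int(const std::string& s) {
    static const std::regex re("\\d+");  // + (not *) so empty matches are impossible
    std::vector<std::string> result;
    // A default-constructed sregex_iterator is the end-of-sequence marker.
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        result.push_back(it->str());
    }
    return result;
}
```

The iterator handles advancing past each match, which avoids both the manual pointer bookkeeping and the risk of holding a pointer into a temporary string.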

8 Comments

Awesome. Thanks 1800 INFORMATION. I didn't realize how much of a noob I was in boost.regex. (In my defense both "The Regulator" and Expresso give me positive results in response to "Match", so I homed in on a similarly named method in boost.regex.) I guess I didn't fathom the significance of the difference between regex_match and regex_search till you pointed it out. Thanks again. I wonder if there is any way to reduce my "reputation score" even further to display my noobness :).
I tested your suggestion 1800 INFORMATION to replace regex_match with regex_search and now I get two strings: "0" and " ". I don't seem to get the 2nd one as "1". Any suggestions on what I could still be missing?
It looks like you aren't capturing the strings you match, try putting the () around the expression
Thanks 1800 INFORMATION. I unfortunately cannot use the suggestion for regex re("(\\d+).*?(\\d+)"); as the basic structure is a bunch of integers (0+) separated by various punctuations, <br/> or "\r\n" etc. I have updated my question with the latest code version and have described a problem I faced with your suggestion. Any help would be doubly appreciated.
I suspect your current regular expression is wrong - \d* will match zero or more digits - the empty string "" is included in the subset of zero digits which is why it is stopping there - you should change it to \d+