4

I am writing a regex to use with the GNU C regex library.

The string has the following form (text in parentheses describes the content):

(NOT a #) start (maybe whitespace) : data

I have written the following code, but it won't match.

regcomp(&start_state, "^[^#][ \\t]*\\(start\\)[ \\t]*[:].*$", REG_EXTENDED);

What do I need to write?

Examples that should match:

state : q0
state: q0
state:q0s

Examples that should not match:

#state :q0
state q0
# state :q0

Thanks!

1 Comment

Can you please post some concrete examples? Commented Feb 4, 2010 at 17:37

3 Answers

8

The pattern in your question consumes the first letter of state with [^#], so the match cannot proceed: it then tries to match tate against the pattern \(state\).

You passed the flag REG_EXTENDED, which means that unescaped parentheses form a capturing group, while escaped parentheses match literal parenthesis characters.
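To see the difference in isolation, here is a minimal sketch. The helper name matches is hypothetical (it is not part of the regex library); it only wraps regcomp, regexec and regfree so that the same pattern can be tried with and without REG_EXTENDED.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not a library function: returns 1 if `input`
   matches `pattern`, 0 if it does not, and -1 if the pattern fails
   to compile. `cflags` is passed on to regcomp(). */
int matches(const char *pattern, int cflags, const char *input)
{
    regex_t re;
    int status;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    status = regexec(&re, input, 0, NULL, 0);
    regfree(&re);

    return status == 0;
}
```

With this helper, matches("^\\(state\\)", REG_EXTENDED, "state : q0") returns 0, because the extended syntax expects literal parentheses in the input, while matches("^(state)", REG_EXTENDED, "state : q0") returns 1. Passing 0 instead of REG_EXTENDED selects the basic syntax, where \( and \) are the grouping operators, so matches("^\\(state\\)", 0, "state : q0") returns 1 as well.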

With regular expressions, say what you do want to match:

^[ \\t]*(state)[ \\t]*:.*$

as in

#include <stdio.h>
#include <regex.h>

int main(int argc, char **argv)
{
  struct {
    const char *input;
    int expect;
  } tests[] = {
    /* should match */
    { "state : q0", 1 },
    { "state: q0",  1 },
    { "state:q0s",  1 },

    /* should not match */
    { "#state :q0",  0 },
    { "state q0",    0 },
    { "# state :q0", 0 },
  };
  size_t i;  /* unsigned, to match the sizeof-based loop bound */
  regex_t start_state;
  const char *pattern = "^[ \\t]*(state)[ \\t]*:.*$";

  if (regcomp(&start_state, pattern, REG_EXTENDED)) {
    fprintf(stderr, "%s: bad pattern: '%s'\n", argv[0], pattern);
    return 1;
  }

  for (i = 0; i < sizeof(tests)/sizeof(tests[0]); i++) {
    int status = regexec(&start_state, tests[i].input, 0, NULL, 0);

    printf("%s: %s (%s)\n", tests[i].input,
                            status == 0 ? "match" : "no match",
                            !status == !!tests[i].expect
                              ? "PASS" : "FAIL");
  }

  regfree(&start_state);

  return 0;
}

Output:

state : q0: match (PASS)
state: q0: match (PASS)
state:q0s: match (PASS)
#state :q0: no match (PASS)
state q0: no match (PASS)
# state :q0: no match (PASS)
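One caveat that the sample inputs do not exercise: as far as I know, a backslash inside a POSIX bracket expression is an ordinary character, so the C literal "[ \\t]" gives the regex engine the set of space, backslash and the letter t rather than space and tab. The sketch below compares the pattern above with a variant that uses the [[:blank:]] class. The variant and the names PATTERN_BACKSLASH_T, PATTERN_BLANK and line_matches are my own for illustration, not something from the question.

```c
#include <regex.h>
#include <stddef.h>

/* Pattern from the answer, as a C literal. The regex engine receives
   [ \t] with a real backslash: the set {space, backslash, 't'}. */
const char *PATTERN_BACKSLASH_T = "^[ \\t]*(state)[ \\t]*:.*$";

/* Assumed alternative: [[:blank:]] is the POSIX class for space and tab. */
const char *PATTERN_BLANK = "^[[:blank:]]*(state)[[:blank:]]*:.*$";

/* Hypothetical helper: 1 on match, 0 on no match, -1 on a bad pattern. */
int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    status = regexec(&re, line, 0, NULL, 0);
    regfree(&re);

    return status == 0;
}
```

A line indented with a real tab, such as "\tstate\t: q0", is rejected by PATTERN_BACKSLASH_T and accepted by PATTERN_BLANK. The reverse holds for "ttstate: q0", where the leading letters t are swallowed by the first bracket expression of PATTERN_BACKSLASH_T. Writing "[ \t]" in the C source (so the compiler inserts an actual tab character) would be another option.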

1 Comment

Very thorough! Thank you very much :)
1

Ok, I figured it out:

regcomp(&start_state, "^[^#]*[ \\t]*start[ \\t]*:.*$", REG_EXTENDED);

The above solves my problem! (Turns out I forgot to put a * after [^#].)

Thanks for your help anyway, Rubens! :)

1 Comment

How strict do you want to be about what's on the left side of the colon? That pattern also allows "foobarbazstart: xyz" to match, for example.
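A quick way to check this is to run both variants side by side. In the sketch below, LOOSE is the pattern from this answer, while STRICT is only one possible tightening that I am assuming for illustration (optional blanks, then the keyword); the helper name accepts is hypothetical too.

```c
#include <regex.h>
#include <stddef.h>

/* Pattern from this answer: [^#]* may swallow any text before "start". */
const char *LOOSE = "^[^#]*[ \\t]*start[ \\t]*:.*$";

/* Assumed stricter variant: only blanks may precede the keyword. */
const char *STRICT = "^[[:blank:]]*start[[:blank:]]*:.*$";

/* Hypothetical helper: 1 on match, 0 on no match, -1 on a bad pattern. */
int accepts(const char *pattern, const char *line)
{
    regex_t re;
    int status;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    status = regexec(&re, line, 0, NULL, 0);
    regfree(&re);

    return status == 0;
}
```

Here accepts(LOOSE, "foobarbazstart: xyz") returns 1 and accepts(STRICT, "foobarbazstart: xyz") returns 0, while both patterns accept "start : q0" and reject "# start : q0". Which behavior is right depends on how much leading text the input format is supposed to allow.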
0

This works with your sample data:

^[^#]\s*\w+\s*:(?<data>.*?)$

EDIT: I'm not sure how the GNU library handles it, but you may need to enable multiline support, since the leading ^ and trailing $ behave differently with that setting.
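Note that \s, \w and the named group (?<data>...) come from Perl-style and .NET-style syntax. To my knowledge POSIX extended regular expressions have no named groups, so with regcomp the data part would be read from a numbered group through regmatch_t instead. The following is a rough translation under that assumption: extract_data is a hypothetical helper, the leading [^#] is dropped because it would consume the first letter of the keyword, and the lazy .*? becomes a plain .* since the trailing $ pins the end anyway.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: if `line` has the form "key : data", copy
   everything after the colon into `out` (truncated to outsz - 1
   bytes, always NUL-terminated) and return 1. Returns 0 when the
   line does not match and -1 on a bad pattern or a zero-sized buffer. */
int extract_data(const char *line, char *out, size_t outsz)
{
    /* [[:space:]] stands in for \s, [[:alnum:]_] for \w, and
       group 1 plays the role of the named group <data>. */
    const char *pattern = "^[[:space:]]*[[:alnum:]_]+[[:space:]]*:(.*)$";
    regex_t re;
    regmatch_t m[2];  /* m[0]: whole match, m[1]: first group */
    int status;
    size_t len;

    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    status = regexec(&re, line, 2, m, 0);
    regfree(&re);
    if (status != 0)
        return 0;

    /* rm_so and rm_eo are byte offsets into `line`. */
    len = (size_t)(m[1].rm_eo - m[1].rm_so);
    if (len >= outsz)
        len = outsz - 1;
    memcpy(out, line + m[1].rm_so, len);
    out[len] = '\0';

    return 1;
}
```

For "state:q0s" the buffer receives "q0s". For "state : q0" it receives " q0" with the leading space, because the group starts right after the colon. Lines such as "#state :q0" and "state q0" return 0.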
