I want to parse some C source files and find all strings ("foo").

Something like this works:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
String line = "myfunc(\"foo foo foo\", \"bar\");";
System.out.println(line);
// a quote, then one or more non-quote characters (group 1), then a quote
String patternStr = "\\\"([^\"]+)\\\"";
Pattern pattern = Pattern.compile(patternStr);
// find() simply returns false when there is no match, so no separate matches() check is needed
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    System.out.println(" FOUND " + matcher.groupCount() + " groups");
    System.out.println(matcher.group(1));
}

It works until a string contains escaped quotes, like:

String line = "myfunc(\"foo \\\"foo\\\" foo\", \"bar\");";
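A minimal runnable sketch of the failure (the names NaivePatternDemo and extract are made up for illustration). It applies the question's pattern, written here as "\"([^\"]+)\"" because \" and " mean the same thing inside a Java regex, to the escaped line. Since [^"]+ stops at the first quote character whether or not a backslash precedes it, the first literal gets split at the escaped quote and the inner foo is lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaivePatternDemo {
    // The question's pattern: a quote, one or more non-quote chars, a quote.
    static final Pattern NAIVE = Pattern.compile("\"([^\"]+)\"");

    // Collects group 1 of every match in the line.
    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = NAIVE.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "myfunc(\"foo \\\"foo\\\" foo\", \"bar\");";
        // [^"]+ stops at the first quote, even one preceded by a backslash,
        // so the first literal comes back in two broken pieces.
        for (String s : extract(line)) {
            System.out.println("[" + s + "]");
        }
    }
}
```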

I don't know how to write a Java regex that says "no unescaped quote, but any escaped character (\.) is allowed". I found something similar for C here: http://wordaligned.org/articles/string-literals-and-regular-expressions

Thanks in advance.

3 Answers

What about strings inside comments:

/* foo "this is not a string" bar */

and what about when a single double quote is in a comment:

/* " */ printf("text");

you don't want to capture "*/ printf(" as a string.

In other words: if the above could occur in your C code, use a parser instead of regex.
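If comments can occur, a small hand-written scanner is one simple alternative to a full C parser. The sketch below is only an illustration, not a complete C lexer: the class name CStringScanner is hypothetical; it skips /* */ and // comments as well as character literals such as '"', and keeps escape sequences verbatim. It does not understand the preprocessor (so #include "file.h" would be reported as a string) or backslash line continuations.

```java
import java.util.ArrayList;
import java.util.List;

public class CStringScanner {
    // Returns the contents of C string literals, skipping comments and
    // character literals. Escape sequences are returned verbatim.
    static List<String> findStrings(String src) {
        List<String> out = new ArrayList<>();
        int i = 0, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '/' && i + 1 < n && src.charAt(i + 1) == '*') {
                // block comment: jump past the closing */ (or to end of input)
                int end = src.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else if (c == '/' && i + 1 < n && src.charAt(i + 1) == '/') {
                // line comment: jump past the end of the line
                int end = src.indexOf('\n', i + 2);
                i = (end < 0) ? n : end + 1;
            } else if (c == '"' || c == '\'') {
                // string or char literal: read up to the matching unescaped quote
                int start = ++i;
                while (i < n && src.charAt(i) != c) {
                    // a backslash escapes the next character, whatever it is
                    i += (src.charAt(i) == '\\') ? 2 : 1;
                }
                if (c == '"') {
                    out.add(src.substring(start, Math.min(i, n)));
                }
                i++; // step past the closing quote
            } else {
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "/* \" */ printf(\"text\");\n"
                   + "/* foo \"this is not a string\" bar */\n"
                   + "myfunc(\"foo \\\"foo\\\" foo\", \"bar\"); // \"nope\"\n";
        for (String s : findStrings(src)) {
            System.out.println(s);
        }
    }
}
```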

Between double-quotes, you want to allow an escape sequence or any character other than a double-quote. You want to test them in that order to allow the longer alternative the opportunity to match.

Pattern pattern = Pattern.compile("\"((\\\\.|[^\"])+)\"");
Matcher matcher = pattern.matcher(line);

while (matcher.find()) {
  System.out.println(" FOUND "+matcher.groupCount()+" groups");
  System.out.println(matcher.group(1));
}

Output:

 FOUND 2 groups
foo \"foo\" foo
 FOUND 2 groups
bar
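A sketch that wraps the same pattern in a reusable method (EscapeAwareStrings and extract are hypothetical names) and adds one extra check: a literal that ends in an escaped backslash, such as "C:\\dir\\" in C source. Because the escape alternative is tried first and consumes the backslash pair as one unit, the quote that follows is correctly treated as the closing one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeAwareStrings {
    // The answer's pattern: a quote, then one or more of
    // (backslash + any char) or (any non-quote char), then a quote.
    // Group 1 holds the contents of the literal.
    static final Pattern STRING = Pattern.compile("\"((\\\\.|[^\"])+)\"");

    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The C source here is: printf("C:\\dir\\", "x");
        System.out.println(extract("printf(\"C:\\\\dir\\\\\", \"x\");"));
        System.out.println(extract("myfunc(\"foo \\\"foo\\\" foo\", \"bar\");"));
    }
}
```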

Try the following:

String patternStr = "\"(([^\"\\\\]|\\\\.)*)\"";

(All I did was translate the regex from the article you mentioned, /"([^"\\]|\\.)*"/, into a Java string literal.)

3 Comments

It works, but could you please explain how it works? Why are there four backslashes before the closing bracket ("]")?
I didn't try to fully understand how it works; I just translated the regex from the article to Java. To do that, I had to escape the quotes and backslashes, so each " from the article became \" in Java and each \ became \\. That's why the two backslashes before the ] turned into four.
I didn't even try to understand it, because this regex looked so strange that I assumed it wouldn't work in Java ;] If anyone knows what's going on, please tell me.
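To address the question in these comments, here is a sketch (ExplainPattern and extract are hypothetical names) that prints the regex the Java literal actually produces, then rebuilds the same regex with Pattern.COMMENTS so that each part can carry its own explanation. In short: between the quotes, the regex repeatedly takes either an ordinary character (anything except a quote or a backslash) or a backslash together with whatever character follows it. Since a backslash can never count as an ordinary character, an escaped quote is always consumed as part of an escape and cannot end the string. The four backslashes appear because the Java compiler turns \\\\ into \\, which the regex engine then reads as a single literal backslash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExplainPattern {
    // The Java string literal from the answer above.
    static final String PATTERN_STR = "\"(([^\"\\\\]|\\\\.)*)\"";

    // The same regex written with Pattern.COMMENTS: whitespace and
    // everything after # on a line are ignored by the regex engine.
    static final Pattern EXPLAINED = Pattern.compile(
              "\"          # opening quote\n"
            + "(           # group 1: everything between the quotes\n"
            + "  (         # group 2: one piece of the contents, either\n"
            + "    [^\"\\\\] #   any char except a quote or a backslash\n"
            + "  |         #   or\n"
            + "    \\\\.   #   a backslash plus the character it escapes\n"
            + "  )*        # zero or more pieces, so an empty string matches too\n"
            + ")\n"
            + "\"          # closing quote\n",
            Pattern.COMMENTS);

    static List<String> extract(Pattern p, String line) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The compiler removes one level of escaping, so this prints
        // the regex itself: "(([^"\\]|\\.)*)"
        System.out.println(PATTERN_STR);

        String line = "myfunc(\"foo \\\"foo\\\" foo\", \"bar\");";
        System.out.println(extract(Pattern.compile(PATTERN_STR), line));
        System.out.println(extract(EXPLAINED, line));
    }
}
```

Both versions should find the same two strings; the commented one is just easier to read.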
