
I am using Java. I need to parse the following line using a regex:

<actions>::=<action><action>|X|<game>|alpha

It should give me the tokens <action>, <action>, X and <game>.

What kind of regex will work?

I was trying something like "<[a-zA-Z]>", but that doesn't take care of X or alpha.
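For reference, the attempted pattern fails even on the bracketed names: `[a-zA-Z]` without a quantifier matches exactly one letter, so `<[a-zA-Z]>` only matches something like `<a>`. The sketch below compares it with `<[a-zA-Z]+>` on the sample line. The class `TagRegexDemo` and the helper `findAll` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagRegexDemo {
    // Hypothetical helper: collect every substring of input that the regex matches.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "<actions>::=<action><action>|X|<game>|alpha";
        // Exactly one letter between the brackets: nothing in this line matches.
        System.out.println(findAll("<[a-zA-Z]>", line));
        // One or more letters: whole bracketed names match, X and alpha do not.
        System.out.println(findAll("<[a-zA-Z]+>", line));
    }
}
```

This should print `[]` and then `[<actions>, <action>, <action>, <game>]`. Even with the `+`, the left-hand side `<actions>` is picked up, and the bare symbols X and alpha are missed because they have no brackets, so a second alternative is needed for them.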

  • Should it match alpha or not? Commented Mar 7, 2013 at 5:59
  • Yes, it should also include alpha. Commented Mar 7, 2013 at 6:14

4 Answers


You can try something like this:

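import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "<actions>::=<action><action>|X|<game>|alpha";
// Keep only the right-hand side of the rule
str = str.split("=")[1];
// Match either a <...> name or a |...| section (the pipes are part of the match)
Pattern pattern = Pattern.compile("<.*?>|\\|.*?\\|");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    System.out.println(matcher.group());
}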

1 Comment

This returns X as |X|. The regex should ignore the |.
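One way to keep the pipes out of the tokens, sketched here as an assumption rather than as the answerer's code: match either a bracketed name or a run of characters containing none of `|`, `<` and `>`. Since neither alternative can match `|`, `find()` simply steps over the separators. The class name `RhsTokenizer` is hypothetical, and the sketch assumes the rule always contains `::=` and no whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RhsTokenizer {
    // A token is a bracketed name such as <game>, or a run of characters with
    // no '|', '<' or '>' (such as X or alpha). '|' is never part of a token.
    private static final Pattern TOKEN = Pattern.compile("<[^>]+>|[^|<>]+");

    static List<String> tokenize(String rule) {
        // Assumption: the rule has the form "lhs::=rhs"; keep only rhs.
        String rhs = rule.substring(rule.indexOf("::=") + 3);
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(rhs);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<actions>::=<action><action>|X|<game>|alpha"));
    }
}
```

On the sample rule this should print `[<action>, <action>, X, <game>, alpha]`.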

You should have something like this:

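import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "<actions>::=<action><action>|X|<game>|alpha";
// Four groups: two tags, the text between the pipes, and the final tag
Matcher matcher = Pattern.compile("(<[^>]+>)(<[^>]+>)\\|([^|]+)\\|(<[^|]+>)").matcher(input);
while (matcher.find()) {
    // Print the whole match with the pipe separators stripped out
    System.out.println(matcher.group().replaceAll("\\|", ""));
}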

You didn't specify whether you want to return alpha or not; as written, this doesn't return it.

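You can also return alpha by appending \\|(\\w+) to the end of the regex I wrote. The pipe must be escaped; an unescaped | would turn the whole pattern into a top-level alternation.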

This prints:

<action><action>X<game>

2 Comments

The pattern should not include the "|". This prints: token:<action> token:<action> token:<action> token:|X| token:<game>
Can you also tell me how to tokenize this: <actions>::=<action><action><action>action? Here there are no "|" characters, and I need the tokens <action>, <action>, <action> and action. Thanks.
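A sketch that handles both inputs the same way, under the assumption that "|" separates BNF alternatives: split the rule at `::=`, split the right-hand side at `|`, then pull symbols out of each alternative with a regex (a bracketed name, or a run of characters with no `<` or `>`). Keeping the alternatives separate preserves the fact that `<action><action>` is one sequence. The class `BnfRuleParser` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BnfRuleParser {
    // A symbol is a bracketed nonterminal (<action>) or a run of characters
    // with no '<' or '>' (a terminal such as X, alpha or action).
    private static final Pattern SYMBOL = Pattern.compile("<[^>]+>|[^<>]+");

    // Returns one list of symbols per '|'-separated alternative of the rule.
    static List<List<String>> alternatives(String rule) {
        // Assumption: the rule contains "::=" exactly once.
        String rhs = rule.split("::=", 2)[1];
        List<List<String>> result = new ArrayList<>();
        for (String alt : rhs.split("\\|")) {
            List<String> symbols = new ArrayList<>();
            Matcher m = SYMBOL.matcher(alt);
            while (m.find()) {
                symbols.add(m.group());
            }
            result.add(symbols);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(alternatives("<actions>::=<action><action>|X|<game>|alpha"));
        System.out.println(alternatives("<actions>::=<action><action><action>action"));
    }
}
```

This should print `[[<action>, <action>], [X], [<game>], [alpha]]` for the first rule and `[[<action>, <action>, <action>, action]]` for the pipe-free one. If only a flat token list is wanted, the inner lists can be concatenated.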

From the original pattern it is not clear whether the <> are literally part of the input; I'll assume they are.

String pattern = "<actions>::=<(.*?)><(.+?)>\\|(.+)\\|<(.*?)>\\|alpha";

In Java you can use Pattern and Matcher; here is the basic idea:

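import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "<actions>::=<action><action>|X|<game>|alpha";
Pattern p = Pattern.compile(pattern, Pattern.DOTALL | Pattern.MULTILINE);
Matcher m = p.matcher(text);
if (m.find()) {  // only read groups after a successful match
    for (int g = 1; g <= m.groupCount(); g++) {
        // use your four groups here (action, action, X, game)
        System.out.println(m.group(g));
    }
}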

1 Comment

Wait, why is alpha hard-coded here? Yes, the tokens should include the "<" and ">", and also words that are not wrapped in "<" and ">". In the example above, the tokens should be <action>, <action>, X, <game>, alpha.

You can use the following Java regex:

Pattern pattern = Pattern.compile(
        "::=(<[^>]+>)(<[^>]+>)\\|([^|]+)\\|(<[^>]+>)\\|(\\w+)$");
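A minimal sketch that runs this pattern and collects its five groups; the class `AnswerFourDemo` and the `groups` helper are hypothetical names. Because the pattern hard-codes the layout (two tags, a piped word, a tag, a trailing word), it is expected to return nothing for the pipe-free rule asked about in the comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnswerFourDemo {
    static final Pattern PATTERN = Pattern.compile(
            "::=(<[^>]+>)(<[^>]+>)\\|([^|]+)\\|(<[^>]+>)\\|(\\w+)$");

    // Returns groups 1..5 of the first match, or an empty list when the
    // line does not have exactly this shape.
    static List<String> groups(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = PATTERN.matcher(line);
        if (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                out.add(m.group(g));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(groups("<actions>::=<action><action>|X|<game>|alpha"));
        System.out.println(groups("<actions>::=<action><action><action>action"));
    }
}
```

The first call should print `[<action>, <action>, X, <game>, alpha]` and the second `[]`, which shows the trade-off of a fixed-shape pattern: it extracts exactly the expected tokens, but only for that one layout.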

1 Comment

@Dev: You can also see the Java code with the above regex running here: ideone.com/8b7DP0
