I am trying to parse strings that follow these patterns:

  • a2[u]
  • 3[rst]5[g]
  • 3[r2[g]]

I want to extract the following tokens from these strings:

  • 2 [u]
  • 3 [rst], 5 [g]
  • 2 [g], 3 [r2[g]] (nested groups)

I am using the following pattern and code:

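import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Greedy pattern: digits, then everything between '[' and the LAST ']'
Pattern MY_PATTERN = Pattern.compile("(\\d+)\\[(.+)\\]");
String input = "3[rst]5[g]";
Matcher m = MY_PATTERN.matcher(input);
while (m.find()) {
    // group(1) is the count, group(2) is the bracketed text
    System.out.println(m.group(1) + " " + m.group(2));
}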

However, it matches up to the last occurrence of ] instead of the first, which produces unexpected results. If I change the pattern to (\\d+)\\[(\\w+)\\], it works for this input but fails for 3[r2[g]]. What changes do I need to make so that it doesn't treat the whole string as one match?
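To make the two behaviours described above concrete, here is a minimal sketch that runs both patterns and collects what each one captures. The class name GreedyVsWordDemo and the helper tokens are made up for illustration; only the two regexes and the inputs come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class GreedyVsWordDemo {
    // Returns "group(1) group(2)" for every match of regex in input.
    static List<String> tokens(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1) + " " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String greedy = "(\\d+)\\[(.+)\\]";
        String wordOnly = "(\\d+)\\[(\\w+)\\]";

        // Greedy .+ runs to the last ']': one match instead of two.
        System.out.println(tokens(greedy, "3[rst]5[g]"));   // [3 rst]5[g]
        // \w+ cannot cross a bracket, so the flat input splits correctly.
        System.out.println(tokens(wordOnly, "3[rst]5[g]")); // [3 rst, 5 g]
        // On nested input only the innermost group is found; 3[...] is lost.
        System.out.println(tokens(wordOnly, "3[r2[g]]"));   // [2 g]
    }
}
```

So the \w+ version does not throw on the nested input; it silently drops the outer group, which is easy to miss.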

4 Comments

  • If you plan to match more than one level of nesting, the regex will become unwieldy. Otherwise, use "(\\d+)\\[([^\\]\\[]*(?:\\[[^\\]\\[]*][^\\]\\[]*)*)]". Commented Nov 19, 2020 at 19:57
  • Did it work or do you need more nested level support? Commented Nov 19, 2020 at 22:56
  • @Darshan: I would advise against using regex for this. You would be better off with a token parser, since you are dealing with nested brackets. Commented Nov 20, 2020 at 5:13
  • @WiktorStribiżew It would need more nested levels, I am afraid. So I will go with a token parser, as anubhava suggested. Commented Nov 20, 2020 at 8:44
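For reference, a token parser along the lines suggested in the comments could look roughly like the sketch below. It is only one possible approach, not code from anyone in this thread: the class BracketGroupScanner and its methods are hypothetical names, and the output format ("count [body]", outer group listed before the groups nested inside it) is an assumption based on the tokens listed in the question. It tracks bracket depth to find the matching ] and recurses into each body, so the nesting depth is not limited.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a depth-counting scanner for count[body] groups.
class BracketGroupScanner {
    // Returns every group found in s as "count [body]".
    static List<String> groups(String s) {
        List<String> out = new ArrayList<>();
        scan(s, out);
        return out;
    }

    // Appends each group in s, then the groups nested inside its body.
    private static void scan(String s, List<String> out) {
        int i = 0;
        while (i < s.length()) {
            if (!Character.isDigit(s.charAt(i))) { i++; continue; }
            int numStart = i;
            while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
            // Digits that are not followed by '[' do not start a group.
            if (i == s.length() || s.charAt(i) != '[') continue;
            int close = matchingBracket(s, i);
            if (close < 0) { i++; continue; } // skip an unmatched '['
            String body = s.substring(i + 1, close);
            out.add(s.substring(numStart, i) + " [" + body + "]");
            scan(body, out);                  // report nested groups too
            i = close + 1;
        }
    }

    // Index of the ']' that closes the '[' at position open, or -1.
    private static int matchingBracket(String s, int open) {
        int depth = 0;
        for (int j = open; j < s.length(); j++) {
            char c = s.charAt(j);
            if (c == '[') depth++;
            else if (c == ']' && --depth == 0) return j;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(groups("a2[u]"));      // [2 [u]]
        System.out.println(groups("3[rst]5[g]")); // [3 [rst], 5 [g]]
        System.out.println(groups("3[r2[g]]"));   // [3 [r2[g]], 2 [g]]
    }
}
```

Unbalanced input is handled leniently here (an unmatched [ is skipped); whether that should be an error instead depends on the use case.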

1 Answer

It looks like you need to make the .+ quantifier reluctant.

As it stands, .+ is greedy: it consumes the rest of the string and then backtracks only as far as the last ]. Adding ? after .+ makes it reluctant, so the regex becomes (\\d+)\\[(.+?)\\]. See how far you get with that...

2 Comments

No way, lazy dot will match 3[r2[g] in 3[r2[g]].
okay - so regex is not the answer then
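A quick check of the point made in the comments, as a minimal sketch: the lazy pattern from the answer is run next to the one-level-nesting pattern quoted in the comments under the question. The class name LazyQuantifierDemo and the helper tokens are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class LazyQuantifierDemo {
    // Returns "group(1) group(2)" for every match of regex in input.
    static List<String> tokens(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1) + " " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String lazy = "(\\d+)\\[(.+?)\\]";
        String oneLevel = "(\\d+)\\[([^\\]\\[]*(?:\\[[^\\]\\[]*][^\\]\\[]*)*)]";

        // Lazy .+? stops at the first ']', which is fine for flat input.
        System.out.println(tokens(lazy, "3[rst]5[g]"));   // [3 rst, 5 g]
        // On nested input the first ']' closes the inner group: body is cut.
        System.out.println(tokens(lazy, "3[r2[g]]"));     // [3 r2[g]
        // The one-level pattern captures the whole body of the outer group.
        System.out.println(tokens(oneLevel, "3[r2[g]]")); // [3 r2[g]]
    }
}
```

Note that the one-level pattern returns only the outer group here, and it stops working as soon as a second level of nesting appears.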