How to parse string using regex

Question

I'm pretty new to Java and trying to find a better way to do this, potentially using a regex.

String text = test.get(i).toString();
// text looks like this in string form:
// EnumOption[enumId=test,id=machine]

String checker = text.replace("[","").replace("]","").split(",")[1].split("=")[1];

// checker becomes machine

My goal is to parse that text string and return just machine, which is what the code above does.

But that looks ugly. I was wondering what kinda regex can be used here to make this a little better? Or maybe another suggestion?

For clarification: Do you want to get the string that is written after id= regardless of the following string?
– BeWu, Commented Dec 14, 2020 at 17:26
String checker = text.replaceFirst("EnumOption\\[enumId=test,id=(.*)\\]", "$1"); but isn’t there a simpler option like test.get(i).getId()?
– Holger, Commented Dec 14, 2020 at 17:31
What’s test? As Holger said, can’t you get the object’s ID directly without going the detour via toString()?
– Konrad Rudolph, Commented Dec 14, 2020 at 17:40
@Holger, I 100% agree with you, but when I tried that in Eclipse, .getId() was not an option. I don't know much about Java. I just assumed that if Eclipse doesn't show it as available, then it's not available.
– adbarads, Commented Dec 14, 2020 at 19:30
test is a customTypedList. I iterate through it, looping over each element, and each element is an enumOption.
– adbarads, Commented Dec 14, 2020 at 19:35

Konrad Rudolph · Accepted Answer · 2020-12-14 17:45:37Z

3

Assuming you’re using the Polarion ALM API, you should use the EnumOption’s getId method instead of deparsing and re-parsing the value via a string:

String id = test.get(i).getId();

answered Dec 14, 2020 at 17:45

Konrad Rudolph


Olivier Grégoire · Accepted Answer · 2020-12-14 19:50:28Z

2

Use a regex lookbehind:

(?<=\bid=)[^],]*

See Regex101.

(?<=     )            // Start matching only after what matches inside
    \bid=             // Match "\bid=" (= word boundary then "id="),
          [^],]*      // Match and keep the longest sequence without any ']' or ','

In Java, use it like this:

import java.util.regex.*;

class Main {
  public static void main(String[] args) {
    Pattern pattern = Pattern.compile("(?<=\\bid=)[^],]*");
    Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
    if (matcher.find()) {
      System.out.println(matcher.group(0));
    }
  }
}

This results in

machine

edited Dec 14, 2020 at 19:50

answered Dec 14, 2020 at 17:40

Olivier Grégoire


The fourth bird · Accepted Answer · 2020-12-14 21:18:40Z

Using the replace and split functions doesn't take the structure of the data into account.

If you want to use a regex, you can use a capturing group without any lookarounds, where the enumId value can be any characters except =, a comma, and ], and the id value can be any characters except ].

The value of id will be in capture group 1.

\bEnumOption\[enumId=[^=,\]]+,id=([^\]]+)\]

Explanation

\bEnumOption Match EnumOption preceded by a word boundary
\[enumId= Match [enumId= literally
[^=,\]]+, Match 1+ times any char except = , and ] followed by a comma
id= Match literally
( Capture group 1
  [^\]]+ Match 1+ times any char except ]
) Close group 1
\] Match the closing ]


Pattern pattern = Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");

if (matcher.find()) {
    System.out.println(matcher.group(1));
}

Output

machine

If there can be more comma-separated values, you could also match only id, using the negated character class [^][]* before and after it to stay inside the square bracket boundaries.

\bEnumOption\[[^][]*\bid=([^,\]]+)[^][]*\]

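In Java, escape both brackets inside the character classes, because an unescaped [ inside a class starts a nested class:

String regex = "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]";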
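A minimal sketch of how this second pattern might be used, assuming the input can carry extra fields after id. The class name IdExtractor, the helper extractId, and the label=Machine field are made up for illustration. The sketch escapes both brackets inside each character class, since Java's regex syntax treats an unescaped [ inside a class as the start of a nested class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdExtractor {
    // Stay inside the square brackets and capture the value that follows "id=".
    // Both [ and ] are escaped inside the character classes, as Java requires.
    private static final Pattern ID_PATTERN =
        Pattern.compile("\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]");

    /** Returns the value after "id=", or null if the text does not match. */
    public static String extractId(String text) {
        Matcher matcher = ID_PATTERN.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // The original shape from the question.
        System.out.println(extractId("EnumOption[enumId=test,id=machine]"));
        // A hypothetical shape with one more comma-separated field.
        System.out.println(extractId("EnumOption[enumId=test,id=machine,label=Machine]"));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which matters if the list being iterated is long.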


Reto Höhener · Accepted Answer · 2020-12-14 17:36:01Z

0

A regex can of course be used, but it is sometimes less performant, less readable, and more bug-prone.

I would advise you not to use any regex that you did not come up with yourself, or at least understand completely.

PS: I think your solution is actually quite readable.

Here's another non-regex version:

String text = "EnumOption[enumId=test,id=machine]";
text = text.substring(text.lastIndexOf('=') + 1);
text = text.substring(0, text.length() - 1);
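The same idea can be sketched as a single substring call with a basic shape check. This assumes id is always the last field and the string always ends with ]. The class name LastValue and the helper extractLastValue are hypothetical names used only for this example.

```java
public class LastValue {
    /**
     * Returns the text between the last '=' and the trailing ']',
     * or null when the input does not have that shape.
     */
    public static String extractLastValue(String text) {
        int start = text.lastIndexOf('=') + 1; // 0 when no '=' is present
        int end = text.length() - 1;           // index of the expected ']'
        if (start == 0 || end < start || text.charAt(end) != ']') {
            return null;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extractLastValue("EnumOption[enumId=test,id=machine]"));
    }
}
```

Returning null for unexpected input is one possible choice; throwing an exception, as the regex version does, would be equally reasonable.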

Not doing you a favor, but the downvote hurt, so here you go:

String input = "EnumOption[enumId=test,id=machine]";
Matcher matcher = Pattern.compile("EnumOption\\[enumId=(.+),id=(.+)\\]").matcher(input);
if(!matcher.matches()) {
  throw new RuntimeException("unexpected input: " + input);
}

System.out.println("enumId: " + matcher.group(1));
System.out.println("id: " + matcher.group(2));

edited Dec 14, 2020 at 17:36

answered Dec 14, 2020 at 17:22

Reto Höhener


6 Comments

Holger Over a year ago

When you talk about performance, I’m wondering why you are unnecessarily doing two substring operations instead of a single text.substring(text.lastIndexOf('=') + 1, text.length() - 1)

Reto Höhener Over a year ago

I did not mean to imply that my version is more performant. I usually optimize for readability and speed of implementation. It was more of a general comment.

Holger Over a year ago

So you think, doing two substring operations instead of one makes the code more readable?

Reto Höhener Over a year ago

Not really no. About as straightforward as the OP's own solution. Readable, easy to understand and step through with the debugger.

Olivier Grégoire Over a year ago

int start = text.lastIndexOf("id=") + 3; int end = text.length() - 1; text = text.substring(start, end); How is that not more readable than the two substrings?
