
I'm trying to split this string:

aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)

so that it produces this array:

[ a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8) ]

Here are the rules: only the letters a to g are accepted. A letter can stand alone, but if parentheses follow it, the token has to include the parentheses and their content. The content of the parentheses must be a numeric value.

This is what I tried:

String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
// split() treats the regex as the delimiter, so the matched tokens are removed
String[] a = content.split("[a-g]|[a-g]\\([0-9]*\\)");
for (String s : a) {
    System.out.println(s);
}

And here's the output:

(2)

(52)

(4) (2)

(14) (6) (8)h(4)5(6)

Thanks.

4 Answers

It is easier to match these substrings:

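import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
Pattern pattern = Pattern.compile("[a-g](?:\\(\\d+\\))?");
List<String> res = new ArrayList<>();
Matcher matcher = pattern.matcher(content);
// Collect every match instead of splitting around it
while (matcher.find()) {
    res.add(matcher.group(0));
}
System.out.println(res);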

Output:

[a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8)]

See the Java demo and a regex demo.

Pattern details

  • [a-g] - a letter from a to g
  • (?:\(\d+\))? - an optional non-capturing group matching 1 or 0 occurrences of
    • \( - a ( char
    • \d+ - 1+ digits
    • \) - a ) char.
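To make the pattern details above concrete, here is a minimal sketch that runs the same pattern against a few edge cases. The class name PatternEdgeCases and the helper tokens are made up for this illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternEdgeCases {
    // Same pattern as in the answer: a letter a-g, optionally followed by (digits).
    private static final Pattern TOKEN = Pattern.compile("[a-g](?:\\(\\d+\\))?");

    // Hypothetical helper: collects every match of TOKEN found in the input.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("b(52)"));        // [b(52)]
        System.out.println(tokens("a()"));          // [a]   \d+ needs at least one digit
        System.out.println(tokens("h(4)"));         // []    h is outside a-g
        System.out.println(tokens("g(8)h(4)5(6)")); // [g(8)]
    }
}
```

Because \d+ requires at least one digit, empty parentheses are not attached to the letter, and anything that does not start with a letter from a to g is skipped.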

1 Comment

Thank you it works well. Also thanks for the details, I understand now.

If you want to use only the split method, here is another approach you could follow:

import java.util.Arrays;

public class Test 
{
   public static void main(String[] args)
   {
        String content = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";
        String[] a = content.replaceAll("[a-g](\\([0-9]*\\))?|[a-g]", "$0:").split(":");
        // $0 is the string which matched the regex

        System.out.println(Arrays.toString(a));

   }

}

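Regex: [a-g](\\([0-9]*\\))?|[a-g] matches the strings you want to keep (i.e. a, b, a(5) and so on). The trailing |[a-g] alternative is redundant, because the parenthesized group in the first alternative is already optional.

Using this regex, I first replace each matched string with itself followed by a colon. Then I split the resulting string on the colon using the split method.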

Output of the above code is,

[a, b, a(2), b, b(52), g, c(4), d(2), f, e(14), f(6), g(8), h(4)5(6)]

NOTE: This approach only works with a delimiter that is known not to occur in the input string. For example, I chose a colon because I assumed it won't be part of the input. Also note that text the regex does not match (here h(4)5(6)) is not dropped; it stays in the result.
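The delimiter caveat can be seen in a small sketch. It reuses the replaceAll/split call from this answer; the class name DelimiterCaveat and the helper splitTokens are invented for the illustration.

```java
import java.util.Arrays;

class DelimiterCaveat {
    // Hypothetical helper wrapping the replaceAll + split call from the answer.
    static String[] splitTokens(String content) {
        return content.replaceAll("[a-g](\\([0-9]*\\))?|[a-g]", "$0:").split(":");
    }

    public static void main(String[] args) {
        // No colon in the input: the tokens come out as intended.
        System.out.println(Arrays.toString(splitTokens("ab(2)c")));  // [a, b(2), c]

        // A colon already in the input: "a:b(2)" becomes "a::b(2):",
        // so split(":") produces a spurious empty token.
        System.out.println(Arrays.toString(splitTokens("a:b(2)")));  // [a, , b(2)]
    }
}
```

If the input may contain arbitrary characters, matching the tokens directly avoids having to pick a safe delimiter at all.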


Split is the wrong approach here, because it makes it hard to eliminate invalid entries.

Instead, match whatever is valid and process the resulting list of matches:

[a-g](?:\(\d+\))?


You can try the following regex: [a-g](\(.*?\))?

  • [a-g]: letters from a to g required
  • (\(.*?\))?: an optional group matching any characters between ( and ), as few as possible (lazy)
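One thing worth checking is how the lazy .*? differs from a digits-only group, since the question requires numeric content. The sketch below compares both regexes on a made-up input; the class name LazyVersusDigits and the helper findAll are assumptions for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVersusDigits {
    // Hypothetical helper: returns all matches of the given regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "a(x)b(3)";

        // .*? accepts any content between the parentheses, including letters.
        System.out.println(findAll("[a-g](\\(.*?\\))?", input));    // [a(x), b(3)]

        // \d+ accepts digits only, so "(x)" is not attached to "a".
        System.out.println(findAll("[a-g](?:\\(\\d+\\))?", input)); // [a, b(3)]
    }
}
```

For the sample string in the question both regexes give the same tokens, because every parenthesized value there is numeric.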

You can view the expected output here.

This answer relies on Pattern and Matcher rather than split. An example:

String input = "aba(2)bb(52)gc(4)d(2)fe(14)f(6)g(8)h(4)5(6)";

Pattern pattern = Pattern.compile("[a-g](?:\\(\\d+\\))?");
Matcher matcher = pattern.matcher(input);
List<String> tokens = new ArrayList<>();
while (matcher.find()) {
    tokens.add(matcher.group());
}

tokens.forEach(System.out::println);

Resulting output:

a
b
a(2)
b
b(52)
g
c(4)
d(2)
f
e(14)
f(6)
g(8)

Edit: If the matcher is built from [a-g](?:\((.*?)\))? instead, you can also easily extract the value inside the parentheses through capturing group 1:

while (matcher.find()) {
    tokens.add(matcher.group());
    tokens.add(matcher.group(1)); // the inner value or null if no () are present 
}
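As a rough sketch of how the captured value might be used, the example below pairs each letter with its number. It swaps .*? for \d+ inside the capturing group, on the assumption that only numeric content should be accepted; the class name TokenValues, the helper describe, and the "letter=number" output format are all invented for illustration. Integer.parseInt would throw for digit runs too large for an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenValues {
    // Assumption: digits only, captured as group 1 (the answer's variant uses .*?).
    private static final Pattern TOKEN = Pattern.compile("[a-g](?:\\((\\d+)\\))?");

    // Hypothetical helper: renders each token as "letter" or "letter=number".
    static List<String> describe(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String letter = m.group().substring(0, 1);
            String digits = m.group(1); // null when no parentheses were matched
            result.add(digits == null ? letter : letter + "=" + Integer.parseInt(digits));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe("aba(2)bb(52)")); // [a, b, a=2, b, b=52]
    }
}
```

Checking group(1) for null before using it avoids adding null entries to the result list.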

2 Comments

It does match the required pattern ... but this won't work with split
@mettleap You are right, I added more details to describe how I solved the problem. Thanks for pointing it out!
