Given 01:aa,bb,02:cc,03:dd,04:ee as input, I need to extract key-value pairs that are separated by commas. The problem is that a value can also contain a comma. On the other hand, the keys (indices) are always two-digit numerals, and the separator between key and value is always a colon.

Hence, the result of the above input should be the following regex groups:

01:aa,bb
02:cc, (comma is optional, can be stripped if exists)
03:dd, (comma is optional, can be stripped if exists)
04:ee

I've tried using (\d{2}:.+?,)*(\d{2}:.+?)$, but this results in:

0: 01:aa,bb,02:cc,03:dd,04:ee
1: 03:dd,
2: 04:ee

Do you have any suggestions?

3 Answers

You can use a combination of lookahead and reluctant quantifiers for that.

For instance:

String input = "01:aa,bb,02:cc,03:dd,04:ee";
//                           | group 1
//                           || group 2: 2 digits
//                           ||       | separator
//                           ||       | | group 3: any character reluctantly quantified...
//                           ||       | |  | ... followed by ...
//                           ||       | |  |  | ... comma and next digit as 
//                           ||       | |  |  | non-capturing group...
//                           ||       | |  |  |     | ... or...
//                           ||       | |  |  |     || ... end of input
//                           ||       | |  |  |     ||   | multiple matches in input
Pattern p = Pattern.compile("((\\d{2}):(.+?(?=(?:,\\d)|$)))+");
Matcher m = p.matcher(input);
while (m.find()) {
    System.out.println(m.group(2) + " --> " + m.group(3));
}

Output

01 --> aa,bb
02 --> cc
03 --> dd
04 --> ee
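
As a rough sketch of how the same lookahead idea could be packaged, the helper below collects the pairs into a map. The names `PairParser` and `parse` are hypothetical, and the lookahead is tightened to `,\d{2}:` on the assumption (taken from the question) that every key is exactly two digits followed by a colon.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairParser {
    // Key: two digits. Value: the shortest run of characters that is
    // followed by ",<two digits>:" or by the end of the input.
    private static final Pattern PAIR =
            Pattern.compile("(\\d{2}):(.+?)(?=,\\d{2}:|$)");

    // Hypothetical helper: returns the pairs in the order they appear.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints {01=aa,bb, 02=cc, 03=dd, 04=ee}
        System.out.println(parse("01:aa,bb,02:cc,03:dd,04:ee"));
    }
}
```

A `LinkedHashMap` is used here only to keep the pairs in input order; a duplicate key would overwrite the earlier value.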

I think this should cover all cases:

Pattern regex = Pattern.compile("(\\d+):([\\w,]+)(?=,\\d|$)");

Explanation:

(\d+)    # Match and capture a number
:        # Match :
([\w,]+) # Match and capture an alphanumeric word (and/or commas)
(?=      # Make sure the match ends at a position where it's possible to match...
 ,\d     # either a comma, followed by a number
|        # or
 $       # the end of the string
)        # End of lookahead assertion

Test it live on regex101.com.
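
For completeness, here is a minimal sketch of how that pattern might be applied with `Matcher.find()`. The class name `LookaheadDemo` and the method `extract` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // Hypothetical helper: returns one "key -> value" string per match.
    static List<String> extract(String input) {
        Pattern regex = Pattern.compile("(\\d+):([\\w,]+)(?=,\\d|$)");
        Matcher m = regex.matcher(input);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // group(1) is the numeric key, group(2) the value.
            // The greedy [\w,]+ backtracks until the lookahead succeeds.
            out.add(m.group(1) + " -> " + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : extract("01:aa,bb,02:cc,03:dd,04:ee")) {
            System.out.println(line);
        }
    }
}
```

Note that the character class `[\w,]` only admits word characters and commas, so a value containing, say, a hyphen or a space would not be captured as intended.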

Dario, here's a really simple solution: split the string with this simple regex:

,(?=\d{2}:)

Here's the code:

String[] arrayOfPairs = subjectString.split(",(?=\\d{2}:)");

See the result at the bottom of the online demo.

The reason I suggest this is that you seem happy to match a key-value pair as a whole, as opposed to separating them into two variables.

How does this work?

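We split on each comma that is followed by two digits and a colon, as asserted by the positive lookahead (?=\d{2}:). Because a lookahead consumes no characters, the digits and the colon stay at the start of the next piece.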
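
If you later want the key and value separately after all, a second split on the first colon is one way to do it. This is only a sketch; `SplitDemo` and `splitPairs` are hypothetical names.

```java
public class SplitDemo {
    // Hypothetical helper: split on commas that start a new "NN:" key.
    static String[] splitPairs(String input) {
        return input.split(",(?=\\d{2}:)");
    }

    public static void main(String[] args) {
        for (String pair : splitPairs("01:aa,bb,02:cc,03:dd,04:ee")) {
            // Limit 2 keeps any later colons inside the value.
            String[] kv = pair.split(":", 2);
            System.out.println(kv[0] + " --> " + kv[1]);
        }
    }
}
```

The limit argument of 2 in the inner `split` means only the first colon separates key from value.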
