I have a string that needs to be split on a delimiter (:). The delimiter can be escaped by a character (say ?), and it can be preceded by any number of escape characters. Consider the example string below:

a:b?:c??:d???????:e

After the split, it should give the following list of strings:

a 
b?:c?? 
d???????:e

Basically, if the delimiter (:) is preceded by an even number of escape characters (including zero), it should split. If it is preceded by an odd number of escape characters, it should not split. Is there a solution to this with regex? Any help would be greatly appreciated.

A similar question has been asked earlier here, but the answers do not work for this use case.

Update: The solution with the regex (?:\?.|[^:?])* splits the string correctly. However, it also produces a few spurious empty strings. If + is used instead of *, the genuine empty matches are dropped as well (e.g. a::b gives only a and b).
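To make the update concrete, here is a minimal sketch that runs both quantifiers against a::b. The class name StarVersusPlusDemo and the helper allMatches are made up for illustration; only the two regexes come from the text above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarVersusPlusDemo {
    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // With *, an empty match is reported at each ':' and at the end of input.
        System.out.println(allMatches("(?:\\?.|[^:?])*", "a::b")); // [a, , , b, ]
        // With +, no empty match is possible, so the empty token between "::" is lost.
        System.out.println(allMatches("(?:\\?.|[^:?])+", "a::b")); // [a, b]
    }
}
```

The desired result for a::b would be a, an empty string, and b, which neither variant produces.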

  • What is the expected result for ::a:b?:c??::d???????:e::? Commented Mar 21, 2019 at 8:22
  • For the above string, it should split into the following strings: <empty>,a,b?:c??,<empty>,d???????:e,<empty> Commented Mar 21, 2019 at 8:27
  • Aha, so the first and last (start and end of string) are not necessary. Commented Mar 21, 2019 at 8:28
  • Actually, even if they are there it is not a problem. Commented Mar 21, 2019 at 8:31
  • Good, I added all possible solutions Commented Mar 21, 2019 at 8:33

1 Answer

Scenario 1: No empty matches

You may use

(?:\?.|[^:?])+

Or, following the pattern in the linked answer

(?:\?.|[^:?]++)+

See this regex demo

Details

  • (?: - start of a non-capturing group
    • \?. - a ? (the escape char) followed by any char
    • | - or
    • [^:?] - any char but the : (your delimiter char) and ? (the escape char)
  • )+ - 1 or more repetitions.

In Java:

String regex = "(?:\\?.|[^:?]++)+";

By default, . does not match line break chars. If the input contains line breaks, prepend the pattern with (?s) (like (?s)(?:\\?.|[^:?])+) or compile the pattern with the Pattern.DOTALL flag.
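For completeness, here is a minimal runnable sketch of Scenario 1 applied to the string from the question. The class name EscapedSplitDemo and the method tokens are hypothetical names chosen for this example; the regex is the one shown above, compiled with Pattern.DOTALL so that an escaped line break would also be consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedSplitDemo {
    // Either an escape char plus the char it escapes, or a run of ordinary chars.
    private static final Pattern TOKEN =
            Pattern.compile("(?:\\?.|[^:?]++)+", Pattern.DOTALL);

    // Hypothetical helper: returns the non-empty tokens between unescaped ':' chars.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("a:b?:c??:d???????:e")); // [a, b?:c??, d???????:e]
        System.out.println(tokens("a::b"));                // [a, b]
    }
}
```

Each ? is always consumed together with the char after it, so a : is only left unmatched (and acts as a separator) when an even number of ? chars precedes it.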

Scenario 2: Empty matches included

You may add a (?<=:)(?=:) alternative to the above pattern to match the empty strings between two consecutive : chars, see this regex demo:

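import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "::a:b?:c??::d???????:e::";
// Either a non-empty token, or an empty string between two consecutive ':' chars
Pattern pattern = Pattern.compile("(?>\\?.|[^:?])+|(?<=:)(?=:)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    // Quote each match so that empty strings are visible in the output
    System.out.println("'" + matcher.group() + "'");
}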

Output of the Java demo:

''
'a'
'b?:c??'
''
'd???????:e'
''

Note that if you want to also match empty strings at the start/end of the string, use (?<![^:])(?![^:]) rather than (?<=:)(?=:).
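As a rough illustration of that variant, the sketch below runs the (?<![^:])(?![^:]) alternative on the same input. The class name EdgeEmptyMatchesDemo and the method tokens are hypothetical; only the regex pieces come from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EdgeEmptyMatchesDemo {
    // Non-empty token, or an empty position that has no non-':' char on either side
    // (this also holds at the very start and the very end of the input).
    private static final Pattern TOKEN =
            Pattern.compile("(?>\\?.|[^:?])+|(?<![^:])(?![^:])");

    // Hypothetical helper: collects all matches, including empty ones.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String token : tokens("::a:b?:c??::d???????:e::")) {
            System.out.println("'" + token + "'");
        }
        // Prints 8 lines: '', '', 'a', 'b?:c??', '', 'd???????:e', '', ''
    }
}
```

One caveat with both lookaround variants: they only inspect the neighbouring chars, so an escaped : at the end of a token that is directly followed by a real delimiter, as in a?::b, also satisfies them and yields an extra empty string (a?:, an empty string, b).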
