
I have a large string that should be split at a certain character, but only where that character is not preceded by another specific character.

What is the most efficient way to do this?

An example: Split this string at ':', but not at "?:":

part1:part2:https?:example.com:anotherstring

What I have tried so far:

  1. The regex (?<!\?): is very slow.

  2. First getting the indices where to split the string and then splitting it. Only efficient if there are not many split characters in the string.

  3. Iterating over the string character by character. Efficient if there are not many protect characters (e.g. '?').
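Approach 3 from the list above (a single character-by-character pass) can be sketched as follows. This is a minimal sketch under stated assumptions, not the asker's actual code; the class and method names are hypothetical. It only inspects the one character directly before each separator, so "??:" also counts as protected (there is no escaping of the protect character itself), and, unlike String.split, it keeps trailing empty strings.

```java
import java.util.ArrayList;
import java.util.List;

public class ProtectedSplit {

    // Hypothetical helper: splits `s` at every `sep` that is not directly preceded by `protect`.
    // One left-to-right pass; substrings are only created at real split points.
    static List<String> split(String s, char sep, char protect) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == sep && (i == 0 || s.charAt(i - 1) != protect)) {
                parts.add(s.substring(start, i)); // text before this separator
                start = i + 1;
            }
        }
        parts.add(s.substring(start)); // remainder after the last split point (may be empty)
        return parts;
    }

    public static void main(String[] args) {
        String input = "part1:part2:https?:example.com:anotherstring";
        System.out.println(split(input, ':', '?'));
        // prints [part1, part2, https?:example.com, anotherstring]
    }
}
```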

  • What about the split() method? Please post what you have tried so far. Also add examples of your strings with an explanation of how they should be split. Commented May 19, 2020 at 11:01
  • I didn't post what I had tried so far, so as not to influence potential answers. But if you want to know: 1. Regex: very slow. 2. First find the indices where to split, then split the string: efficient in some cases, but not for strings with many split characters. 3. Iterating over the string character by character: efficient with many split characters, but not if they are preceded by the protect character. An example: split this at ':' but not at '?:': 1:2:https?:example.com:foo:bar Commented May 19, 2020 at 11:06
  • Does this answer your question? Java split String performances Commented May 19, 2020 at 11:09
  • No, as it doesn't take the protect character into account. Commented May 19, 2020 at 11:11
  • What does “large string” mean, and what does “very slow” mean? A quick test with a 30-million-character string, split into 3 million substrings using the simple regex method, took half a second on my machine. Is that string “large”? Is that time “slow”? More importantly, since all three approaches produce entirely different results, what do you actually need, and what are you going to do with the result? There is no sense in doing something apparently faster when its result needs a much longer conversion afterwards. Commented May 19, 2020 at 15:16

2 Answers


I fear you would have to go through the string and check whether each ":" is preceded by a "?".

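import java.util.ArrayList;
import java.util.List;

List<String> parts = new ArrayList<>();
int lastIndex = 0;
// Continue the search after the current ':' (not from lastIndex); otherwise a
// protected "?:" is found again on every iteration and the loop never ends.
for (int index = string.indexOf(':'); index >= 0; index = string.indexOf(':', index + 1)) {
    if (index == 0 || string.charAt(index - 1) != '?') {
        parts.add(string.substring(lastIndex, index)); // text before this ':'
        lastIndex = index + 1;
    }
}
parts.add(string.substring(lastIndex)); // remainder after the last split point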

You will have to test this very carefully (since I didn't), but using a regular expression in split() might produce the results you want:

public class SplitDemo {
    public static void main(String[] args) {
        String s = "Start.Teststring.Teststring1?.Teststring2.?Teststring3.?.End";
        // Split at '.' unless it is preceded by '?' or followed by another '.'
        String[] result = s.split("(?<!\\?)\\.(?!\\.)");
        System.out.println(String.join("|", result));
    }
}

Output:

Start|Teststring|Teststring1?.Teststring2|?Teststring3|?.End

Note:
This only covers your example of splitting at a dot that is not preceded by a question mark. The (?!\.) lookahead additionally skips a dot that is directly followed by another dot.

I don't think you will get a much more performant solution than the regex...
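Adapted to the question's ':' separator and '?' protect character, the split becomes the lookbehind from the question. A minimal sketch, assuming the pattern is used on many strings: in the OpenJDK implementation, String.split(regex) only avoids regex compilation for simple one-character separators, so a lookbehind pattern like this is compiled again on every call, while a precompiled Pattern is compiled once. The class name is hypothetical, and this does not claim the regex will beat a hand-written scan.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class RegexProtectedSplit {

    // Compiled once and reused: a ':' that is not directly preceded by '?'.
    private static final Pattern UNPROTECTED_COLON = Pattern.compile("(?<!\\?):");

    static String[] split(String input) {
        // Like String.split, Pattern.split(CharSequence) drops trailing empty strings.
        return UNPROTECTED_COLON.split(input);
    }

    public static void main(String[] args) {
        String input = "part1:part2:https?:example.com:anotherstring";
        System.out.println(Arrays.toString(split(input)));
        // prints [part1, part2, https?:example.com, anotherstring]
    }
}
```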

2 Comments

I fear not. As I said in the comment and in my edit, regex has been by far the slowest of all the approaches I tried.
@dankito Can you post your measurements? How have you measured the performance and what are the results?
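For reproducible measurements, a rough timing harness could look like the following. This is a minimal sketch under stated assumptions: the input is hypothetical (the question's example repeated and joined by ':'), the class and method names are made up, and single-shot System.nanoTime() timings are noisy and affected by JIT warm-up, so a dedicated harness such as JMH gives more trustworthy numbers. It prints the part counts so that both methods can be checked for producing the same result, which matters given that the approaches discussed above do not all return the same thing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SplitTiming {

    // Same lookbehind as in the question: a ':' not directly preceded by '?'.
    private static final Pattern UNPROTECTED_COLON = Pattern.compile("(?<!\\?):");

    // Hypothetical input generator: `repeats` copies of the question's example joined by ':'.
    static String buildInput(int repeats) {
        String unit = "part1:part2:https?:example.com:anotherstring";
        StringBuilder sb = new StringBuilder(repeats * (unit.length() + 1));
        for (int i = 0; i < repeats; i++) {
            if (i > 0) {
                sb.append(':');
            }
            sb.append(unit);
        }
        return sb.toString();
    }

    // Single-pass character scan: split at ':' unless the previous character is '?'.
    static List<String> scanSplit(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ':' && (i == 0 || s.charAt(i - 1) != '?')) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        String input = buildInput(200_000);
        // Several rounds: the first ones include JIT warm-up and are not representative.
        for (int round = 1; round <= 5; round++) {
            long t0 = System.nanoTime();
            int regexParts = UNPROTECTED_COLON.split(input).length;
            long t1 = System.nanoTime();
            int scanParts = scanSplit(input).size();
            long t2 = System.nanoTime();
            System.out.printf("round %d: regex %d parts in %d ms, scan %d parts in %d ms%n",
                    round, regexParts, (t1 - t0) / 1_000_000, scanParts, (t2 - t1) / 1_000_000);
        }
    }
}
```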
