I've been trying to build a pattern in Java to split the following string by dashes AND by tab characters. The exception is that if a dash appears after a tab has already been encountered in the string, even once, we stop splitting on the dash and only split on tabs. For example:
Input string (those big spaces are tab characters):
"4852174--r-watch 7 47 2 0 80-B 20 5"
Expected output: ["4852174", "r", "watch", "7", "47", "2", "0", "80-B", "20", "5"]
I'm using the following regular expression so far: "(?<!\\d)(\\-+)(?!\t)|\t"
The first group (a negative lookbehind) says that no digit may precede the delimiter, the second matches one or more dashes, and the last (a negative lookahead) says that no tab may follow. The alternation at the end splits on single tab characters.
The result that I'm getting is the following:
["4852174-", "r", "watch", "7", "47", "2", "0", "80-B", "20", "5"]
Notice the extra dash in "4852174-" that should not be there. I've spent a long time trying to figure this out, but any small change I make breaks the splitting elsewhere.
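For reference, here is a minimal reproduction of the behaviour described above. The class and method names (`SplitRepro`, `split`) are made up for illustration; the pattern is the one from the question, unchanged. The comments trace where the stray dash appears to come from: the lookbehind rejects the first dash because a digit precedes it, but the second dash is preceded by a dash, so the match starts there and the first dash stays attached to "4852174".

```java
import java.util.Arrays;

public class SplitRepro {
    // Pattern from the question. "\t" inside the Java literal is a real tab character.
    static final String REGEX = "(?<!\\d)(\\-+)(?!\t)|\t";

    static String[] split(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        String input = "4852174--r-watch\t7\t47\t2\t0\t80-B\t20\t5";
        // "4852174--r": the first '-' follows the digit '4', so (?<!\d) fails there.
        // The second '-' follows a '-', so the lookbehind passes and only that dash is consumed.
        // "80-B": the '-' follows the digit '0', which is why this token is left intact.
        System.out.println(Arrays.toString(split(input)));
        // prints [4852174-, r, watch, 7, 47, 2, 0, 80-B, 20, 5]
    }
}
```

So the same lookbehind that protects "80-B" is also what leaves the extra dash behind after a digit.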
Any help to solve this problem would be much appreciated. Thank you in advance!
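One possible way to express the rule exactly as stated (dashes only count before the first tab) is to skip the single-regex approach and split in two steps. This is only a sketch: `TabAwareSplit` and `split` are hypothetical names, and it assumes that a run of dashes acts as one delimiter while each tab is a delimiter on its own, as the expected output suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TabAwareSplit {
    static List<String> split(String input) {
        int firstTab = input.indexOf('\t');
        if (firstTab < 0) {
            // No tab at all: every run of dashes is a delimiter.
            return new ArrayList<>(Arrays.asList(input.split("-+")));
        }
        // Before the first tab, split on runs of dashes.
        List<String> parts =
                new ArrayList<>(Arrays.asList(input.substring(0, firstTab).split("-+")));
        // After the first tab, split on tabs only, so "80-B" is kept whole.
        parts.addAll(Arrays.asList(input.substring(firstTab + 1).split("\t")));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("4852174--r-watch\t7\t47\t2\t0\t80-B\t20\t5"));
        // prints [4852174, r, watch, 7, 47, 2, 0, 80-B, 20, 5]
    }
}
```

Note that `String.split` drops trailing empty strings, so inputs that end in a tab or contain nothing before the first tab may need extra handling.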
80-B as an output?
(?<!\d) part of your regex). It seems to me that you're missing some rules that you would need. Could you answer: why should 80-B not be split?
80-B so why should that not split?
(?:[^\s-]|(?<=\d)-(?=[^\W\d]))+. I thought about matching instead of splitting. I think that your rules aren't well defined, so I just made it "work" for the input you've provided.
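The matching approach from the last comment could be tried in Java roughly as follows. This is a sketch with made-up names (`TokenMatcher`, `tokens`); the pattern is the commenter's, with backslashes doubled for the Java string literal. As the comment itself says, it was fitted to the sample input rather than to the stated rule: a dash is kept only when a digit precedes it and a letter or underscore follows it, so a token such as "80-9" after a tab would still be broken apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenMatcher {
    // A token is a run of characters that are neither whitespace nor '-',
    // plus any '-' that sits between a digit and a non-digit word character.
    static final Pattern TOKEN =
            Pattern.compile("(?:[^\\s-]|(?<=\\d)-(?=[^\\W\\d]))+");

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group()); // collect each matched token instead of splitting
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("4852174--r-watch\t7\t47\t2\t0\t80-B\t20\t5"));
        // prints [4852174, r, watch, 7, 47, 2, 0, 80-B, 20, 5]
    }
}
```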