I have the following String:

MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSD

I want to split such a string every time a K or R is encountered, except when followed by a P.

Therefore, I want the following output:

MYLMFILLAAGCSK
MYLLFINNAARPFASSTK
AASTVVTPHHSYTSKPHHSTTSHCK
SSD

At first, I tried using the simple .split() function in Java, but I couldn't get the desired result, because I don't know how to tell .split() not to split when a P comes right after a K or R.

I've looked at other similar questions, and they suggest using Pattern matching, but I don't know how to use it in this context.

  • Are they **K**, **P** etc as text? Or did you put them in bold for us to see them? Commented Feb 9, 2017 at 11:53
  • No I just made them in bold so that you could see Commented Feb 9, 2017 at 11:57
  • You can check your string manually, e.g. with charAt() if it contains K or R alone and then use substring() and the positions found to split your string. Commented Feb 9, 2017 at 11:57
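
The manual approach suggested in the last comment could look roughly like the sketch below. The class and method names (ManualSplit, splitAfterKR) are hypothetical and not from the original post; the sketch assumes the rule is exactly "cut after K or R unless the next character is P".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names are illustrative, not from the original post.
final class ManualSplit {

    // Cuts after every K or R unless the next character is P.
    static List<String> splitAfterKR(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean isCutChar = (c == 'K' || c == 'R');
            boolean nextIsP = i + 1 < s.length() && s.charAt(i + 1) == 'P';
            if (isCutChar && !nextIsP) {
                parts.add(s.substring(start, i + 1));
                start = i + 1;
            }
        }
        // Keep whatever follows the last cut point, if anything.
        if (start < s.length()) {
            parts.add(s.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        String str = "MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSD";
        splitAfterKR(str).forEach(System.out::println);
    }
}
```

For the string in the question this should print the four desired pieces, one per line.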

2 Answers

You can use split:

String[] parts = str.split("(?<=[KR])(?!P)");

Because you want to keep the character you split after, the regex must match a zero-width position rather than the character itself. Look-arounds do exactly that: they assert without consuming input. There are two look-arounds here:

  • (?<=[KR]) means "the previous char is either K or R"
  • (?!P) means "the next char is not a P"

This regex matches between characters where you want to split.


Some test code:

String str = "MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSD";
Arrays.stream(str.split("(?<=[KR])(?!P)")).forEach(System.out::println);

Output:

MYLMFILLAAGCSK
MYLLFINNAARPFASSTK
AASTVVTPHHSYTSKPHHSTTSHCK
SSD
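
Since the question mentions Pattern, here is a minimal, self-contained sketch of the same regex compiled once and reused. The class name KrSplitter and its split method are hypothetical; the two extra inputs are made-up edge cases, not data from the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical wrapper; the names are illustrative only.
final class KrSplitter {

    // Zero-width position: preceded by K or R, and not followed by P.
    private static final Pattern CUT = Pattern.compile("(?<=[KR])(?!P)");

    static String[] split(String s) {
        return CUT.split(s);
    }

    public static void main(String[] args) {
        String str = "MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSD";
        for (String part : split(str)) {
            System.out.println(part);
        }
        // Adjacent cut characters each produce their own piece.
        System.out.println(Arrays.toString(split("AKKB"))); // prints [AK, K, B]
        // A K followed by P is not a cut point.
        System.out.println(Arrays.toString(split("AKPB"))); // prints [AKPB]
    }
}
```

Compiling the pattern once is mainly useful if many strings are split with the same rule; for a single call, str.split(...) as shown above does the same thing.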

2 Comments

After reading a lot about negative look-ahead assertions, I now realize how useful they can be. Thanks :)
Thanks @Bohemian, your answer has helped me a lot

Just try this regexp:

([KR])([^P]|$)

and substitute each match with

\1\n\2

as illustrated in the following demo. No negative lookahead is needed. But you cannot use it directly with split, because split would also remove the non-P character that follows the K or R.

You can do a first transform like the one above and then call .split("\n"). In Java the substitution method is replaceAll, and groups are referenced as $1 and $2 in the replacement string, so it should be:

"MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSDK"
    .replaceAll("([KR])([^P]|$)", "$1\n$2").split("\n");
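
A runnable sketch of this substitute-then-split idea follows. It rests on two assumptions that are not in the snippet above: Java's String.replaceAll with $1/$2 group references is used for the substitution, and the character class is [KR] so that R is handled as the question asks. The class name ReplaceThenSplit is hypothetical.

```java
import java.util.Arrays;

// Hypothetical class name; a sketch of the two-step approach, not a definitive implementation.
final class ReplaceThenSplit {

    static String[] split(String s) {
        // Step 1: insert a newline after each K or R that is followed by a
        //         non-P character or by the end of the input.
        // Step 2: split on the inserted newlines.
        return s.replaceAll("([KR])([^P]|$)", "$1\n$2").split("\n");
    }

    public static void main(String[] args) {
        String str = "MYLMFILLAAGCSKMYLLFINNAARPFASSTKAASTVVTPHHSYTSKPHHSTTSHCKSSD";
        for (String part : split(str)) {
            System.out.println(part);
        }
        // The match consumes the character after K/R, so the second K here
        // is never tested as a cut point.
        System.out.println(Arrays.toString(split("AKKB"))); // prints [AK, KB]
    }
}
```

For the string in the question the result is the same four pieces. The last line shows a limitation of consuming the following character: with two adjacent K or R characters only the first one produces a split, which a zero-width lookaround pattern avoids.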
