I have a problem parsing text. I have an interview transcript in which tags mark which channel is talking (ch1, ch2). I need to break it into an array so that I can search for which channel says a specific word.

For example, this is part of the interview:

<ch1>Hello</ch1> <ch2>Hello</ch2> <ch1>How are you</ch1><ch2>I'm fine</ch2>

As a string:

String text = "<ch1>Hello</ch1> <ch2>Hello</ch2> <ch1>How are you</ch1><ch2>I'm fine</ch2>";

And I want this output:

 String[] output = {"<ch1>Hello</ch1>", "<ch2>Hello</ch2>", ....};

Thanks for your help.

2 Answers

You can use a regular expression with lookahead and lookbehind:

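import java.util.Arrays;

String dialogue = "<ch1>Hello</ch1> <ch2>Hello</ch2> <ch1>How are you</ch1><ch2>I'm fine</ch2>";
// Split between a closing tag and the next opening tag, dropping any whitespace in between.
String[] statements = dialogue.split("(?<=</ch[12]>)\\s*(?=<ch[12]>)");
System.out.println(Arrays.asList(statements));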

Output:

[<ch1>Hello</ch1>, <ch2>Hello</ch2>, <ch1>How are you</ch1>, <ch2>I'm fine</ch2>]

It's a bit hard to read due to the many < and >, but the pattern is like this:

split("(?<=endOfLastPart)inBetween(?=startOfNextPart)")
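
For the second half of the question (finding which channel says a specific word), here is a minimal sketch that matches each tagged statement directly instead of splitting. The class name `ChannelSearch` and the method `channelsSaying` are made up for illustration, and it assumes only `ch1` and `ch2` exist, tags are never nested, and a case-insensitive substring match is good enough for "says the word":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChannelSearch {
    // Group 1 is the channel name, group 2 the spoken text.
    // \1 requires the closing tag to name the same channel as the opening tag.
    private static final Pattern STATEMENT =
            Pattern.compile("<(ch[12])>(.*?)</\\1>", Pattern.DOTALL);

    /** Returns the channel of every statement containing the word, in transcript order. */
    static List<String> channelsSaying(String transcript, String word) {
        List<String> channels = new ArrayList<>();
        String needle = word.toLowerCase();
        Matcher m = STATEMENT.matcher(transcript);
        while (m.find()) {
            // Substring match (an assumption): "fine" would also match "refined".
            if (m.group(2).toLowerCase().contains(needle)) {
                channels.add(m.group(1));
            }
        }
        return channels;
    }

    public static void main(String[] args) {
        String dialogue = "<ch1>Hello</ch1> <ch2>Hello</ch2> <ch1>How are you</ch1><ch2>I'm fine</ch2>";
        System.out.println(channelsSaying(dialogue, "hello")); // [ch1, ch2]
        System.out.println(channelsSaying(dialogue, "fine"));  // [ch2]
    }
}
```

If whole-word matching matters, the `contains` check could be swapped for a second regex with `\b` word boundaries.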

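Insert a separator before each opening tag and split on it. In Java:

String[] parts = text.replace("<ch", "-<ch").split("-");

Any string can be used as the separator instead of "-". Because the text starts with a tag, the first element of the result is an empty string.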

1 Comment

What if there are other "-" characters in the text? Better to use a more distinctive separator character (or character sequence).
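
One way to sidestep the separator question entirely is to split on a zero-width lookahead, so nothing has to be inserted into the text first. A rough sketch, assuming Java 8 or later and that tags always look like `<ch` followed by digits; the class name `LookaheadSplit` and the method `splitStatements` are made up for illustration:

```java
import java.util.Arrays;

class LookaheadSplit {
    /** Splits before every opening <chN> tag without inserting a separator. */
    static String[] splitStatements(String transcript) {
        // (?=<ch\d+>) matches the position just before a tag and consumes no characters,
        // so hyphens or any other characters inside the text are left alone.
        return Arrays.stream(transcript.split("(?=<ch\\d+>)"))
                .map(String::trim)            // drop whitespace left between statements
                .filter(s -> !s.isEmpty())    // guard against empty pieces
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String text = "<ch1>Hello</ch1> <ch2>Hello</ch2> <ch1>How are you</ch1><ch2>I'm fine</ch2>";
        System.out.println(Arrays.asList(splitStatements(text)));
    }
}
```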
