I have the following string:

21>Please be specific. What do you mean by that?21>Hello are you there623>Simon?

I want to split it into:

21>Please be specific. What do you mean by that?
21>Hello are you there
623>Simon?

Basically, the delimiter is a numeric value (21 and 623 in this case) followed by >.

My implementation finds the > character, then walks back until it hits a non-numeric character.

So it's basically Substring and the like. It's ugly, and I'm certain there's a better Regex approach, but I don't know enough about regex.
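For comparison, here is a rough C# sketch of the manual approach the question describes: find each >, then walk back over the digits in front of it. It is a reconstruction for illustration, not the asker's actual code. The helper name SplitOnIdMarkers is hypothetical, and it treats only ASCII 0-9 as digits.

```csharp
using System;
using System.Collections.Generic;

string s = "21>Please be specific. What do you mean by that?21>Hello are you there623>Simon?";
List<string> parts = SplitOnIdMarkers(s);
foreach (string part in parts)
    Console.WriteLine(part);

// Hypothetical helper: splits the input just before every run of ASCII digits
// that is immediately followed by '>'.
static List<string> SplitOnIdMarkers(string input)
{
    var starts = new List<int>();
    for (int i = 0; i < input.Length; i++)
    {
        if (input[i] != '>') continue;

        // Walk back from the '>' while the characters are ASCII digits.
        int j = i - 1;
        while (j >= 0 && input[j] >= '0' && input[j] <= '9')
            j--;

        // Only treat it as a marker if at least one digit precedes the '>'.
        if (j < i - 1)
            starts.Add(j + 1);
    }

    // Keep any text that appears before the first marker.
    if (starts.Count == 0 || starts[0] > 0)
        starts.Insert(0, 0);

    // Each piece runs from one marker start to the next (or to the end).
    var result = new List<string>();
    for (int k = 0; k < starts.Count; k++)
    {
        int end = k + 1 < starts.Count ? starts[k + 1] : input.Length;
        result.Add(input.Substring(starts[k], end - starts[k]));
    }
    return result;
}
```

Like the regex answers, this would also treat any digits followed by > inside a message as the start of a new piece.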

So can Regex be used here?

    Maybe something like \d{1,3}>? Or maybe even \d+>? The first would look for 1 to 3 digits followed by >, and the second for one or more digits followed by >. Commented Apr 7, 2016 at 17:22

3 Answers

You can achieve that with a lookahead and a lookbehind, so that the match is the zero-length position between the pieces you want to split.

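using System;
using System.Text.RegularExpressions;

string s = "21>Please be specific. What do you mean by that?21>Hello are you there623>Simon?";
// Split at the zero-width position after a non-digit and before digits followed by '>'.
Regex reg = new Regex(@"(?<=\D)(?=\d+>)");
string[] parts = reg.Split(s);
foreach (string part in parts)
    Console.WriteLine(part);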

This outputs:

21>Please be specific. What do you mean by that?
21>Hello are you there
623>Simon?

1 Comment

@noob Yep, and I think our answers complement each other.

Try the following regex. It matches the zero-width position between a non-digit and a number followed by >.

Regex: (?<=\D)(?=\d+>), replaced with \n for the demo.

Explanation:

  • (?<=\D) looks behind to check that the preceding character is not a digit.

  • (?=\d+>) looks ahead to check that what follows is one or more digits followed by >.

It matches the zero-width position between them.
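The demo replaces each match with \n. A minimal C# equivalent, assuming .NET's Regex.Replace, might look like this. Because the match is zero-width, nothing is removed and each marker stays attached to its message. The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string s = "21>Please be specific. What do you mean by that?21>Hello are you there623>Simon?";

// Insert "\n" at every zero-width position that is preceded by a non-digit
// and followed by one or more digits and '>'. Nothing is consumed, so the
// "21>" and "623>" markers stay attached to their messages.
string withBreaks = Regex.Replace(s, @"(?<=\D)(?=\d+>)", "\n");
Console.WriteLine(withBreaks);
```

No newline is inserted at the very start, because the lookbehind needs a non-digit character before the position.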

Regex101 Demo

Try: [0-9]+>

Explanation:

  • [0-9]+ at least one digit

  • > followed by a literal >


It might make sense to replace the matches with \n$0, which puts each marker, together with the message after it, on its own line.
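A short C# sketch of the \n$0 replacement, assuming .NET's Regex.Replace, where $0 stands for the whole match. Because the first marker sits at the very start of the string, the result begins with a newline; splitting with StringSplitOptions.RemoveEmptyEntries drops that empty first line.

```csharp
using System;
using System.Text.RegularExpressions;

string s = "21>Please be specific. What do you mean by that?21>Hello are you there623>Simon?";

// $0 is the entire match ("21>", "21>", "623>"); prefix each one with a newline.
string withBreaks = Regex.Replace(s, "[0-9]+>", "\n$0");

// The first marker is at index 0, so withBreaks starts with '\n';
// RemoveEmptyEntries drops the resulting empty first line.
string[] lines = withBreaks.Split('\n', StringSplitOptions.RemoveEmptyEntries);
foreach (string line in lines)
    Console.WriteLine(line);
```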

3 Comments

Wouldn't \d+> be equivalent?
If you use that with Regex.Split it's going to remove the matches. I think the OP's asking how to keep the matches.
@Tim \d matches any Unicode digit, and will be slower. It's better to use [0-9] unless you need to match Persian digits or similar.
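To illustrate the \d versus [0-9] point: in .NET, \d matches any Unicode decimal digit (category Nd), so Persian digits such as ۲۱ match it while [0-9] does not. RegexOptions.ECMAScript restricts \d to [0-9]. This sketch only checks matching behavior, not the speed claim.

```csharp
using System;
using System.Text.RegularExpressions;

// "۲۱>" written with EXTENDED ARABIC-INDIC (Persian) digits U+06F2 and U+06F1.
string persian = "\u06F2\u06F1>";

bool unicodeDigit = Regex.IsMatch(persian, @"\d+>");                       // \d = any Unicode Nd digit
bool asciiDigit = Regex.IsMatch(persian, "[0-9]+>");                       // ASCII digits only
bool ecmaDigit = Regex.IsMatch(persian, @"\d+>", RegexOptions.ECMAScript); // \d = [0-9]

Console.WriteLine($"\\d+>              : {unicodeDigit}"); // True
Console.WriteLine($"[0-9]+>            : {asciiDigit}");   // False
Console.WriteLine($"\\d+> (ECMAScript) : {ecmaDigit}");    // False
```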
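On the point in the comments that Regex.Split discards the matches: in .NET, if the pattern contains capturing parentheses, the captured text is included in the returned array. The sketch below relies on that behavior. The pairing loop assumes the input begins with a marker, so element 0 is the empty prefix before it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string s = "21>Please be specific. What do you mean by that?21>Hello are you there623>Simon?";

// Without a group, the "21>" / "623>" markers are consumed and dropped.
string[] dropped = Regex.Split(s, "[0-9]+>");

// With a capturing group, .NET inserts the captured text into the result:
// "", "21>", "Please ...", "21>", "Hello ...", "623>", "Simon?"
string[] withMarkers = Regex.Split(s, "([0-9]+>)");

// Re-attach each marker to the text that follows it.
var messages = new List<string>();
for (int i = 1; i + 1 < withMarkers.Length; i += 2)
    messages.Add(withMarkers[i] + withMarkers[i + 1]);

foreach (string m in messages)
    Console.WriteLine(m);
```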
