I was playing around in LINQPad with a regex to extract parts of a string, and I have a question about the results. Can anyone shed some light on this?

string s = "abc|xyz";
Regex.Match(s, @"(\w*)[|]{1}(\w*)").Dump();
Regex.Split(s, @"(\w*)[|]{1}(\w*)").Dump();

With Regex.Match I get back two groups which I can easily extract.

But I don't understand why the Regex.Split result contains two empty entries.

1 Answer

Let's analyze your string:

abc|xyz
\_____/  <-- the match
\_/      <-- capture group 1
    \_/  <-- capture group 2

Regex.Split includes the captured groups in the resulting array.

The split happens at the boundaries of the whole match, right here:

abc|xyz
\      \

So there's an empty string before the match, and an empty string after the match. The two items in the middle are inserted because of the aforementioned split behavior:

If capturing parentheses are used in a Regex.Split expression, any captured text is included in the resulting string array. For example, if you split the string "plum-pear" on a hyphen placed within capturing parentheses, the returned array includes a string element that contains the hyphen.
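
To see this concretely, here is a minimal sketch that compares the pattern from the question with a version that has no capturing groups, plus the documentation's "plum-pear" case. It assumes top-level C# statements (C# 9 or later; in LINQPad the "C# Statements" mode should behave the same way), and the variable names withGroups, plain, and fruit are mine, not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

string s = "abc|xyz";

// Capturing groups: the single match covers the whole input, so the pieces
// before and after it are empty, and the two captures land in between.
string[] withGroups = Regex.Split(s, @"(\w*)[|]{1}(\w*)");
Console.WriteLine(withGroups.Length);              // 4
Console.WriteLine(string.Join(",", withGroups));   // ,abc,xyz,

// No capturing groups: only the text around the delimiter is returned.
string[] plain = Regex.Split(s, @"[|]");
Console.WriteLine(string.Join(",", plain));        // abc,xyz

// The documentation's example: the captured hyphen becomes its own element.
string[] fruit = Regex.Split("plum-pear", "(-)");
Console.WriteLine(string.Join(",", fruit));        // plum,-,pear
```

The first call returns four elements: an empty string, "abc", "xyz", and another empty string, which are the two empty entries asked about.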

3 Comments

Is it possible to split by group?
Why don't you simply split on the | char, without using regexes at all?
You'd either need to shape the whole pattern to fit your needs, or you could loop over the matches and their groups and build your list of strings in the loop (basically, reimplement Split with the differences you need). Split is just a convenience function that's really easy to implement yourself.
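
A rough sketch of both suggestions from the comments, in case it helps: splitting on the | character without a regex, and looping over matches and groups to collect only the captured text. This is one possible way to do it, not the only one. It assumes top-level C# statements (C# 9 or later), and the names simple and pieces are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string s = "abc|xyz";

// Option 1: no regex at all, just split on the delimiter character.
string[] simple = s.Split('|');
Console.WriteLine(string.Join(",", simple));   // abc,xyz

// Option 2: loop over matches and their groups, keeping only captured text.
var pieces = new List<string>();
foreach (Match m in Regex.Matches(s, @"(\w*)[|]{1}(\w*)"))
{
    // Groups[0] is the whole match; the capture groups start at index 1.
    for (int i = 1; i < m.Groups.Count; i++)
    {
        if (m.Groups[i].Success)
        {
            pieces.Add(m.Groups[i].Value);
        }
    }
}
Console.WriteLine(string.Join(",", pieces));   // abc,xyz
```

Neither option produces the empty entries, because nothing outside the captured groups (or outside the delimiter) is added to the result.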
