
There already exist similar questions, but all of them use regexen. The code I'm using (that strips the separators):

string[] sentences = s.Split(new string[] { ". ", "? ", "! ", "... " }, StringSplitOptions.None);

I would like to split a block of text on sentence breaks and keep the sentence terminators. I'd like to avoid using regexen for performance. Is it possible?

  • Possible duplicate stackoverflow.com/questions/521146/… Commented Apr 11, 2011 at 17:53
  • Is there a reason you can't or don't want to use regex? Commented Apr 11, 2011 at 17:54
  • Funny. The title of the duplicate is "C# split string but keep split chars". Commented Apr 11, 2011 at 17:56
  • @rerun: This will be run on a very large file, and regex splitting takes up to three times as long as the String methods. Commented Apr 11, 2011 at 18:00

1 Answer


I don't believe there is an existing function that does this. However, you can use the following extension method.

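// Place inside a static class; requires using System.Collections.Generic;
// Splits source after each separator, keeping the separator at the end of its piece.
public static IEnumerable<string> SplitAndKeepSeparators(this string source, string[] separators) {
  int start = 0, i = 0;
  while (i < source.Length) {
    int matched = 0;
    foreach (var sep in separators) {
      // Ordinal comparison at position i; if several separators match, the longest wins.
      if (sep.Length > matched && string.CompareOrdinal(source, i, sep, 0, sep.Length) == 0) {
        matched = sep.Length;
      }
    }
    if (matched > 0) {
      i += matched;
      yield return source.Substring(start, i - start);
      start = i;
    } else {
      i++;
    }
  }
  // Emit whatever follows the last separator.
  if (start < source.Length) {
    yield return source.Substring(start);
  }
}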

5 Comments

It seems like this will break for the last field?
@jfs, @pst whoops, forgot to add in the final check
if (separators.Contains(cur)) won't compile.
@Isaac G. using System.Linq; (This is because Contains is an Extension Method from LINQ)
This is completely broken. “cur” is a char, “separators” is a string[]. “separators.Contains(cur)” makes no sense.
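
For reference, here is a self-contained sketch of one way around the char/string mismatch raised in the comments. It is not the answerer's code: the names SentenceSplitter and SplitSentences are hypothetical. It relies on the assumption that every separator in the question (". ", "? ", "! ", "... ") is a terminator character followed by a single space, so it can scan with String.IndexOfAny and check only the next character. No regex is involved.

```csharp
using System;
using System.Collections.Generic;

public static class SentenceSplitter
{
    // Hypothetical helper (not from the original answer): yields each sentence
    // together with its terminator and the single space that follows it.
    public static IEnumerable<string> SplitSentences(string text)
    {
        char[] terminators = { '.', '?', '!' };
        int start = 0;   // where the current sentence begins
        int search = 0;  // where the next terminator search begins
        while (search < text.Length)
        {
            int hit = text.IndexOfAny(terminators, search);
            // Stop when no terminator is left, or the terminator is the last character.
            if (hit < 0 || hit + 1 >= text.Length) break;
            if (text[hit + 1] == ' ')
            {
                // Break after "<terminator><space>", as in the question's separators.
                yield return text.Substring(start, hit + 2 - start);
                start = hit + 2;
                search = start;
            }
            else
            {
                // Terminator not followed by a space (inside "..." or "3.14"): keep scanning.
                search = hit + 1;
            }
        }
        // Emit whatever follows the last sentence break.
        if (start < text.Length) yield return text.Substring(start);
    }

    public static void Main()
    {
        foreach (var s in SplitSentences("Wait... Really? Yes! Pi is 3.14. Done"))
            Console.WriteLine("[" + s + "]");
    }
}
```

Concatenating the pieces gives back the input unchanged, since nothing is dropped. If the separator set ever includes strings that are not "terminator plus space", this shortcut no longer applies and a general string[] scan is needed instead.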
