
I'm trying to split a string into an array around words in a string array. Right now, I'm using myString.Split(arrayOfWordsToSplitOn, StringSplitOptions.RemoveEmptyEntries), which splits the string, but doesn't include the actual word that it is splitting on.

For example, if I have the string "My cat and my dog are very lazy", and a string array {"cat", "dog"}, right now it returns {"My", "and my", "are very lazy"}.

However, I would like to have the final output be {"My", "cat", "and my", "dog", "are very lazy"}. Is there any way to do this?

  • you could use a regex... Commented Mar 10, 2017 at 15:07
  • Do you want to split on word boundaries or also substrings? Your current String.Split approach does the latter Commented Mar 10, 2017 at 15:11
  • Still a regex would be good to use: Regex.Split(s, string.Format(@"\b({0})\b", string.Join("|", arrayOfWordsToSplitOn))) Commented Mar 10, 2017 at 15:12
  • Sorry, I understand that word is ambiguous. I'm trying to build a dictionary definition that contains links for words that match other dictionary definitions. I already do a query to determine the list of words that will be linking to other terms. This array will be what I need to split on, so that the linked terms are separate entities in the array. Commented Mar 10, 2017 at 15:13
  • @WiktorStribiżew Thank you, I'll give that a shot Commented Mar 10, 2017 at 15:13

1 Answer

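You can build an alternation-based regex from your list of search words and wrap the alternation in a capturing group, (...), so that Regex.Split returns the matched words along with the pieces between them. Then add \s* on both sides of the group to trim the surrounding whitespace, and call Regex.Split: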

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var arrayOfWordsToSplitOn = new List<string> { "cat", "dog" };
        var s = "My cat and my dog are very lazy";
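        // Escape each word, join the words into an alternation, and capture it so Regex.Split keeps the match
        var pattern = string.Format(@"\s*\b({0})\b\s*", string.Join("|", arrayOfWordsToSplitOn.Select(w => Regex.Escape(w))));
        // Drop the empty strings Regex.Split can produce at the start or end of the input
        var results = Regex.Split(s, pattern).Where(x => !String.IsNullOrWhiteSpace(x)).ToList();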
        foreach (var res in results)
            Console.WriteLine(res);
    }
}

See the C# demo.

Results:

My
cat
and my
dog
are very lazy
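
The capturing group is what keeps the search words in the output: Regex.Split includes text captured by groups in the returned array, while a non-capturing group (?:...) discards it. Here is a small sketch contrasting the two. The class and method names (SplitGroupDemo, WithCapture, WithoutCapture) are only for illustration and are not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitGroupDemo
{
    // Capturing group: the matched words are returned between the pieces.
    public static string[] WithCapture(string s)
    {
        return Regex.Split(s, @"\s*\b(cat|dog)\b\s*");
    }

    // Non-capturing group: the matched words are dropped from the result.
    public static string[] WithoutCapture(string s)
    {
        return Regex.Split(s, @"\s*\b(?:cat|dog)\b\s*");
    }

    public static void Main()
    {
        var s = "My cat and my dog are very lazy";
        Console.WriteLine(string.Join(" | ", WithCapture(s)));
        // My | cat | and my | dog | are very lazy
        Console.WriteLine(string.Join(" | ", WithoutCapture(s)));
        // My | and my | are very lazy
    }
}
```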

NOTES:

  • If the search words can contain non-word characters, adjust the pattern, because \b (a word boundary) may fail to match next to them, and escape each search "word" with Regex.Escape.
  • If you decide to drop the word boundaries, the search word array might need sorting by length (longest first) and then alphabetically, so that a longer word is tried before a shorter word it starts with.
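
The two notes can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the answer's own code: KeywordSplitter and SplitKeepingTerms are hypothetical names, the lookarounds (?<!\w) and (?!\w) are one possible replacement for \b when terms start or end with non-word characters, and the sample terms "C" and "C++" are made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordSplitter
{
    // Hypothetical helper: splits text around the given terms and keeps the terms.
    public static List<string> SplitKeepingTerms(string text, IEnumerable<string> terms)
    {
        // Longest terms first so "C++" is tried before "C" in the alternation;
        // Regex.Escape turns characters such as '+' or '.' into literals.
        var alternation = string.Join("|",
            terms.OrderByDescending(t => t.Length)
                 .ThenBy(t => t, StringComparer.Ordinal)
                 .Select(t => Regex.Escape(t)));

        // Assumption: (?<!\w) and (?!\w) stand in for \b, which cannot match
        // between two non-word characters (e.g. between '+' and a space).
        var pattern = @"\s*(?<!\w)(" + alternation + @")(?!\w)\s*";

        return Regex.Split(text, pattern)
                    .Where(part => !string.IsNullOrWhiteSpace(part))
                    .ToList();
    }

    public static void Main()
    {
        var parts = SplitKeepingTerms("I like C++ and C a lot", new[] { "C", "C++" });
        Console.WriteLine(string.Join(" | ", parts));
        // I like | C++ | and | C | a lot
    }
}
```

With \b in place of the lookarounds, "C++" followed by a space would not match as a whole, because there is no word boundary between "+" and the space.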

1 Comment

awesome answer, makes it so easy
