4

If I have to find, let's say, a word in a sentence, I can think of two approaches:

  1. Using string.IndexOf
  2. Using Regex

Which one is better in terms of performance or best practice?

4 Answers

6

If it's fairly straightforward to do something without regex, it's almost always cheaper that way. String.IndexOf (or String.Contains) is definitely an example of this.
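
A minimal C# sketch of the plain substring check this answer recommends. The helper name `ContainsSubstring` is hypothetical. It passes `StringComparison.Ordinal` because the one-argument `IndexOf(string)` overload performs a culture-sensitive comparison, which is usually slower and rarely needed for a simple lookup.

```csharp
using System;

static class SubstringSearch
{
    // Hypothetical helper: true if `word` occurs anywhere in `sentence` as a plain substring.
    // Ordinal comparison looks at raw UTF-16 code units, with no culture-specific rules.
    public static bool ContainsSubstring(string sentence, string word) =>
        sentence.IndexOf(word, StringComparison.Ordinal) >= 0;

    static void Main()
    {
        const string sentence = "The quick brown fox jumps over the lazy dog";
        Console.WriteLine(ContainsSubstring(sentence, "jumps")); // True
        Console.WriteLine(ContainsSubstring(sentence, "cat"));   // False
        // IndexOf returns the position of the first match, or -1 if there is none.
        Console.WriteLine(sentence.IndexOf("fox", StringComparison.Ordinal)); // 16
    }
}
```

The one-argument `String.Contains(string)` overload is already ordinal, so it expresses the same check when the position of the match is not needed.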

3

It depends on your exact requirements. If you truly need to find a word in a sentence (not a substring), then I believe that could be expressed more concisely and more explicitly using a well-named regex pattern than using IndexOf plus all the extra logic to make sure you're actually getting a complete single word.

On the other hand, if you're simply looking for a substring, then IndexOf is far superior in terms of performance and readability.
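
To illustrate what "IndexOf plus all the extra logic" might look like, here is a hedged C# sketch comparing a hand-written whole-word check with a `\b` regex. The helper names are hypothetical. The hand-written version assumes letters, digits and `_` are the word characters, which only approximates the regex `\w` class. Both versions also assume `word` itself starts and ends with word characters; for something like `C++` the `\b` anchors behave differently.

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeWordSearch
{
    // Assumption: letters, digits and '_' count as word characters (roughly, not exactly, regex \w).
    static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';

    // IndexOf version: find each substring hit, then check both neighbours by hand.
    public static bool ContainsWholeWordIndexOf(string text, string word)
    {
        if (word.Length == 0) return false;
        int index = text.IndexOf(word, StringComparison.Ordinal);
        while (index >= 0)
        {
            int end = index + word.Length;
            bool startOk = index == 0 || !IsWordChar(text[index - 1]);
            bool endOk = end == text.Length || !IsWordChar(text[end]);
            if (startOk && endOk) return true;
            // Resume one character later so overlapping candidates are not skipped.
            index = text.IndexOf(word, index + 1, StringComparison.Ordinal);
        }
        return false;
    }

    // Regex version: \b marks a word boundary; Regex.Escape neutralises metacharacters in `word`.
    public static bool ContainsWholeWordRegex(string text, string word) =>
        Regex.IsMatch(text, @"\b" + Regex.Escape(word) + @"\b");

    static void Main()
    {
        const string text = "The quick brown fox jumps over the lazy dog";
        Console.WriteLine(ContainsWholeWordIndexOf(text, "jumps")); // True
        Console.WriteLine(ContainsWholeWordIndexOf(text, "jump"));  // False
        Console.WriteLine(ContainsWholeWordRegex(text, "jumps"));   // True
        Console.WriteLine(ContainsWholeWordRegex(text, "jump"));    // False
    }
}
```

The regex version is one line, while the manual version has to spell out its own (approximate) idea of what a word character is.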

1 Comment

+1 for thinking of the non-standard cases. Looking for "part" in "you have to keep these two parts apart, and use this part here." with IndexOf will return the "part" in "parts", and if you iterate through the whole string it will return three matches in total, when only one of them is the whole word "part". Whole-word matching with IndexOf is problematic because many different characters can indicate word boundaries. By contrast, the regex "\bpart\b" matches the whole word "part" exactly once, and is likely to be less expensive than a convoluted IndexOf algorithm.
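
The comment's example can be checked with a small C# sketch (the class and helper names are hypothetical). The loop counts non-overlapping ordinal substring hits with `IndexOf`, while `Regex.Matches` with `\bpart\b` counts only whole-word occurrences.

```csharp
using System;
using System.Text.RegularExpressions;

static class PartCount
{
    // Counts non-overlapping ordinal substring hits by calling IndexOf in a loop.
    public static int CountSubstring(string text, string word)
    {
        if (word.Length == 0) return 0; // avoid an endless loop on an empty search string
        int count = 0;
        int index = text.IndexOf(word, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            index = text.IndexOf(word, index + word.Length, StringComparison.Ordinal);
        }
        return count;
    }

    // Counts only whole-word occurrences, using \b word-boundary anchors.
    public static int CountWholeWord(string text, string word) =>
        Regex.Matches(text, @"\b" + Regex.Escape(word) + @"\b").Count;

    static void Main()
    {
        const string text = "you have to keep these two parts apart, and use this part here.";
        Console.WriteLine(CountSubstring(text, "part")); // 3 ("parts", "apart", "part")
        Console.WriteLine(CountWholeWord(text, "part")); // 1
    }
}
```
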
3

This is by no means the most scientific way of measuring things, but here is a bit of source code that indicates (under very specific constraints) that regex is about 4 times slower than IndexOf.

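using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    private const string Sentence = "The quick brown fox jumps over the lazy dog";
    private const string Word = "jumps";

    static void Main(string[] args)
    {
        var indexTimes = new List<long>();
        var regexTimes = new List<long>();
        var timer = new Stopwatch();

        // Time 1000 individual IndexOf calls (this overload is culture-sensitive).
        for (int i = 0; i < 1000; i++)
        {
            timer.Reset();
            timer.Start();
            Sentence.IndexOf(Word);
            timer.Stop();
            indexTimes.Add(timer.ElapsedTicks);
        }

        // Average Stopwatch ticks per IndexOf call.
        Console.WriteLine(indexTimes.Average());

        // Time 1000 individual calls to the static Regex.Match (which caches the parsed pattern).
        for (int i = 0; i < 1000; i++)
        {
            timer.Reset();
            timer.Start();
            Regex.Match(Sentence, Word);
            timer.Stop();
            regexTimes.Add(timer.ElapsedTicks);
        }

        // Average Stopwatch ticks per Regex.Match call.
        Console.WriteLine(regexTimes.Average());

        Console.ReadLine();
    }
}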
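
The per-call measurement above times a single very short operation with `Stopwatch`, so each sample is only a few ticks, and the first iterations also include JIT and regex setup costs. A somewhat more careful sketch, still not a rigorous benchmark, warms up first, times a whole loop, uses an ordinal `IndexOf`, and reuses one `Regex` instance. The class and method names are hypothetical, and the numbers it prints will vary by machine and runtime; for serious measurements a dedicated harness such as BenchmarkDotNet is the usual choice.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class SearchTiming
{
    const string Sentence = "The quick brown fox jumps over the lazy dog";
    const string Word = "jumps";

    // One reusable Regex instance, so the pattern is parsed and compiled only once.
    static readonly Regex WordRegex = new Regex(Regex.Escape(Word), RegexOptions.Compiled);

    public static bool FoundByIndexOf() => Sentence.IndexOf(Word, StringComparison.Ordinal) >= 0;
    public static bool FoundByRegex() => WordRegex.IsMatch(Sentence);

    // Runs `action` `iterations` times and returns the average nanoseconds per call.
    // Timing the whole loop sidesteps the coarse resolution of timing single calls.
    public static double AverageNanoseconds(Func<bool> action, int iterations)
    {
        for (int i = 0; i < 1000; i++) action(); // warm-up: JIT and regex compilation
        int hits = 0;
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (action()) hits++; // consume the result of every call
        }
        timer.Stop();
        if (hits != iterations) throw new InvalidOperationException("Word not found");
        return timer.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    static void Main()
    {
        const int iterations = 1_000_000;
        Console.WriteLine($"IndexOf: {AverageNanoseconds(FoundByIndexOf, iterations):F1} ns/call");
        Console.WriteLine($"Regex:   {AverageNanoseconds(FoundByRegex, iterations):F1} ns/call");
    }
}
```
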

2

In terms of best practices, string.IndexOf is probably a little more obvious to someone reading the code. People's brains tend to close up as soon as they see a regular expression, so something straightforward like IndexOf keeps their brains open.

As for performance, that's dependent on a lot of things and can only be properly answered through benchmarking of specific code.

4 Comments

Best practice isn't based on readability of the syntax. And regex is definitely more expensive than regular string functions.
@Nicklamort What is "best practice", then?
A best practice is a technique, method, process, activity, incentive, or reward which conventional wisdom regards as more effective at delivering a particular outcome than any other technique, method, process, etc. when applied to a particular condition or circumstance. (Wikipedia)
It's subjective, is what it is. "Best practice" in coding is usually the best overall combination of readability, maintainability (yes, there's a difference) and performance.
