After reading the answers to this question, "C# regex pattern to extract urls from given string - not full html urls but bare links as well", I want to know which is the fastest way to extract URLs from a document: regex matching or the string split method.

So, you have a string containing an HTML document and want to extract the URLs from it.

The regex way would be:

Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach(Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value); 

And the string split method:

string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
    MessageBox.Show(s);

Which of the two performs better?

  • You can try both with Stopwatch. Commented Aug 5, 2016 at 14:05
  • I'm ashamed to admit that my first thought was "is Stopwatch some type of benchmark program?" Commented Aug 5, 2016 at 14:09
  • I can't benchmark, as I don't have access to a PC for a few days. Commented Aug 5, 2016 at 14:14

1 Answer

Split is faster. Here is some code that you can test with: dotnetfiddle link

using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    // Main must be static to serve as the entry point.
    public static void Main()
    {
        const int Iterations = 500;
        string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";

        // Build the regex once, outside the timed loop; otherwise the cost of
        // constructing a compiled Regex would dominate the measurement.
        Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);

        int regexCount = 0;
        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < Iterations; i++)
        {
            // Enumerate the matches so the regex actually runs.
            foreach (Match m in linkParser.Matches(rawString))
                regexCount++;
        }

        sw.Stop();

        var test1Time = sw.Elapsed.TotalMilliseconds;

        int splitCount = 0;
        sw.Restart();

        for (int i = 0; i < Iterations; i++)
        {
            // Count() forces the lazy Where filter to execute.
            splitCount += rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries).Count(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
        }

        sw.Stop();

        var test2Time = sw.Elapsed.TotalMilliseconds;

        // Printing the counts confirms both loops did real work.
        Console.WriteLine("Regex Test: " + test1Time.ToString() + " ms, links found: " + regexCount);
        Console.WriteLine("Split Test: " + test2Time.ToString() + " ms, links found: " + splitCount);
    }
}
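
Before comparing timings, it may be worth checking that the two approaches return the same links for your data, since they are not strictly equivalent. The sketch below uses the regex from the question and a split filter like the one above; the class name, the helper names, the sample input and the use of StringComparison.Ordinal are my own assumptions, not part of the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UrlExtractionComparison
{
    // Same pattern and options as in the question.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    private static readonly char[] Whitespace = { '\t', '\n', ' ' };

    // Hypothetical helper: collects every regex match as a string.
    public static string[] ExtractWithRegex(string text)
    {
        return LinkParser.Matches(text).Cast<Match>().Select(m => m.Value).ToArray();
    }

    // Hypothetical helper: the split approach, with ordinal (case-sensitive,
    // culture-independent) prefix checks. The Ordinal argument is an assumption.
    public static string[] ExtractWithSplit(string text)
    {
        return text.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries)
            .Where(s => s.StartsWith("http://", StringComparison.Ordinal)
                     || s.StartsWith("https://", StringComparison.Ordinal)
                     || s.StartsWith("www.", StringComparison.Ordinal))
            .ToArray();
    }

    public static void Main()
    {
        // Made-up input: trailing punctuation and an upper-case scheme.
        string text = "see http://example.com, or HTTP://EXAMPLE.ORG today";

        // The trailing \b trims the comma; IgnoreCase accepts "HTTP://".
        Console.WriteLine(string.Join(" | ", ExtractWithRegex(text)));
        // prints: http://example.com | HTTP://EXAMPLE.ORG

        // Split keeps the comma and skips the upper-case URL.
        Console.WriteLine(string.Join(" | ", ExtractWithSplit(text)));
        // prints: http://example.com,
    }
}
```

If the split version has to match the regex output, it would need extra work such as trimming trailing punctuation and using a case-insensitive comparison, and that extra work should be included in whatever you time.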

2 Comments

Wonderful. Thanks for answering.
How about marking it as the accepted answer?
