42

I need a regex that does the following:

Extract all strings that start with http://
Extract all strings that start with www.

So I need to extract both kinds.

For example, take the string below:

house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue

From the string above I should get:

    www.monstermmorpg.com
    http://www.monstermmorpg.com
    http://www.monstermmorpg.commerged

I'm looking for a regex or another way. Thank you.

C# 4.0

2
  • Recently bots have started popping up and sending URLs to my game players. I will disallow this :) Though I need to allow internal links. Commented May 14, 2012 at 1:53
  • Perhaps you should consider NOT using regex as it's an awkward approach to parsing HTML... stackoverflow.com/questions/590747/… Commented May 5, 2014 at 10:13

3 Answers

102

You can write some pretty simple regular expressions to handle this, or go via more traditional string splitting + LINQ methodology.

Regex

// Requires: using System.Text.RegularExpressions;
var linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
// Matches are non-overlapping and come back in left-to-right order.
foreach (Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value);

Pattern explanation:

\b         - matches a word boundary (the position between a word character and a non-word character such as a space or period)
(?:        - begins a group; the ?: means the group's contents are not captured
https?://  - matches http:// or https:// (the '?' after the "s" makes it optional)
|          - OR
www\.      - matches the literal string www. (the \. means a literal ".")
)          - ends the group
\S+        - matches a run of non-whitespace characters
\b         - matches the closing word boundary, so the match ends on a word character

Basically, the pattern looks for strings that start with http://, https://, or www. (the (?:https?://|www\.) part) and then matches the non-whitespace characters that follow, stopping at the last word character before the next whitespace, so trailing punctuation such as a final "." or "/" is left out.
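To make the effect of the closing \b concrete, here is a minimal console sketch. The class name LinkExtractor and the helper ExtractLinks are hypothetical names chosen for illustration; the pattern itself is the one from the answer, and the sample input is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Same pattern as in the answer.
    private static readonly Regex LinkParser = new Regex(
        @"\b(?:https?://|www\.)\S+\b",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every match as a plain string.
    public static List<string> ExtractLinks(string text)
    {
        var links = new List<string>();
        foreach (Match m in LinkParser.Matches(text))
            links.Add(m.Value);
        return links;
    }

    static void Main()
    {
        // "/" and "." are not word characters, so the closing \b
        // makes each match stop just before them.
        foreach (string link in ExtractLinks("see https://google.com/ and www.example.com."))
            Console.WriteLine(link);
        // prints:
        // https://google.com
        // www.example.com
    }
}
```

If the trailing slash matters for your use case, dropping the final \b keeps it, at the cost of also keeping sentence punctuation that happens to follow a URL.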

Traditional String Options

// Requires: using System.Linq;
var rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
var links = rawString
    .Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
    .Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));
foreach (string s in links)
    MessageBox.Show(s);
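
One difference from the regex version is worth noting: the regex uses RegexOptions.IgnoreCase, while StartsWith with a single string argument is case-sensitive. The sketch below is one possible way to close that gap, assuming an ordinal, case-insensitive prefix comparison is what you want. TokenLinkFilter, Prefixes and ExtractLinks are hypothetical names, and the sample input is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TokenLinkFilter
{
    private static readonly string[] Prefixes = { "http://", "https://", "www." };

    // Hypothetical helper: splits on whitespace and keeps tokens
    // that begin with one of the prefixes, ignoring case.
    public static List<string> ExtractLinks(string text)
    {
        return text
            .Split(new[] { ' ', '\t', '\n', '\r' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(token => Prefixes.Any(
                p => token.StartsWith(p, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }

    static void Main()
    {
        foreach (string link in ExtractLinks("Go to WWW.Example.com or HTTP://example.org/."))
            Console.WriteLine(link);
        // prints:
        // WWW.Example.com
        // HTTP://example.org/.
    }
}
```

Unlike the regex, this approach keeps the whole whitespace-delimited token, so trailing punctuation such as the "/." above stays attached.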

3 Comments

The regex in the answer does not work if you want to parse part of an HTML string. Use the following one instead: @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?"
The regex @"\b(?:https?://|www\.)[^ \f\n\r\t\v\]]+\b" works a little better (in my case anyway): if the URL is enclosed in BB tags, the original pattern includes the ] as part of the URL.
@TomGullen Fair point. However, square brackets are actually valid URL characters (according to the RFC spec) so I'll leave the answer as-is as it's just for the most general case.
3

Using Nikita's reply, I can get the URL from a string very easily:

using System.Text.RegularExpressions;

string myString = "test =) https://google.com/";

Match url = Regex.Match(myString, @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?");

string finalUrl = url.ToString();
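
Note that Regex.Match returns only the first match, and when nothing matches, url.ToString() yields an empty string. A small sketch of how both cases could be handled follows; UrlFinder, FirstUrl and AllUrls are hypothetical names, and the pattern is copied unchanged from the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class UrlFinder
{
    // Pattern copied from the snippet above.
    private const string UrlPattern = @"http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?";

    // Hypothetical helper: first URL, or null when the input has none.
    public static string FirstUrl(string text)
    {
        Match m = Regex.Match(text, UrlPattern);
        return m.Success ? m.Value : null;
    }

    // Hypothetical helper: every URL the pattern finds.
    public static List<string> AllUrls(string text)
    {
        var urls = new List<string>();
        foreach (Match m in Regex.Matches(text, UrlPattern))
            urls.Add(m.Value);
        return urls;
    }

    static void Main()
    {
        Console.WriteLine(FirstUrl("test =) https://google.com/"));   // https://google.com/
        Console.WriteLine(FirstUrl("no links here") ?? "(none)");     // (none)
        foreach (string url in AllUrls("a http://example.org b https://example.com/x?y=1"))
            Console.WriteLine(url);
    }
}
```

One thing to watch: the character class for the path part of this pattern contains a space, so when a URL with a path is followed by ordinary words, those words can end up inside the match. Removing the space from the class is a possible adjustment if that is not what you want.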


0

The patterns above do not work on HTML that contains a URL.

For example:

<table><tr><td class="sub-img car-sm" rowspan ="1"><img src="https://{s3bucket}/abc/xyzxyzxyz/subject/jkljlk757cc617-a560-48f5-bea1-f7c066a24350_202008210836495252.jpg?X-Amz-Expires=1800&X-Amz-Algorithm=abcabcabc&X-Amz-Credential=AKIAVCAFR2PUOE4WV6ZX/20210107/ap-south-1/s3/aws4_request&X-Amz-Date=20210107T134049Z&X-Amz-SignedHeaders=host&X-Amz-Signature=3cc6301wrwersdf25fb13sdfcfe8c26d88ca1949e77d9e1d9af4bba126aa5fa91a308f7883e"></td><td class="icon"></td></tr></table>

In that case, use the regular expression below:

Regex regx = new Regex("https?://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);
