Detect email in text using regex

Question

I want to detect email addresses in plain text so that I can wrap each one in an anchor tag with a mailto: link. I have a regex for this, but it also matches addresses that are already wrapped in an anchor tag or that appear inside an anchor's mailto: href.

My regex is:

    ([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

But it detects 3 matches in the following sample text:

    ttt <a href='mailto:[email protected]'>[email protected]</a> abc [email protected]

I want only [email protected] to be matched by the regex.

Simply remove the anchor tags before running the regex...
– PA., Commented Jan 19, 2012 at 12:52

stackoverflow.com/questions/1903356/…
– Clive, Commented Jan 19, 2012 at 13:01

Be careful when using regex to validate email.
– default, Commented Jan 19, 2012 at 13:10

Community · Accepted Answer · 2017-05-23 10:34:17Z

2

This is very similar to my previous answer to your other question. Try this:

    (?<!(?:href=['"]mailto:|<a[^>]*>))(\b[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

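The only real difference is the word boundary \b before the start of the email. Without it, once the lookbehind rejects the first character of an address, the regex could still match starting from the second character.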

See a similar expression here on Regexr. It's not exactly the same, because Regexr does not support alternation or unbounded-length patterns inside a lookbehind.
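As a rough sketch of how the pattern in this answer might be applied from C#, something like the following could work. The sample text, the example.com addresses, and the `EmailLinker` / `Linkify` names are made up for illustration and are not from the question. It assumes the .NET regex engine, which accepts variable-length lookbehind.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailLinker
{
    // The pattern from the answer, written as C# verbatim strings.
    // Inside @"..." a double quote has to be doubled (""), so the
    // character class ['"] is written as ['""] here.
    private const string Pattern =
        @"(?<!(?:href=['""]mailto:|<a[^>]*>))" +
        @"(\b[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)";

    private static readonly Regex EmailRegex =
        new Regex(Pattern, RegexOptions.IgnoreCase);

    // Hypothetical helper: wraps every address the pattern matches
    // in a mailto: link. $0 is the whole match.
    public static string Linkify(string text)
    {
        return EmailRegex.Replace(text, "<a href=\"mailto:$0\">$0</a>");
    }

    public static void Main()
    {
        // Made-up sample: one address already linked, one bare.
        string sample =
            "ttt <a href='mailto:alice@example.com'>alice@example.com</a> abc bob@example.com";
        Console.WriteLine(Linkify(sample));
    }
}
```

With this input, only the bare address at the end should be wrapped. The lookbehind only inspects the text immediately before a candidate match, so the approach has limits: for an address such as first.last@example.com inside an href, the `\b` before `last` still allows a partial match starting there.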

answered Jan 19, 2012 at 13:02 by stema; edited May 23, 2017 at 10:34 by Community Bot


One more question: your regex does not work when the anchor tag uses double quotes ("), as in href="somelink". It works well with single quotes, as in href='somelink'. Can you help edit the lookbehind so that it covers both single quotes (') and double quotes (")?
– Computer User, Commented over a year ago

jessehouwing · Accepted Answer · 2012-01-22 11:36:02Z

It's a better idea to leave the parsing of the HTML to something suitable for that (such as the HtmlAgilityPack) and combine that with regex to update the text nodes:

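    using System.Text.RegularExpressions;

    string sContent = "ttt <a href='mailto:[email protected]'>[email protected]</a> abc [email protected]";
    string sRegex = @"([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)";
    // ExplicitCapture skips the unnamed groups; $0 (the whole match) is still available.
    Regex Regx = new Regex(sRegex, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(sContent);

    // Select only the text nodes that are not inside an <a> element.
    var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]");
    if (nodes != null) // SelectNodes can return null when nothing matches
    {
        foreach (var node in nodes)
        {
            // Wrap each address found in the text node in a mailto: link.
            node.InnerHtml = Regx.Replace(node.InnerHtml, @"<a href=""mailto:$0"">$0</a>");
        }
    }
    string fixedContent = doc.DocumentNode.OuterHtml;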

I notice you've posted the same question on other forums as well, but haven't accepted an answer on any of them.
