I want to detect email addresses in plain text so that I can wrap each one in an anchor tag with a mailto: href. I have a regex for this, but it also matches addresses that are already wrapped in an anchor tag or that appear inside an anchor's mailto: href.

My regex is:

([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

But it finds three matches in the following sample text:

ttt <a href='mailto:[email protected]'>[email protected]</a> abc [email protected]

I want the regex to match only the last [email protected], the one that is outside the anchor tag.

2 Answers

This is very similar to my previous answer to your other question. Try this:

(?<!(?:href=['"]mailto:|<a[^>]*>))(\b[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

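The only real difference is the word boundary \b before the start of the email. Without it, the regex could skip the position rejected by the lookbehind and match from the second character of the same address.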

See a similar expression here on Regexr. It's not exactly the same, because Regexr does not support alternation or unbounded-length patterns in a lookbehind.
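
For illustration, here is a minimal sketch of applying the pattern from C#. It assumes the .NET regex engine, which accepts alternation and variable-length patterns inside a lookbehind. The class name `EmailLinker`, the method name `Linkify`, and the example.com addresses are placeholders for this sketch, not part of the original question.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmailLinker
{
    // In a verbatim string, a literal double quote is written as "".
    private const string Pattern =
        @"(?<!(?:href=['""]mailto:|<a[^>]*>))" +
        @"(\b[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)";

    private static readonly Regex EmailRegex =
        new Regex(Pattern, RegexOptions.IgnoreCase);

    // Wraps each matched address in a mailto anchor; $0 is the whole match.
    public static string Linkify(string input)
    {
        return EmailRegex.Replace(input, "<a href=\"mailto:$0\">$0</a>");
    }

    static void Main()
    {
        // Hypothetical input: one address already linked, one bare.
        string input =
            "ttt <a href='mailto:linked@example.com'>linked@example.com</a> abc plain@example.com";

        // Only the bare address at the end should get wrapped.
        Console.WriteLine(Linkify(input));
    }
}
```

Note that the lookbehind only inspects the text immediately before the address. An address that follows other text inside an anchor, such as `<a href='x'>mail me: plain@example.com</a>`, would still be matched by this sketch.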

1 Comment

One more question: your regex does not work when the anchor tag uses double quotes ("), as in href="somelink". It works well with single quotes, as in href='somelink'. Can you help edit the lookbehind so that it covers both single (') and double (") quotes?

It's a better idea to leave the parsing of the HTML to something suitable for that (such as the HtmlAgilityPack) and combine that with regex to update the text nodes:

    using System.Text.RegularExpressions;

    string sContent = "ttt <a href='mailto:[email protected]'>[email protected]</a> abc [email protected]";
    string sRegex = @"([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)";
    // ExplicitCapture skips the unnamed groups; $0 (the whole match) is still available.
    Regex Regx = new Regex(sRegex, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml(sContent);

    // Select every text node that is not inside an <a> element.
    var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]");
    if (nodes != null) // SelectNodes returns null when nothing matches
    {
        foreach (var node in nodes)
        {
            node.InnerHtml = Regx.Replace(node.InnerHtml, @"<a href=""mailto:$0"">$0</a>");
        }
    }
    string fixedContent = doc.DocumentNode.OuterHtml;

I notice you've posted the same question on other forums as well, but haven't accepted an answer on any of them.
