I have text made up of paragraphs with URLs interspersed. I would like to parse the string, creating HTML links from the URLs and using the text that follows each URL as the descriptive link text, i.e. turn

possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present

into

<a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>

This SO article, JS: Find URLs in Text, Make Links, is relevant to what I'm attempting to do but simply places the url as the text within the anchor element.

I am successfully matching the url with

    var urlRE = new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");

but am unsure how to match the text that comes after it.

I came across the post Regex - Matching text AFTER certain characters, which seems applicable. I tried wrapping my RE in /(?<=my url pattern here).+/ but get an error stating that the group is invalid, which makes the whole RE invalid.

In that post J-Law mentions that

Variable-length lookbehinds aren’t allowed

Is this what I'm attempting to do?

Since I'm already matching the URL, I feel like I could easily do some substring math to get the desired result.

I'm mainly using this as an opportunity to learn more about regex.

Thanks

  • FYI, not only are variable-length lookbehinds disallowed in most regex flavours (.NET being the exception to the rule here), JavaScript does not support lookbehinds at all. Commented Oct 13, 2014 at 20:35
  • A note on your regex: the final ([^ ])+ requires at least one non-space character after the domain, so a bare domain such as site.XX won't match. You could change it to ([^ ])* and I don't think it would matter much. Commented Oct 14, 2014 at 0:13

1 Answer

Just add another capturing group to capture all the stuff at the end and make your inner groups non-capturing. Something like:

    var urlRE= new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

    var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

    // exec returns an array of captures, or null when nothing matches
    var match = urlRE.exec(s);
    alert(match[0] + "\n\n" + match[1] + "\n\n" + match[2]);

    // match contains:
    // [0] "http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present"
    // [1] "http://www.somewebsite.com/some/path/somepage.html"
    // [2] " descriptive text which may or may not be present" (note the leading space)

I wrapped your entire regex in parentheses () to form the first capturing group, and inside that I made all your existing groups non-capturing with ?:. You don't absolutely need to make them non-capturing, but it does simplify the output. Then I added one more group, (.*), to capture everything else up to the end of the string, $.
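To see what the ?: buys you, here is a small comparison sketch. The variable names (withCapturing, withNonCapturing, a, b) are mine, and the first pattern is simply your original regex wrapped in an outer group with (.*)$ appended, so treat it as an illustration rather than something you would keep.

```javascript
var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

// Original inner groups left as capturing groups, plus the outer group and (.*)
var withCapturing = new RegExp("(([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+)(.*)$");

// Same pattern with the inner groups made non-capturing
var withNonCapturing = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

var a = withCapturing.exec(s);
var b = withNonCapturing.exec(s);

console.log(a.length); // 8: whole match plus seven groups
console.log(b.length); // 3: whole match, URL, trailing text

// The trailing text is the same either way, it just sits at a different index
console.log(a[7] === b[2]); // true
```

Both versions find the same URL and the same trailing text; the non-capturing one just saves you from counting groups to find them.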

After .exec, if there is a match (it returns null otherwise), the whole match will be in [0], the URL part in [1] and the rest of your text in [2]. This is why we used the non-capturing groups: otherwise you'd have a bunch of other captures that may or may not be useful.
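From there, building the link is mostly string concatenation. A minimal sketch follows; linkify and escapeHtml are hypothetical helper names, not anything built in. I'm assuming that the text before the URL should be kept as-is, that the input is a single line containing at most one URL, and that the URL itself should be used as the link text when no description follows it.

```javascript
// Hypothetical helper: escape characters that are special in HTML
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: turn "text URL description" into "text <a href=URL>description</a>"
function linkify(s) {
  var urlRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");
  var match = urlRE.exec(s);
  if (!match) {
    return escapeHtml(s); // no URL found, nothing to link
  }
  var before = s.slice(0, match.index); // text preceding the URL
  var url = match[1];
  var description = match[2].trim();    // drop the leading space
  var label = description || url;       // assumption: fall back to the URL
  return escapeHtml(before) +
    '<a href="' + escapeHtml(url) + '">' + escapeHtml(label) + '</a>';
}

console.log(linkify("possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present"));
// possibly some text here <a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>
```

Because (.*)$ swallows everything up to the end of the string, a second URL on the same line would end up inside the link text of the first, so for real paragraphs you would probably split the text into lines or sentences first.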
