Javascript Regex: match text after pattern

Question

I have text consisting of paragraphs with URLs interspersed. I would like to parse the string, creating HTML links from the URLs and using the text that follows each URL as the descriptive link text, i.e. turn

possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present

into

<a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>

This SO article, JS: Find URLs in Text, Make Links, is relevant to what I'm attempting to do, but it simply places the URL as the text within the anchor element.

I am successfully matching the url with

var urlRE= new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");

but am unsure how to perform the match afterwards.

I came across this post, Regex - Matching text AFTER certain characters, which seems applicable. I've attempted to wrap my RE in /(?<=my url pattern here).+/ but get an error stating that there is an invalid group and that this results in an invalid RE.

In that post J-Law mentions that

Variable-length lookbehinds aren’t allowed

Is this what I'm attempting to do?

Since I'm already matching the url I feel like I could easily do some substring math to get the desired results.

I'm just using this as an attempt to learn more about regex.

Thanks

FYI, not only are variable-length lookbehinds disallowed in most regex flavours (.NET being the exception to the rule here), but JavaScript does not support lookbehinds at all.
– Lucas Trzesniewski, Commented Oct 13, 2014 at 20:35
A note on your regex: it requires at least one non-space character after the domain, ([^ ])+. If there is such a thing as site.XX, it won't match. You could change it to ([^ ])* and I don't think it would matter much.
– user557597, Commented Oct 14, 2014 at 0:13
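To illustrate the comment above, here is a small sketch comparing the two quantifiers. The names `plusRE` and `starRE` and the sample inputs are made up for this illustration; the patterns are the question's regex with only the final quantifier changed.

```javascript
// The question's pattern, ending in ([^ ])+ (at least one non-space character required).
var plusRE = new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+");

// The same pattern ending in ([^ ])* (the trailing part becomes optional).
var starRE = new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])*");

// A bare domain with a two-letter TLD leaves nothing for ([^ ])+ to consume.
console.log(plusRE.test("site.io")); // false
console.log(starRE.test("site.io")); // true

// With a three-letter TLD the + version still matches: the engine backtracks
// so that {2,4} takes "co" and ([^ ])+ takes the final "m".
console.log(plusRE.exec("site.com")[0]); // "site.com"
```

So the difference seems to show up only for bare domains whose TLD is exactly two letters, which fits the "I don't think it would matter much" remark.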

Matt Burland · Accepted Answer · 2014-10-13 20:39:22Z

Just add another capturing group to capture all the stuff at the end and make your inner groups non-capturing. Something like:

    var urlRE= new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

    var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

    var match = urlRE.exec(s);
    alert(match[0] + "\n\n" + match[1] + "\n\n" + match[2]);

    // match contains:
    // ["http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present", 
    // "http://www.somewebsite.com/some/path/somepage.html", 
    // " descriptive text which may or may not be present"]

I wrapped your entire regex in parentheses () to form the first capturing group, and inside that I made all your existing groups non-capturing with ?:. You don't absolutely need to make them non-capturing, but it does simplify the output. Then I added one more group, (.*), to capture everything else up to the end of the string, $.
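To see what "simplify the output" means in practice, the following sketch runs both variants against the sample string. The variable names (`capturingRE`, `nonCapturingRE`, `withCaptures`, `withoutCaptures`) are made up for this comparison; the first pattern is the question's regex with only `(.*)$` appended, the second is the pattern from this answer.

```javascript
var s = "possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present";

// The question's pattern with "(.*)$" appended: every inner group still captures.
var capturingRE = new RegExp("([a-zA-Z0-9]+://)?([a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?([a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(:[0-9]+)?([^ ])+(.*)$");

// The answer's pattern: one outer capturing group for the URL, inner groups non-capturing.
var nonCapturingRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

var withCaptures = capturingRE.exec(s);
var withoutCaptures = nonCapturingRE.exec(s);

console.log(withCaptures.length);    // 7: full match plus six groups
console.log(withoutCaptures.length); // 3: full match, URL, trailing text

// With all groups capturing, the trailing text ends up at index 6,
// and optional groups that did not participate (such as user:password@) are undefined.
console.log(withCaptures[6] === withoutCaptures[2]); // true
console.log(withCaptures[2]);                        // undefined
```

Note that the all-capturing variant has no single group holding the whole URL, so it would have to be reassembled or derived from the full match.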

After .exec, if you have a match (it returns null otherwise), the whole match will be in [0], the URL part in [1], and the rest of your text in [2]. This is why we used non-capturing groups: otherwise you'd have a bunch of other captures that may or may not be useful.
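From there, building the anchor the question asks for is a matter of string concatenation. The following is a minimal sketch, not a complete linkifier: `toLink` and `escapeHtml` are hypothetical helper names, and falling back to the URL as link text when no description follows is an assumption about the desired behaviour.

```javascript
// Same pattern as above: group 1 is the URL, group 2 is the trailing text.
var urlRE = new RegExp("((?:[a-zA-Z0-9]+://)?(?:[a-zA-Z0-9_]+:[a-zA-Z0-9_]+@)?(?:[a-zA-Z0-9.-]+\\.[A-Za-z]{2,4})(?::[0-9]+)?(?:[^ ])+)(.*)$");

// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical helper: returns an anchor string, or null when no URL is found.
function toLink(s) {
    var match = urlRE.exec(s);
    if (match === null) {
        return null;
    }
    var url = match[1];
    var description = match[2].trim();
    // Assumption: fall back to the URL itself when no descriptive text follows it.
    var label = description === "" ? url : description;
    return '<a href="' + escapeHtml(url) + '">' + escapeHtml(label) + "</a>";
}

console.log(toLink("possibly some text here http://www.somewebsite.com/some/path/somepage.html descriptive text which may or may not be present"));
// <a href="http://www.somewebsite.com/some/path/somepage.html">descriptive text which may or may not be present</a>
```

This handles only the first URL in the string and discards any text before it; input with several URLs would need to be split up or processed in a loop.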
