Only match regex if it doesn't start with a pattern in JavaScript

Question

I have a bit of a strange one here. I basically have a large chunk of text which may or may not contain links to images.

So, let's say it does. I have a pattern which extracts the image URL fine; however, once a match is found it is replaced with an <img> element with the link as the src. The problem is that there may be multiple matches within the text, and this is where it gets tricky: the URL pattern will now also match the URL inside the src attribute, which basically results in an infinite loop.

So is there a way to ONLY match with a regex if the URL is not preceded by a pattern like =" or ='? Then it would match the URL in something like:

some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6

but not

some image <img src="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6">

I am not sure if it is possible, but if it is, could someone point me in the right direction? A replace by itself will not suffice in this scenario, as the matched URL needs to be used elsewhere too, so it needs to be available as a capture.

The main scenarios I need to account for are:

- Many links in one block of varied text
- A single link without any other text
- A single link with other varied text

== edit ==

Here is the current regex I am using to match URLs:

(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))

== edit 2 ==

Just so everyone understands why I cannot use the /g flag, here is an answer which explains the issue. If I could use /g like I originally tried, it would make things a lot simpler.

Javascript regex multiple captures again

Have you tried using the /g flag, which should do a single global replace, rather than looping until a match is "not found"?
– freefaller, Commented Sep 27, 2013 at 9:37
In JavaScript it doesn't seem to work; there is some problem with multiple captures and exec, so you need to loop until no matches remain. I read something about JS not supporting captures or multiple matches in a single result, although if you can prove the above in a jsfiddle or something I will happily give you the answer, as I could never get it to work.
– Grofit, Commented Sep 27, 2013 at 9:40
Why is there a downvote on the question? This is a well-defined question given the constraints and the scenario.
– Grofit, Commented Sep 27, 2013 at 9:52
Try this jQuery-based jsfiddle... although it does highlight that the query-string part of the URL isn't taken into account. If you want vanilla JS, try this jsfiddle.
– freefaller, Commented Sep 27, 2013 at 9:57
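Regarding the exec issue raised in these comments: with the g flag set, exec returns one match per call, each with its own capture groups, resuming at the regex's lastIndex and returning null once no matches remain. A minimal sketch using the question's pattern; the second URL in the sample text is made up:

```javascript
// The question's pattern, with g (advance through the text) and i (A-Z also covers a-z).
const urlPattern = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;

const input = "some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 " +
              "and another ftp://example.com/pics/photo.jpg here";

const found = [];
let match;
// exec advances urlPattern.lastIndex after each successful match, so this loop
// visits every URL once. The text itself is never modified, so the loop cannot
// re-match a URL that was already processed.
while ((match = urlPattern.exec(input)) !== null) {
  found.push({ url: match[1], scheme: match[2], index: match.index });
}

console.log(found);
```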

Ibrahim Najjar · Accepted Answer · 2013-09-27 09:48:55Z

3

What you are looking for is a negative lookbehind, but JavaScript doesn't support any kind of lookbehind, so you will either have to use a callback function to check what was matched and make sure it is not preceded by a ' or ", or use the following regex:

(?:^|[^"'])(\b(https?|ftp|file):\/\/[-a-zA-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))

It has a single drawback: on a successful match it also consumes one extra character, the one immediately before the (\b(https?|ftp|file) part of the input. That is easy to deal with, since the URL itself is in capture group 1.
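One way to deal with that extra character, as a sketch: capture it instead of discarding it, and write it back in the replacement. The names notQuoted and linkImages are hypothetical. Because the URL inside a generated src="..." is preceded by a double quote, running the replacement again leaves already-converted text alone:

```javascript
// Group 1: the preceding character, or "" at the very start of the text.
// Group 2: the image URL.
const notQuoted = /(^|[^"'])(\b(?:https?|ftp|file):\/\/[-a-zA-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/g;

function linkImages(text) {
  // $1 restores the consumed character, $2 is the URL.
  return text.replace(notQuoted, '$1<img src="$2"/>');
}

const once = linkImages("http://cdn.sstatic.net/stackoverflow/img/sprites.png and some image " +
                        "http://cdn.sstatic.net/serverfault/img/sprites.png");
// A second pass finds nothing new: each URL now follows a double quote.
const twice = linkImages(once);

console.log(once);
console.log(twice === once);
```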

Regex101 Demo
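Lookbehind assertions were added to JavaScript in ES2018, so in engines that implement them the original idea works directly (some older browsers do not support the syntax). A sketch with a negative lookbehind; the whole match is then just the URL, with no extra character to strip. The variable names are illustrative:

```javascript
// (?<!["']) asserts that the URL is not immediately preceded by a quote,
// without consuming that character.
const imageUrl = /(?<!["'])\b(?:https?|ftp|file):\/\/[-a-zA-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp)/g;

const mixed = 'plain http://cdn.sstatic.net/stackoverflow/img/sprites.png and ' +
              'linked <img src="http://cdn.sstatic.net/serverfault/img/sprites.png">';

// With the g flag, match returns every full match; only the bare URL qualifies.
const urls = mixed.match(imageUrl);
console.log(urls);
```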

answered Sep 27, 2013 at 9:48

Ibrahim Najjar


1 Comment

Grofit Over a year ago

This seems to work and addresses the question's context slightly better; the other answers, while very useful, are less about excluding the preceding pattern and more about changing tack to get the replace to work in one go.

freefaller · Answer · 2013-09-27 10:08:58Z

1

Using the /ig flags at the end should work: g is for global replace and i is for case-insensitivity, which is necessary because your character class only has A-Z instead of a-zA-Z.

Using the following vanilla JS appears to work for me (see jsfiddle)...

var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
// A regex literal is enough; g replaces every match, i makes A-Z also match a-z.
var re = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");

What it does highlight, though, is that the query-string part of the URL (the ?v=6) is not being picked up by your regex.
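If the query string should be kept, one possible adjustment (a sketch, not checked against every URL shape) is to take ? out of the path character class and allow an optional ?... part after the image extension. The trade-offs: an image URL that only appears inside a query string, such as ...?file=a.png, is no longer matched, and trailing punctuation after a query string is swallowed. The name withQuery is illustrative:

```javascript
// Path class without "?", then the extension, then an optional query string.
const withQuery = /\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp)(?:\?[-A-Z0-9+&@#\/%=~_|!:,.;]*)?/ig;

const test = "some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 " +
             "some image http://cdn.sstatic.net/serverfault/img/sprites.png";

// $& inserts the whole match, query string included.
const html = test.replace(withQuery, '<img src="$&"/>');
console.log(html);
```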

For jQuery, it would be (see jsfiddle)...

$(document).ready(function(){
  var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
  var re = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;
  $("#output").html(test.replace(re,"<img src=\"$1\"/>"));
});

Update

Just in case using the same image URL throughout my example doesn't convince you, it also works with different URLs; see this jsfiddle update:

var test="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 http://cdn.sstatic.net/serverfault/img/sprites.png?v=7";
var re = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");

edited Sep 27, 2013 at 10:08

answered Sep 27, 2013 at 10:03

freefaller


4 Comments

Grofit Over a year ago

Interesting. Although the replace works, how do you actually access the underlying match so you can use the capture data when doing it this way?

freefaller Over a year ago

That's a good question @Grofit, and I'm sorry, but I'm simply not aware of how you'd do that. The replace is based on simple pattern matching... if you need to do explicit processing on each individual match, then I believe (but am happy to be proved wrong) that you would have to do individual matches. If I'm right, I think there is a way to call an external function, but I've never done it and cannot give any advice in that direction... sorry!

Grofit Over a year ago

That is fine, buddy. If the question were simply about doing the replace, you would get the answer given JavaScript's limitations; however, as the match still needs to be used outside of the replace, I have given the answer to the other chap. I have upvoted, though, as I'm sure this would be the more applicable answer for most people doing something similar.

freefaller Over a year ago

@Grofit, not a problem, fella, but it wasn't clear from your original post that you needed the ability to do extra processing on those matches. Good luck with the rest of your project :-)
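The "external function" mentioned above does exist: String.prototype.replace accepts a function instead of a replacement string and calls it once per match, passing the full match followed by each capture group. That lets a single global replace also collect the captured URLs for use elsewhere. A sketch; replaceAndCollect is a hypothetical helper:

```javascript
const imagePattern = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;

function replaceAndCollect(text) {
  const urls = [];
  // Arguments: full match, then group 1 (the URL), group 2 (the scheme), ...
  const html = text.replace(imagePattern, (match, url) => {
    urls.push(url);                  // keep the capture for use outside the replace
    return '<img src="' + url + '"/>';
  });
  return { html, urls };
}

const result = replaceAndCollect(
  "some image http://cdn.sstatic.net/stackoverflow/img/sprites.png and " +
  "http://cdn.sstatic.net/serverfault/img/sprites.png");
console.log(result.urls);
console.log(result.html);
```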

Tomke · Answer · 2013-09-27 09:57:59Z

0

Couldn't you just check for whitespace in front of the URL instead of the word boundary? It seems to work, although you will have to remove the matched whitespace later.

(\s(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))

http://rubular.com/r/9wSc0HNWas

Edit: Damn, too slow :) I'll still leave this here as my regex is shorter ;)

answered Sep 27, 2013 at 9:57

Tomke


3 Comments

Grofit Over a year ago

What if the text is just a link, with no whitespace before it? In that case it would not work :(

Tomke Over a year ago

That's true, I did not know you expected something like this... Would you expect something like: here is some texthttp://.... ?

Grofit Over a year ago

Nah, that is not too much of a worry, as it's a rare case and too hard to test for. It was mainly the case of a link posted as the sole content that I wanted to point out, but you are right that it was not specifically mentioned in the question.
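The whitespace idea can also cover the sole-link case by accepting either the start of the text or a whitespace character before the URL. A sketch, with toImages as a hypothetical helper; capturing the prefix and writing it back via $1 means the whitespace does not need to be removed afterwards:

```javascript
// Group 1: start of text ("") or one whitespace character. Group 2: the URL.
const afterSpaceOrStart = /(^|\s)((?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;

function toImages(text) {
  return text.replace(afterSpaceOrStart, '$1<img src="$2"/>');
}

// A link as the sole content, and a link inside other text.
console.log(toImages("http://cdn.sstatic.net/stackoverflow/img/sprites.png"));
console.log(toImages("see http://cdn.sstatic.net/serverfault/img/sprites.png here"));
```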

GuiDocs · Answer · 2013-09-27 10:22:54Z

0

As freefaller said, you could use the /g flag to find all matches in one go, if exec is not a must.

Otherwise, you can add (="|=')? to the beginning of your regex and check whether $1 is undefined. If it is undefined, the URL was not preceded by a =" or =' pattern.
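A sketch of that check inside a replace callback, so URLs already sitting in a src attribute are returned unchanged. linkBareImages is a hypothetical helper, and the check relies on a group that did not participate being passed as undefined, which is what current engines do:

```javascript
// Group 1: optional =" or =' prefix. Group 2: the URL.
const optionalPrefix = /(="|=')?(\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;

function linkBareImages(text) {
  return text.replace(optionalPrefix, (match, prefix, url) => {
    if (prefix !== undefined) {
      return match;                  // already inside an attribute: leave as-is
    }
    return '<img src="' + url + '"/>';
  });
}

const converted = linkBareImages(
  'bare http://cdn.sstatic.net/stackoverflow/img/sprites.png and ' +
  '<img src="http://cdn.sstatic.net/serverfault/img/sprites.png">');
console.log(converted);
```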

edited Sep 27, 2013 at 10:22

answered Sep 27, 2013 at 9:58

GuiDocs


2 Comments

Grofit Over a year ago

The reason I cannot use /g is explained in this answer: stackoverflow.com/questions/14707360/…

GuiDocs Over a year ago

My answer works even if exec is a must, but you could also just use match or replace.
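In engines that support it, String.prototype.matchAll (ES2020) applies the same optional-group idea without a hand-written exec loop, yielding every match with its capture groups. A sketch with a hypothetical bareImageUrls helper; note that matchAll throws a TypeError if the regex lacks the g flag:

```javascript
const prefixOrBare = /(="|=')?(\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig;

function bareImageUrls(text) {
  return Array.from(text.matchAll(prefixOrBare))
    .filter((m) => m[1] === undefined)   // skip URLs preceded by =" or ='
    .map((m) => m[2]);                   // group 2 is the URL
}

const page = 'http://cdn.sstatic.net/stackoverflow/img/sprites.png ' +
             '<img src="http://cdn.sstatic.net/serverfault/img/sprites.png">';
console.log(bareImageUrls(page));
```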
