18

I want to extract image URLs from JavaScript/HTML source, for example:

var a = "http://sub.domain.com/uploads/files/11-11-2011/345301-574-1182-393/2202.jpg";
var b = "http://sub.domain.com/uploads/files/23-11-2011/234552-574-2321-232/asd.png";

I'm looking for a solution that detects image URLs, so that the output is:

http://sub.domain.com/uploads/files/11-11-2011/345301-574-1182-393/2202.jpg
http://sub.domain.com/uploads/files/23-11-2011/234552-574-2321-232/asd.png

Thanks!

1
  • 3
    Just to be clear: You want to scan an entire HTML source file that also contains JavaScript for URL strings in the JavaScript sections? Commented Nov 4, 2010 at 15:46

6 Answers 6

42

Based on the information you've given, this should work:

(https?:\/\/.*\.(?:png|jpg))

You can add more extensions by appending |ext after jpg. The s? in the pattern means https URLs are matched as well.

Note: You may want to use the case insensitive modifier i to make the capture more inclusive. This would look like:

/(https?:\/\/.*\.(?:png|jpg))/i
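For anyone who wants to try it, here is a minimal sketch of that pattern applied to the two strings from the question. The helper name firstImageUrl is made up for illustration, and the sketch assumes each input string holds at most one URL:

```javascript
// Pattern from the answer, with the case-insensitive flag.
const imageUrl = /(https?:\/\/.*\.(?:png|jpg))/i;

const a = "http://sub.domain.com/uploads/files/11-11-2011/345301-574-1182-393/2202.jpg";
const b = "http://sub.domain.com/uploads/files/23-11-2011/234552-574-2321-232/asd.png";

// Hypothetical helper: returns the first image URL found in the text, or null.
function firstImageUrl(text) {
  const match = text.match(imageUrl);
  return match ? match[1] : null;
}

console.log(firstImageUrl(a));
console.log(firstImageUrl(b));
console.log(firstImageUrl("no image here"));
```

Because .* is greedy, this is only dependable when the string contains a single URL.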

5 Comments

This will fail in situations like src="mail.google.com/mail/u/0/images/cleardot.gif" style="background:url(ssl.gstatic.com/mail/sprites/…) . What works for me is (https?:\/\/[^ ]*\.(?:gif|png|jpg|jpeg))
I think this is better: (http)?s?:?(\/\/[^"']*\.(?:png|jpg|jpeg|gif|png|svg))
yeah +1 : for example : someurl.svg.png (wiki eg.)
Although @Amarsh is right, the OP asked for a URL, not a generic path, and I believe a URL is required to have a scheme (e.g. http:).
If you have already parsed out the <img ...> tag and want the src no matter what it contains, this worked for me: /src\W*=[^\'"]*([\'"])([^\1]*?)\1/ - the ? after * means "non-greedy", \W means non-word characters, and the final \1 refers to the first capture group (the opening quote). Note that inside a character class \1 is not a backreference, so [^\1]*? behaves much like a lazy match of any character; the match still stops at the closing quote because of the trailing \1. Don't replace * with + to require a non-empty value - that might not work if the <img ...> tag has more attributes!
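The greedy-match problem raised in these comments is easiest to see on a string with more than one URL. Below is a rough sketch on made-up markup. The restricted pattern excludes whitespace and quotes from the URL body, which is a slightly stricter variation on the [^ ]* suggestion and is my assumption, not something the commenters wrote:

```javascript
// Made-up markup for illustration only.
const html = '<img src="http://example.com/a.png"> text <img src="https://example.com/b.jpg">';

// Greedy version: .* runs on to the last ".png" or ".jpg" in the string.
const greedy = /(https?:\/\/.*\.(?:png|jpg))/i;

// Restricted version: the URL body may not contain whitespace or quotes.
const restricted = /https?:\/\/[^\s"']*\.(?:gif|png|jpg|jpeg)/gi;

const greedyMatch = html.match(greedy)[1];
const urls = html.match(restricted) || [];

console.log(greedyMatch); // a single string that spans both tags
console.log(urls);        // the two URLs as separate entries
```

With the g flag, match() returns every hit, so the restricted pattern also works for scanning a whole page.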
12

A little late to the party, but in trying to do something similar to the OP, I created the following regex, which seems to handle relative links as well as absolute ones:

/([a-z\-_0-9\/\:\.]*\.(jpg|jpeg|png|gif))/i
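A small sketch of how this pattern behaves on mixed text, with the g flag added so that every match is returned and the groups made non-capturing (both are my changes, not part of the answer). The helper name findImagePaths and the sample strings are made up:

```javascript
// Pattern from the answer, with g added and the groups made non-capturing.
const imagePath = /[a-z\-_0-9\/\:\.]*\.(?:jpg|jpeg|png|gif)/gi;

// Hypothetical helper: returns all image-like paths or URLs found in the text.
function findImagePaths(text) {
  return text.match(imagePath) || [];
}

// Relative and absolute links are both picked up.
console.log(findImagePaths("see images/photo.jpg and http://example.com/pic.PNG here"));

// A space is not in the character class, so only the part after it is returned.
console.log(findImagePaths("my photo.jpg"));
```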

2 Comments

This allows extracting only the URLs when they are mixed with other text (very useful when you're scraping).
This does not fully extract filenames that contain spaces.
6

I created this regex a few days ago:

/^https?:\/\/.*\/.*\.(png|gif|webp|jpeg|jpg)\??.*$/gmi

The ones provided by others in this post work, but they do not account for query strings.

Example of this regex:

  // Returns { match: [...] } when the URL looks like an image, otherwise false.
  function checkForImage(url) {
    const regex = /^https?:\/\/.*\/.*\.(png|gif|webp|jpeg|jpg)\??.*$/gmi;
    // Run the match once and reuse the result.
    const match = url.match(regex);
    return match ? { match } : false;
  }

  checkForImage('https://images-ext-2.discordapp.net/external/yhycJKw8ohsysnU6CBDLQPV4979oQINVmv-fRfu-jL8/%3Fsize%3D2048/https/cdn.discordapp.com/avatars/490535372393021469/a_9e9d0e575eee0221e759257e259681af.gif')

1 Comment

Your code won't work as you expect, I'm afraid. For example, it will also match something like https://foo.bar/a.jpg.pdf; in fact, any characters after one of the listed extensions will be matched. I'm no regex guru, but maybe something like this will do: ^https?:\/\/.*\/.*\.(png|gif|webp|jpeg|jpg)($|\?.*$). You may also want to allow for a URI fragment (#).
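If the goal is to validate a single URL rather than to scan text, one alternative to anchoring the regex is to let the built-in URL parser separate the path from the query string and fragment, and then test only the path. This is a sketch of a different technique, not the answerer's method. It assumes the URL global is available (browsers and current Node.js), and the helper name and extension list are made up:

```javascript
// Extensions treated as images; this list is an assumption, adjust as needed.
const IMAGE_EXTENSIONS = /\.(png|gif|webp|jpeg|jpg)$/i;

// Hypothetical helper: true when the URL's path ends in an image extension.
function isImageUrl(candidate) {
  let parsed;
  try {
    parsed = new URL(candidate); // throws on strings that are not absolute URLs
  } catch (err) {
    return false;
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return false;
  }
  // pathname excludes both the query string and the fragment.
  return IMAGE_EXTENSIONS.test(parsed.pathname);
}

console.log(isImageUrl("https://foo.bar/a.jpg?size=2048#top"));
console.log(isImageUrl("https://foo.bar/a.jpg.pdf"));
```

The extension regex deliberately has no g flag, so repeated test() calls do not carry state between them.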
3

Try this:

/"(http:\/\/[^"]*?\.(jpg|png))"/g

The first capture group ($1) is what you want.
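As a sketch of how to get at that first group for every match, here is the pattern written with its forward slashes escaped so it is valid as a JavaScript regex literal, run through matchAll. The source string is made up for illustration:

```javascript
const quotedImageUrl = /"(http:\/\/[^"]*?\.(jpg|png))"/g;

// Made-up source text for illustration.
const source = 'var a = "http://example.com/one.jpg"; var b = "http://example.com/two.png";';

// matchAll yields one match object per hit; index 1 is the first capture group.
const urls = Array.from(source.matchAll(quotedImageUrl), (m) => m[1]);

console.log(urls);
```

Note that this only finds URLs wrapped in double quotes and only the http scheme, as in the original pattern.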

2

A super strict solution to this would be:

/(http[s]*:\/\/)([a-z\-_0-9\/.]+)\.([a-z.]{2,3})\/([a-z0-9\-_\/._~:?#\[\]@!$&'()*+,;=%]*)([a-z0-9]+\.)(jpg|jpeg|png)/i

3 Comments

Why (http[s]*://)? Is there anything else besides an s that can be appended to http? oO
It's because some services misbehave if you access them via https when they don't support it, or vice versa. I have encountered instances where changing the protocol solved my issue. In any case, it was to give a different approach to other answers as well.
I know that some services only support http and others only https. Oh, I see: you just used * instead of ?, which would be a little more intentional.
-1
var regex = /(http[s]?:\/\/.*\.(?:png|jpg|gif|svg|jpeg))/i;

This should give the result you want.
