I'm very new to regular expressions and need a little help with something complicated.

I have a list of URLs that may as well be in an array that would look like this:

$urls = array(
    "http://example.com/page.php",
    "http://example.com/page.php?key=value",
    "http://example.com/image.jpg",
    "http://example.com/image.jpg?key=value" ...

I want to loop over the array (which is simple enough with foreach) and for each string return true if the URL points to a file that is an image. I have the following regular expression:

"#\.(jpg|jpeg|gif|png)$# i"

... but it seems to only return true if the string ends in one of the given image extensions. I need to account for two factors: 1. whether the string has a URL query string on the end of it (i.e. ?key=value), and 2. whether the extension (e.g. jpg) is actually part of the query string of a non-image file, for example:

http://example.com/page.php?image=file.jpg

This should return false because the URL points to a PHP file, not a JPG.

Thank you for any help!

  • Just an FYI: Unless you actually retrieve it, you cannot know that page.php does not return an image. Commented Apr 7, 2016 at 7:52
  • that's ok, i can check the content-type as a fallback, thanks! Commented Apr 7, 2016 at 7:53
  • @Sjon: a happy middle ground might be a HEAD request, trusting the web server to assign the correct Content-Type. OP: you won't have a Content-Type unless you make a request, is what he's saying. Commented Apr 7, 2016 at 7:53
  • Well, I do not like the one-regex solution here: '~^(?!.*\?.*(\.(?:jpg|jpeg|gif|png)\b)).*(?1)(?:$|\?)~' Commented Apr 7, 2016 at 7:55
  • Check tools.ietf.org/html/rfc3986#page-50 on how to properly parse a URL. In your case $5 (the path component) must end in one of your declared extensions. Commented Apr 7, 2016 at 7:59
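
A rough sketch of the "parse the URL first" idea from the last comment. It uses PHP's built-in parse_url() and pathinfo() rather than the regular expression from the RFC, and the function name hasImageExtension is made up for illustration. It only looks at the extension of the path component, so it says nothing about what the server actually returns.

```php
<?php
// Hypothetical helper (not from the thread): checks only the extension
// of the URL's path component, ignoring the query string and fragment.
function hasImageExtension(string $url): bool
{
    // parse_url() gives null when there is no path and false for badly malformed URLs.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }

    // pathinfo() returns an empty string when the path has no extension.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return in_array($extension, ['jpg', 'jpeg', 'gif', 'png'], true);
}

var_dump(hasImageExtension('http://example.com/image.jpg?key=value'));
var_dump(hasImageExtension('http://example.com/page.php?image=file.jpg'));
```

The first call should report true and the second false, since only /image.jpg and /page.php are inspected.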

2 Answers

Complete RegEx Version

Actually, here is a complete RegEx version:

^[^?]*\.(jpg|jpeg|gif|png)

Live Demo on Regex101

How it works:

^[^?]*                  # Matches everything before the query string (?foo=bar&baz=foo)
\.(jpg|jpeg|gif|png)    # Image Extension

The first part matches everything before the ?.... It is the RegEx equivalent of taking the first item from explode('?', $str). The second part is the same as yours, with the $ removed (since a query string may follow the extension, the extension is no longer guaranteed to be at the end of the string).
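
Here is a small sketch of that pattern in PHP, in case it helps to see it run. The wrapper name matchesBasePattern, the # delimiters and the i modifier (carried over from the question) are my assumptions, not part of the pattern above.

```php
<?php
// Hypothetical wrapper around the pattern above; the "i" modifier is an added assumption.
function matchesBasePattern(string $url): bool
{
    return preg_match('#^[^?]*\.(jpg|jpeg|gif|png)#i', $url) === 1;
}

$samples = [
    'http://example.com/image.jpg?key=value',     // extension before the query string
    'http://example.com/page.php?image=file.jpg', // extension only inside the query string
    'http://example.com/image.jpg-test.php',      // image extension followed by more of the file name
];

foreach ($samples as $url) {
    echo $url . ': ' . (matchesBasePattern($url) ? 'true' : 'false') . "\n";
}
```

The first sample is accepted and the second rejected, but the last one is still accepted, since the pattern does not check what follows the extension.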


To deal with unusual file extensions like the following:

  • test.jpgfoo
  • test.pngbar
  • test.jpg.nope
  • image.jpg-test.php
  • image.jpg_test.php

Add a Negative Lookahead to the end, (?![\w.\-_]):

^[^?]*\.(jpg|jpeg|gif|png)(?![\w.\-_])

This makes sure that no word character (letter, digit or _), ., or - follows the accepted file extension, which also rules out a second extension. If one does, the RegEx will fail to match.
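
A quick way to try the lookahead version against those cases. This is only a sketch: the function name matchesWithLookahead, the # delimiters, the i modifier and the example.com host are assumptions added for the demo.

```php
<?php
// Hypothetical wrapper around the lookahead pattern; "i" modifier added as an assumption.
function matchesWithLookahead(string $url): bool
{
    return preg_match('#^[^?]*\.(jpg|jpeg|gif|png)(?![\w.\-_])#i', $url) === 1;
}

$samples = [
    'http://example.com/test.jpgfoo',
    'http://example.com/test.jpg.nope',
    'http://example.com/image.jpg-test.php',
    'http://example.com/image.jpg_test.php',
    'http://example.com/image.jpeg',
    'http://example.com/image.jpg?key=value',
];

foreach ($samples as $url) {
    echo $url . ': ' . (matchesWithLookahead($url) ? 'true' : 'false') . "\n";
}
```

Only the last two samples should come out as true. Note that .jpeg still works: the jpg alternative is rejected by the lookahead (the next character is e), so the engine falls through to jpeg.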

Live Demo on Regex101


This RegEx will also do what you need, provided you first do as @DevilaN said and apply it to the first item of explode('?', $str):

\.(jpg|jpeg|gif|png)(\?.*)?$

Live Demo on Regex101

5 Comments

@DevilaN I added an update for those unusual cases. Just put (?!\w) at the end, to make sure there are no more letters after the accepted file extension
@DevilaN Thanks, fixed that too!
@DevilaN Seriously! ;) I don't even think that is a valid File Extension, but I have fixed it, as well as any with _! Any others?
@DevilaN Wow! That'll take ages to include! If the OP wants to do that, he can add them to the Negative Lookahead himself. ;) Wait, I now see how your solution makes more sense!
2

Your regular expression is OK, but you need to get rid of the ?something=something part. Just call explode("?", $string); and use the first element, which will contain the URL with the file name only. Then proceed with your normal regex.

$urls = array(
    "http://example.com/page.php",
    "http://example.com/page.php?key=value",
    "http://example.com/image.jpg",
    "http://example.com/image.jpg?key=value"
);

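function isImage($l) {
    // Keep only the part of the URL before the query string
    $arr = explode("?", $l);
    // Look for an image extension at the very end of that part (case-insensitive)
    return preg_match("#\.(jpg|jpeg|gif|png)$#i", $arr[0]);
}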
foreach ($urls as $url) {
    echo $url . ": " .(isImage($url) ? "true" : "false") . "\n";
}

And the result is:

http://example.com/page.php: false
http://example.com/page.php?key=value: false
http://example.com/image.jpg: true
http://example.com/image.jpg?key=value: true

If you want a pure regular expression solution:

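function isImage($l) {
    // The extension must be followed by "?" or the end of the string;
    // the "i" modifier keeps this case-insensitive like the version above
    return preg_match("/^[^\?]+\.(jpg|jpeg|gif|png)(?:\?|$)/i", $l);
}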
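
To check that the two variants agree, including on the tricky URL from the question, something like the following could be used. It is just a sketch: the names isImageViaExplode and isImageViaRegex are made up so both versions can live in one script, and the i modifier on the second pattern is my assumption.

```php
<?php
// Hypothetical name for the explode-based variant.
function isImageViaExplode(string $url): bool
{
    $parts = explode('?', $url);
    return preg_match('#\.(jpg|jpeg|gif|png)$#i', $parts[0]) === 1;
}

// Hypothetical name for the pure regular expression variant ("i" modifier assumed).
function isImageViaRegex(string $url): bool
{
    return preg_match('/^[^\?]+\.(jpg|jpeg|gif|png)(?:\?|$)/i', $url) === 1;
}

$urls = [
    'http://example.com/page.php',
    'http://example.com/image.jpg?key=value',
    'http://example.com/page.php?image=file.jpg',
];

foreach ($urls as $url) {
    echo $url . ': '
        . (isImageViaExplode($url) ? 'true' : 'false') . ' / '
        . (isImageViaRegex($url) ? 'true' : 'false') . "\n";
}
```

Both columns should read false, true, false for these three URLs.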
