I need to get a list of all image files referenced in my HTML, CSS and JavaScript files.

Here are some examples of what I will find inside my files:

CSS:
ul li {
    list-style-image: url('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

#insert { background-image: url('../img/insert.jpg'); }
#delete { background-image: url('../img/delete.png'); }

HTML:
<link rel="icon" sizes="192x192" href="touch-icon-192x192.png">
<img id="home" src="img/home.png" class="img-home">

JavaScript:
"BackgroundImageUrl": "textures/glass.jpg"

Using https://regex101.com/, I came up with the following expression:

/[\"'](.*(png|jpg|gif))[\"']?/ig

but it also matches base64-encoded images (data URIs), which I don't need, and my HTML matches contain unnecessary extra parts, for example:

"icon" sizes="192x192" href="touch-icon-192x192.png"

whereas I only need touch-icon-192x192.png.

How can I parse my files with PHP and get a clean list of the png, gif and jpeg images they reference? Is a regex a good fit for this, or is there a better way to accomplish such a task in PHP?
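To see where the unwanted parts come from, here is a small PHP sketch that runs the question's pattern over two of the sample lines. The .* is greedy and can cross quotes, so each match runs from the first quote on the line to the last png/jpg/gif on it. One PHP-specific detail: PCRE in PHP has no g modifier (passing it triggers an "Unknown modifier" warning), and preg_match_all already returns every match, so the sketch keeps only i.

```php
<?php
// The pattern from the question, minus the g modifier: PHP's PCRE functions
// do not accept g, and preg_match_all already returns all matches.
$original = '/["\'](.*(png|jpg|gif))["\']?/i';

// Two sample lines copied from the question.
$lines = [
    "list-style-image: url('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');",
    '<link rel="icon" sizes="192x192" href="touch-icon-192x192.png">',
];

$results = [];
foreach ($lines as $line) {
    preg_match_all($original, $line, $m);
    // Group 1 is what the question treats as the file name.
    $results[] = $m[1];
}
print_r($results);
```

For the base64 line, group 1 comes back as data:image/gif, because gif occurs in the MIME type. For the link tag, it is icon" sizes="192x192" href="touch-icon-192x192.png, because .* swallows everything between the first quote and the last .png.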

EDIT:

The accepted answer to "How do you parse and process HTML/XML in PHP?" is a collection of software libraries and other off-site resources, whereas what I am asking here is a programming question about regex.

2 Comments

  • DOMDocument (commented Jul 5, 2018 at 10:13)
  • I don't think the CSS and JS code asked about in this question makes it a duplicate. (commented Jul 5, 2018 at 10:20)
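For the HTML files, the DOMDocument suggestion avoids regex entirely by reading attribute values from the parsed document. Below is a minimal sketch. It assumes that only img src and link href attributes matter, and the function name htmlImageRefs is made up for this example. CSS and JavaScript files would still need a separate approach, which is the point of the second comment.

```php
<?php
// Sketch: collect image references from HTML attributes with DOMDocument.
// Assumption: only <img src> and <link href> are of interest.
function htmlImageRefs(string $html): array
{
    $doc = new DOMDocument();
    // libxml warns about HTML5 tags and fragments; collect those silently.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $refs = [];
    $xpath = new DOMXPath($doc);
    // The union returns the attribute nodes in document order.
    foreach ($xpath->query('//img/@src | //link/@href') as $attr) {
        $value = $attr->value;
        // Keep png/gif/jpeg files only; data: URIs do not end in an extension.
        if (preg_match('/\.(?:jpe?g|png|gif)$/i', $value)) {
            $refs[] = $value;
        }
    }
    return $refs;
}

$html = '<link rel="icon" sizes="192x192" href="touch-icon-192x192.png">'
      . '<img id="home" src="img/home.png" class="img-home">';
print_r(htmlImageRefs($html));
```

Because the attribute values arrive already unquoted, no lookaround is needed here. Note that the $ anchor also skips references with a query string, such as home.png?v=2.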

1 Answer

Here is a way to do the job:

<?php
$input = <<<EOD
CSS:
ul li {
    list-style-image: url('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

#insert { background-image: url('../img/insert.jpg'); }
#delete { background-image: url('../img/delete.png'); }

HTML:
<link rel="icon" sizes="192x192" href="touch-icon-192x192.png">
<img id="home" src="img/home.png" class="img-home">

JavaScript:
"BackgroundImageUrl": "textures/glass.jpg"
EOD;

// $m[0] holds the complete matches, i.e. the file names without their quotes
preg_match_all('/(?<=["\'])[^"\']+?\.(?:jpe?g|png|gif)(?=["\'])/', $input, $m);
print_r($m);

Output:

Array
(
    [0] => Array
        (
            [0] => ../img/insert.jpg
            [1] => ../img/delete.png
            [2] => touch-icon-192x192.png
            [3] => img/home.png
            [4] => textures/glass.jpg
        )

)
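The snippet above works on an inline heredoc. To build the list from actual files, a sketch along these lines could walk a directory, apply the same pattern to every .html, .css and .js file, and deduplicate the results. The function name findImageRefs and the extension list are assumptions made for illustration.

```php
<?php
// Sketch: apply the answer's pattern to files on disk.
// Assumption (hypothetical): the files live under $root and end in .html, .css or .js.
function findImageRefs(string $root): array
{
    $pattern = '/(?<=["\'])[^"\']+?\.(?:jpe?g|png|gif)(?=["\'])/';
    $refs = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!in_array(strtolower($file->getExtension()), ['html', 'css', 'js'], true)) {
            continue;
        }
        preg_match_all($pattern, file_get_contents($file->getPathname()), $m);
        $refs = array_merge($refs, $m[0]);
    }
    // The same image is often referenced from several files, so deduplicate.
    $refs = array_values(array_unique($refs));
    sort($refs, SORT_STRING);
    return $refs;
}

// Usage: create two small sample files in a fresh temporary directory.
$dir = sys_get_temp_dir() . '/img-refs-' . uniqid();
mkdir($dir);
file_put_contents("$dir/style.css", "#insert { background-image: url('../img/insert.jpg'); }");
file_put_contents("$dir/index.html", '<img id="home" src="img/home.png">');
print_r(findImageRefs($dir));
```

The paths are returned exactly as written in the files. Resolving ../img/insert.jpg relative to the referencing file would be a separate step.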

Explanation:

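(?<=["\'])          : lookbehind, make sure there is a quote before the match (the quote is checked but not included)
[^"\']+?            : one or more characters that are not quotes, as few as possible (lazy)
\.                  : a literal dot
(?:jpe?g|png|gif)   : non-capturing group, list of image extensions
(?=["\'])           : lookahead, make sure there is a quote after the match (not included either)

Base64 data URIs such as data:image/gif;base64,... are skipped because there is no dot directly before gif.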
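The same explanation can also live inside the pattern itself. This sketch uses PCRE's x (extended) modifier, which ignores whitespace and # comments outside character classes. The ~ delimiter and the nowdoc are choices made here so that slashes and quotes need no escaping.

```php
<?php
// The answer's pattern written in extended (x) mode, one commented piece per line.
$pattern = <<<'REGEX'
~
(?<=["'])            # lookbehind: a quote must precede the match
[^"']+?              # one or more non-quote characters, as few as possible
\.                   # a literal dot
(?:jpe?g|png|gif)    # non-capturing group of image extensions
(?=["'])             # lookahead: a quote must follow the match
~x
REGEX;

$sample = <<<'EOD'
#insert { background-image: url('../img/insert.jpg'); }
<img id="home" src="img/home.png" class="img-home">
ul li { list-style-image: url('data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'); }
EOD;

preg_match_all($pattern, $sample, $m);
print_r($m[0]);
```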

2 Comments

This is perfect! Since I'm trying to understand it, could you please explain exactly how you get the first quote or double quote before the image name? I had trouble with that.
@deblocker: You'll find useful information here: regular-expressions.info/lookaround.html
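On the question in the comment: the lookbehind (?<=["']) checks that a quote precedes the match without making it part of the match, so the quote never appears in $m[0]. A common alternative is to consume the quotes and capture only the file name in a group. The sketch below contrasts the two approaches on the link tag from the question.

```php
<?php
$line = '<link rel="icon" sizes="192x192" href="touch-icon-192x192.png">';

// 1) Lookaround (the answer): quotes are checked but not consumed,
//    so the whole match in $a[0] is already just the file name.
preg_match_all('/(?<=["\'])[^"\']+?\.(?:jpe?g|png|gif)(?=["\'])/', $line, $a);

// 2) Capturing group: quotes are consumed as part of the match,
//    so $b[0] includes them and the bare file name is in group 1 ($b[1]).
preg_match_all('/["\']([^"\']+?\.(?:jpe?g|png|gif))["\']/', $line, $b);

print_r($a[0]);
print_r($b[0]);
print_r($b[1]);
```

For the answer's sample input, both versions find the same five file names. The lookaround version simply lets you use $m[0] directly.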
