How can I fetch the src of every image on a page into an array using file_get_contents(), with preg_match or some other approach?

2 Answers

4

You shouldn't use regex to parse HTML; use a parser class such as DOMDocument instead. DOMDocument has the getElementsByTagName method, which can be used to retrieve all the img tags from the document you want to parse.

Here's an example that will echo the src of every image in the document:

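<?php
    // Collect libxml warnings internally; real-world HTML is rarely valid.
    libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML(file_get_contents('yourfilehere.html'));
    libxml_clear_errors();

    // Print the src of every <img>, skipping tags that have no src attribute.
    foreach ($document->getElementsByTagName('img') as $image) {
        if ($image->hasAttribute('src')) {
            echo htmlspecialchars($image->getAttribute('src')), '<br />';
        }
    }
?>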
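
Since the question asks for an array rather than echoed output, the same approach can be wrapped in a function. This is only a minimal sketch: the name extractImageSources and the sample markup are made up for illustration, and it assumes the HTML has already been fetched into a string (for example with file_get_contents()).

```php
<?php
// Hypothetical helper (not part of the original answer): returns every
// non-empty src attribute found on <img> elements in $html.
function extractImageSources(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($document->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src'); // '' when the attribute is missing
        if ($src !== '') {
            $sources[] = $src;
        }
    }

    return $sources;
}

// Made-up sample markup; the second <img> has no src and is skipped.
$html = '<p><img src="a.png"><img alt="no source"><img src="/img/b.jpg"></p>';
print_r(extractImageSources($html));
```

For a remote page you would pass file_get_contents($url) as the argument, after checking that it did not return false.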

0

It's more reliable and simpler to use phpQuery or SimpleHTMLparser (more elaborate). But for basic extraction, where you are just searching for src= attributes, that is overkill and a regular expression is in fact sufficient:

preg_match_all('/<img[^>]+src\s*=[\'\"\s]?([^<\'\"]+)/ims', file_get_contents($url), $uu);

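Note that it yields the src values exactly as written in the page, which are mostly relative path names rather than full URLs, so the results need postprocessing, whereas phpQuery IIRC has a shortcut for normalizing them. The captured values end up in $uu[1].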
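
The postprocessing step could look roughly like the sketch below. It is a simplified illustration, not a full RFC 3986 resolver: resolveImageUrl is a hypothetical name, the example.com URLs are placeholders, and it assumes $baseUrl is an absolute URL and that src values carry no query strings containing slashes.

```php
<?php
// Hypothetical helper: resolve $src against $baseUrl (simplified sketch).
function resolveImageUrl(string $src, string $baseUrl): string
{
    // Already absolute: starts with a scheme such as "http:" or "data:".
    if (preg_match('/^[a-z][a-z0-9+.-]*:/i', $src)) {
        return $src;
    }

    $base   = parse_url($baseUrl);
    $scheme = $base['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($base['host'] ?? '')
        . (isset($base['port']) ? ':' . $base['port'] : '');

    // Protocol-relative reference, e.g. //cdn.example.com/a.png
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }

    if (strpos($src, '/') === 0) {
        // Root-relative: starts at the site root.
        $path = $src;
    } else {
        // Relative: starts at the directory of the base document.
        $basePath = $base['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $src;
    }

    // Collapse "." and ".." segments.
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.' && $segment !== '') {
            $segments[] = $segment;
        }
    }

    return $origin . '/' . implode('/', $segments);
}

echo resolveImageUrl('../img/a.png', 'http://example.com/blog/post.html'), "\n";
```

Applying it to every entry of $uu[1] with array_map would turn the extracted values into absolute URLs under those assumptions.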

2 Comments

Regex isn't sufficient: this will match things you might not want, such as images inside HTML comments.
@HoLyVieR, do you have a real-world example? Nobody's talking about parsing.
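
To make the disagreement above concrete, here is a small sketch that runs both approaches on the same input. The markup is a made-up sample with one commented-out image and one live image; whether the extra match matters depends on what you need the list for.

```php
<?php
// Made-up sample markup: one image is commented out, one is live.
$html = '<body><!-- <img src="old.png"> --><img src="new.png"></body>';

// The regular expression from the answer scans raw text, so it also
// matches the image inside the HTML comment.
preg_match_all('/<img[^>]+src\s*=[\'\"\s]?([^<\'\"]+)/ims', $html, $uu);
$regexSources = $uu[1];

// DOMDocument treats the comment as a comment node, not as an element.
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

$domSources = [];
foreach ($document->getElementsByTagName('img') as $image) {
    $domSources[] = $image->getAttribute('src');
}

print_r($regexSources); // contains old.png and new.png
print_r($domSources);   // contains only new.png
```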
