Extract image URL from HTML page with PHP

Question

How can I extract the post image from this link using PHP?

I read that I can't do it with regex.

http://www.huffingtonpost.it/2013/07/03/stupri-piazza-tahrir-durante-proteste-anti-morsi_n_3538921.html?utm_hp_ref=italy

Thank you so much.

stackoverflow.com/questions/1732348/…

– Ignacio Vazquez-Abrams, Commented Jul 3, 2013 at 10:08
Thanks, so how can I do it?

– michele, Commented Jul 3, 2013 at 10:12

Nidhin Joseph · Accepted Answer · 2013-07-03 10:22:17Z

4

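// $url holds the address of the page to scrape.
$content = file_get_contents($url);
// [^>]* and [^"]* keep the match inside a single <img> tag whose class ends in "pinit";
// the original .* was greedy and could swallow several attributes or tags on one line.
if (preg_match('/<img[^>]*src="([^"]*)"[^>]*class="[^"]*pinit"[^>]*>/', $content, $matches))
{
    echo "Match was found <br />";
    echo $matches[0];
}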

$matches[0] holds the whole image tag. If you want only the URL, use $matches[1] instead :)
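The difference between $matches[0] and $matches[1] is easier to see on a small inline sample. What follows is only a sketch: the markup and the example.com URLs are made up for illustration, and using [^>]* and [^"]* in place of .* is an assumption about how to keep the match inside a single tag.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$content = '<p>Intro</p>'
    . '<img alt="x" src="http://example.com/a.jpg" class="photo pinit" width="10">'
    . '<img src="http://example.com/b.jpg" class="other">';

// [^>]* cannot cross a tag boundary and [^"]* cannot cross a quote,
// so the whole match stays inside one <img> tag.
$pattern = '/<img[^>]*\bsrc="([^"]*)"[^>]*\bclass="[^"]*pinit"[^>]*>/i';

$wholeTag = null;
$imageUrl = null;

if (preg_match($pattern, $content, $matches)) {
    $wholeTag = $matches[0]; // the complete <img ...> tag
    $imageUrl = $matches[1]; // only the value of the src attribute
    echo $imageUrl, PHP_EOL;
}
```

Like the pattern in the answer, this still requires src to appear before class inside the tag, so it stays tied to one page layout.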


2 Comments

Saurabh Rana Over a year ago

I'm trying to do the same for "techcrunch.com/2014/05/09/facebook-is-down-for-many" but it doesn't return anything. I know the <img> is here: <img src="tctechcrunch2011.files.wordpress.com/2014/05/…" class="" /> but even after a few changes it doesn't return anything. Any help would be nice _/_

Nidhin Joseph Over a year ago

That regex was specific to the pattern on that particular web page. Try this: if (preg_match("/<img.*src=\"(.*)\"/", $content, $matches)) { echo "Match is found <br />"; echo $matches[0]; } How it works: the regex searches for a src attribute within the image tag, then extracts the image URL, which is assumed to be within double quotes. You can modify it to suit your requirements.

Aurimas Ličkus · Accepted Answer · 2013-07-03 10:26:25Z

You can, and really should, parse your HTML with DOM. Here is an example for your case:

$curlResource = curl_init('http://www.huffingtonpost.it/2013/07/03/stupri-piazza-tahrir-durante-proteste-anti-morsi_n_3538921.html?utm_hp_ref=italy');
curl_setopt($curlResource, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlResource, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curlResource, CURLOPT_AUTOREFERER, true);

$page = curl_exec($curlResource);
curl_close($curlResource);


$domDocument = new DOMDocument();
$domDocument->loadHTML($page);

$xpath = new DOMXPath($domDocument);

$urlXpath = $xpath->query("//img[@id='img_caption_3538921']/@src");

$url = $urlXpath->item(0)->nodeValue;

echo $url;

Take your time and learn a little DOM and XPath; it's worth it.
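To try the DOM approach without a network call, the same XPath query can be run against an inline string. This is a minimal sketch under stated assumptions: the markup and example.com URLs are invented, the id is copied from the answer's query, and the libxml error handling and the empty-result check are additions that the original snippet does not contain.

```php
<?php
// Hypothetical sample markup; only the id matches the one queried in the answer.
$page = '<html><body>'
    . '<div><img id="img_caption_3538921" src="http://example.com/post.jpg" alt="post"></div>'
    . '<img src="http://example.com/other.jpg">'
    . '</body></html>';

// Collect parser warnings internally instead of printing them;
// real-world pages are rarely valid HTML.
libxml_use_internal_errors(true);

$domDocument = new DOMDocument();
$domDocument->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($domDocument);
$nodes = $xpath->query("//img[@id='img_caption_3538921']/@src");

// query() yields an empty list when nothing matches, so check before reading item(0).
$url = ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : null;

echo $url === null ? 'No image found' : $url, PHP_EOL;
```

Checking the length first avoids calling nodeValue on null when the page layout changes and the id disappears.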

Krishna · Accepted Answer · 2013-07-03 12:56:17Z

1

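Try this:

$content = file_get_contents($url);
// Accept either quote style; the captured group is the URL itself.
// Note: this finds the first src attribute on the page, which may not belong to an <img>.
if (preg_match("/src=[\"']([^\"']+)[\"']/", $content, $matches))
{
    echo "Match was found <br />";
    echo $matches[1];
}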
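If every image on the page is wanted rather than just the first match, preg_match_all can collect them. The sketch below is an illustration, not part of the answer: the sample markup is hypothetical, and the backreference \1 is one possible way to require that the closing quote matches the opening one.

```php
<?php
// Hypothetical sample markup using both quoting styles.
$content = "<img src=\"http://example.com/a.jpg\">"
    . "<img class='x' src='http://example.com/b.png'>";

// (["']) captures the opening quote; \1 requires the same character to close the value.
// Restricting the match to <img ...> skips src attributes on <script> or <iframe> tags.
preg_match_all('/<img[^>]*\bsrc=(["\'])(.*?)\1/i', $content, $matches);

$urls = $matches[2]; // the second group holds the src values
print_r($urls);
```

As with the other regex answers, this is fragile on unusual markup such as unquoted attributes, which is the reason the DOM approach is generally preferred.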
