
How can I extract the post image from this link using PHP?

I read that I can't do it with regex.

http://www.huffingtonpost.it/2013/07/03/stupri-piazza-tahrir-durante-proteste-anti-morsi_n_3538921.html?utm_hp_ref=italy

Thank you so much.


3 Answers

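// The link from the question
$url = 'http://www.huffingtonpost.it/2013/07/03/stupri-piazza-tahrir-durante-proteste-anti-morsi_n_3538921.html?utm_hp_ref=italy';
$content = file_get_contents($url);
// [^>]* and [^"]* keep the match inside a single <img> tag; a greedy .* can run past it
if (preg_match('/<img[^>]*src="([^"]*)"[^>]*class="[^"]*pinit"[^>]*>/', $content, $matches))
{
    echo "Match was found <br />";
    echo $matches[0];
}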

$matches[0] holds the whole image tag. If you want only the URL, use $matches[1] instead :)
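
To make the difference between the two indexes concrete, here is a minimal sketch run against an inline string instead of a live page. The markup, the example.com URL, and the variable names $tag and $src are made up for illustration; the real page may be structured differently.

```php
<?php
// Hypothetical markup standing in for the fetched page (an assumption, not the real HTML).
$content = '<p><img src="http://example.com/photo.jpg" class="image pinit" alt=""></p>';

// One capture group: the parentheses around the src value.
$pattern = '/<img[^>]*src="([^"]*)"[^>]*class="[^"]*pinit"[^>]*>/';

$tag = null;
$src = null;
if (preg_match($pattern, $content, $matches)) {
    $tag = $matches[0]; // everything the pattern matched: the whole <img> tag
    $src = $matches[1]; // only the text captured by the first pair of parentheses
}

echo $src, "\n";
```

If the pattern does not match, preg_match returns 0 and both variables stay null, so check them before use.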


2 Comments

I'm trying to do the same for "techcrunch.com/2014/05/09/facebook-is-down-for-many" but it doesn't return anything. I know the <img> is here: <img src="tctechcrunch2011.files.wordpress.com/2014/05/…" class="" /> but even after a few changes it still returns nothing. Any help would be appreciated _/_
That regex was specific to the pattern on that particular web page. Try this: if (preg_match("/<img[^>]*src=\"([^\"]*)\"/", $content, $matches)) { echo "Match is found <br />"; echo $matches[0]; } How it works: the regex looks for a src attribute inside an image tag and captures the image URL, which is assumed to be enclosed in double quotes. You can adapt it to your requirements.

You can (and should) parse the HTML with DOM. Here is an example for your case:

$curlResource = curl_init('http://www.huffingtonpost.it/2013/07/03/stupri-piazza-tahrir-durante-proteste-anti-morsi_n_3538921.html?utm_hp_ref=italy');
curl_setopt($curlResource, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curlResource, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curlResource, CURLOPT_AUTOREFERER, true);

$page = curl_exec($curlResource);
curl_close($curlResource);


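// Real-world HTML is rarely valid; collect libxml warnings instead of printing them
libxml_use_internal_errors(true);
$domDocument = new DOMDocument();
$domDocument->loadHTML($page);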

$xpath = new DOMXPath($domDocument);

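$urlXpath = $xpath->query("//img[@id='img_caption_3538921']/@src");

// item(0) is null when no <img> with that id exists, so check the length first
$url = $urlXpath->length > 0 ? $urlXpath->item(0)->nodeValue : null;

echo $url;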

Take your time and learn a little DOM and XPath; it's worth it.
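
As a starting point for experimenting, here is a small self-contained sketch that runs the same DOM and XPath steps on an inline string, so it needs no network access. The markup, the ids, and the example.com URL are hypothetical, and selecting by a "pinit" class is an assumption borrowed from the regex answer, not something verified against the live page.

```php
<?php
// Hypothetical markup standing in for the fetched page (an assumption, not the real HTML).
$page = '<html><body>'
      . '<img id="logo" src="/logo.png">'
      . '<img id="img_caption_1" class="image pinit" src="http://example.com/photo.jpg">'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Every src attribute of every <img> in the document.
$all = [];
foreach ($xpath->query('//img/@src') as $attr) {
    $all[] = $attr->nodeValue;
}

// Only images whose class list contains the word "pinit"; the concat/normalize-space
// idiom avoids matching longer class names that merely contain "pinit".
$nodes = $xpath->query(
    "//img[contains(concat(' ', normalize-space(@class), ' '), ' pinit ')]/@src"
);
$src = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;

echo $src, "\n";
```

Selecting by class or by position tends to survive across articles better than an id that embeds the article number, but which selector fits depends on the actual page.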


Try this:

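// $url holds the address of the page to fetch
$content = file_get_contents($url);
// The negated class must exclude both quote types, otherwise the match runs past the closing quote
if (preg_match("/src=[\"'][^\"']+[\"']/", $content, $matches))
{
    echo "Match was found <br />";
    echo $matches[0];
}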
