Parsing HTML with PHP using DomDocument

Question

I would like to parse the HTML of an arbitrary site (so I do not know its IDs or structure in advance) and, if the page content contains a given keyword, return that link. I have used the cURL library to retrieve the page, but every attempt to parse it has failed.

I am a bit lost, so thank you for your time! I just get a blank page, so I have clearly made a mistake.

This is the code I am using, with this website as an example:

$b = 'http://stackoverflow.com/questions/ask';

$cSession = curl_init(); 

curl_setopt($cSession,CURLOPT_URL, $b);
curl_setopt($cSession,CURLOPT_RETURNTRANSFER,true);
curl_setopt($cSession,CURLOPT_HEADER, false); 

$result=curl_exec($cSession);

curl_close($cSession);

$dom = new domDocument;
$doc->preserveWhiteSpace = false;

$dom->loadHTML($result);

if (strpos($dom,'HTML') === true) {
echo $b;

strpos() never returns TRUE, so your echo statement is never executed.
– ComFreek, Commented Dec 29, 2013 at 15:09
Thank you! I had tried using !== false, but that didn't seem to work either.
– user2350696, Commented Dec 29, 2013 at 15:20
@user2350696 You should compare against false with ===, because a match at position 0 would otherwise also be treated as false.
– MadsBjaerge, Commented Dec 29, 2013 at 15:27
Thanks! I am now using what you recommended, but when I search $dom for the keyword it never echoes "Not found", regardless of the keyword I put in. It now always just echoes my link.
– user2350696, Commented Dec 29, 2013 at 15:38

MadsBjaerge · Accepted Answer · 2013-12-29 15:48:45Z

As ComFreek says, strpos() never returns true. It returns the integer position of the first match, or false if the keyword is not found. Instead, check whether strpos() returns false, using a strict comparison, like this:

if (strpos($dom,'HTML') === FALSE) {
 echo "Not found";
}else{
 echo $b;
}
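
The possible return values of strpos() can be seen in a small sketch. The sample string and variable names below are made up for illustration and are not part of the original code; the point is only that a match at position 0 looks like false under a loose comparison, which is why the strict check matters.

```php
<?php
// Hypothetical sample text, not taken from the question.
$haystack = 'HTML is a markup language';

var_dump(strpos($haystack, 'HTML'));   // int(0): found at the very start
var_dump(strpos($haystack, 'markup')); // int(10): found further in
var_dump(strpos($haystack, 'PHP'));    // bool(false): not found

// Loose comparison: 0 == false, so a match at position 0 is misreported.
$looseFound = (strpos($haystack, 'HTML') != false);

// Strict comparison: only a real false counts as "not found".
$strictFound = (strpos($haystack, 'HTML') !== false);

var_dump($looseFound);  // bool(false), which is wrong
var_dump($strictFound); // bool(true), which is correct
```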

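EDIT:

Try this instead. strpos() needs a string, and $dom is a DOMDocument object, so this version searches the raw response in $result: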

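$b = 'www.sponsored.dk';

$cSession = curl_init();

curl_setopt($cSession, CURLOPT_URL, $b);
// Return the response as a string instead of printing it
curl_setopt($cSession, CURLOPT_RETURNTRANSFER, true);
// Include the response headers in $result as well
curl_setopt($cSession, CURLOPT_HEADER, true);

$result = curl_exec($cSession);

curl_close($cSession);

// Search the raw response string for the keyword
if (strpos($result, 'body') === false) {
    echo "Not found";
} else {
    echo $b;
}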

edited Dec 29, 2013 at 15:48

answered Dec 29, 2013 at 15:19

5 Comments

user2350696 Over a year ago

Thank you very much for the explanation and now everything is working! Many thanks

user2350696 Over a year ago

Although now it never says 'Not found', even when the keyword is not there. It always seems to display the link! Not sure why.

MadsBjaerge Over a year ago

Sorry, I just edited the answer. Try the code and see if it does what you want.

user2350696 Over a year ago

And now it works perfectly! Can't thank you enough. Not sure why I thought I needed DOMDocument at all.

MadsBjaerge Over a year ago

It depends on how advanced you want to go. Basically, DOMDocument represents the HTML document as an object, and you can use its built-in methods such as getElementById, but for your use the plain string search should do just fine. Remember to mark the question as answered so people know :)
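
For readers who do want the DOMDocument route, here is a minimal sketch of one way it could look. It is not the asker's or the answerer's code: the helper name findLinksByKeyword and the sample markup are hypothetical, and it assumes PHP 7 or later (for the type declarations) and that "link" means the href of an anchor whose text contains the keyword.

```php
<?php
// Hypothetical helper: returns the href of every <a> whose text contains $keyword.
function findLinksByKeyword(string $html, string $keyword): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // stripos() is the case-insensitive variant of strpos();
        // the same strict comparison against false applies.
        if (stripos($anchor->textContent, $keyword) !== false) {
            $links[] = $anchor->getAttribute('href');
        }
    }

    return $links;
}

// Made-up markup standing in for the cURL response ($result in the answer).
$html = '<html><body><a href="/a">PHP tips</a><a href="/b">Cooking</a></body></html>';

print_r(findLinksByKeyword($html, 'php'));
```

With the sample markup above, only the first anchor matches, so the returned array holds the single value /a. For a plain "does the page mention this keyword" check, the strpos() version in the answer remains the simpler choice.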
