I would like to parse the HTML of a site (any site, so I do not know the IDs or anything else in advance) and, if a keyword appears in its content, return that link. I have used the cURL library to retrieve the page, but every attempt to parse it has failed.

I am a bit lost, so thank you for your time! I just get a blank page, so clearly I have made a mistake somewhere.

This is the code I am using, with this website as an example:

$b = 'http://stackoverflow.com/questions/ask';

$cSession = curl_init(); 

curl_setopt($cSession,CURLOPT_URL, $b);
curl_setopt($cSession,CURLOPT_RETURNTRANSFER,true);
curl_setopt($cSession,CURLOPT_HEADER, false); 

$result=curl_exec($cSession);

curl_close($cSession);

$dom = new domDocument;
$doc->preserveWhiteSpace = false;

$dom->loadHTML($result);

if (strpos($dom,'HTML') === true) {
    echo $b;
}

  • strpos() will never return TRUE, therefore your echo statement doesn't get executed. Commented Dec 29, 2013 at 15:09
  • Thank you! I had tried using !== false, but that didn't seem to work either. Commented Dec 29, 2013 at 15:20
  • @user2350696 You should use === false, because e.g. a match at position 0 would also count as false in a loose comparison... Commented Dec 29, 2013 at 15:27
  • Thanks! I am using what you recommended now, but when I search $dom for the keyword, it never echoes "Not found", regardless of which keyword I put in. It now always just echoes my link. Commented Dec 29, 2013 at 15:38

1 Answer

As ComFreek says, strpos() never returns true. It returns the integer position of the first match, or false when the keyword is not found. So check whether strpos() returns false instead, like this:

if (strpos($dom, 'HTML') === false) {
    echo "Not found";
} else {
    echo $b;
}
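
To see why the strict comparison matters, here is a minimal sketch (assuming PHP 7 or later; containsKeyword is just a hypothetical helper name, not part of the original code). A match at the very start of the string comes back as 0, which a loose comparison would mistake for "not found":

```php
<?php
// Hypothetical helper (not from the original post): wraps the strict strpos() check.
function containsKeyword(string $haystack, string $needle): bool
{
    // strpos() returns an int offset (possibly 0) or false, so compare with !== false.
    return strpos($haystack, $needle) !== false;
}

var_dump(strpos('HTML page', 'HTML'));          // int(0): found at the very start
var_dump(strpos('HTML page', 'HTML') == false); // bool(true): the loose comparison misreads 0
var_dump(containsKeyword('HTML page', 'HTML')); // bool(true)
var_dump(containsKeyword('HTML page', 'XML'));  // bool(false)
```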

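EDIT:

strpos() expects a string, but $dom is a DOMDocument object, so that check cannot work as intended. Search the raw cURL response in $result instead: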

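$b = 'www.sponsored.dk';

$cSession = curl_init();

curl_setopt($cSession, CURLOPT_URL, $b);
curl_setopt($cSession, CURLOPT_RETURNTRANSFER, true); // return the response as a string
curl_setopt($cSession, CURLOPT_HEADER, false);        // keep HTTP headers out of the searched text

$result = curl_exec($cSession);

curl_close($cSession);

// curl_exec() returns false on failure, so rule that out before searching.
// strpos() works directly on the response string; no DOMDocument is needed.
if ($result === false || strpos($result, 'body') === false) {
    echo "Not found";
} else {
    echo $b;
}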

5 Comments

Thank you very much for the explanation and now everything is working! Many thanks
Although now it never says 'Not found', even when the keyword is not there. It always seems to display the link! Not sure why.
Sorry, I just edited the answer. Try the code and see if it does what you want.
And now it works perfectly! Can't thank you enough. Not sure why I thought I needed DOMDocument at all.
It depends on how advanced you want to go. Basically, DOMDocument represents the HTML document as an object, and you can use its built-in functions such as getElementById, but for your use the plain string search should do just fine. Remember to mark the question as answered so people know :)
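
For anyone who does want the DOMDocument route, here is a rough sketch (assuming PHP 7 or later with the DOM extension enabled; pageTextContains is a made-up helper name, not something from the posts above). It searches only the text of the page, so a keyword that merely appears as a tag name does not count as a match:

```php
<?php
// Hypothetical helper (not from the original post): returns true when the
// keyword appears in the page's text content rather than in its markup.
function pageTextContains(string $html, string $keyword): bool
{
    if ($html === '') {
        return false; // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();

    // Real-world HTML is rarely valid, so collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $dom->documentElement === null) {
        return false;
    }

    // textContent is the text of all descendant nodes with the tags removed
    // (note that it also includes the contents of <script> and <style>).
    // stripos() is the case-insensitive variant of strpos().
    return stripos($dom->documentElement->textContent, $keyword) !== false;
}

$html = '<html><body><p>Hello PHP world</p></body></html>';
var_dump(pageTextContains($html, 'php'));  // bool(true)
var_dump(pageTextContains($html, 'body')); // bool(false): "body" is only a tag name here
```

In the cURL example you would pass $result as the first argument. Whether this is worth the extra parsing depends on whether matches inside the markup itself are a problem for you.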
