
I have the following HTML and I am using PHP's DOMDocument class to get the element with id 'nextPageBtn', which follows the script tag. The problem is that my query does not return anything (as if there were no element with the specified id). Here is the HTML I am parsing:

<body>
    <div style='float:left'><img src='../../../../includes/ph1.jpg'></div>

    <label style='width: 476px; height: 40px; position: absolute;top:100px; left: 40px; z-index: 2; background-color: rgb(255, 255, 255);; background-color: transparent' >
    <font size="4">1a. Nice to meet you!</font>
    </label>
    <img src='ENG_L1_C1_P0_1.jpg' style='width: 700px; height: 540px; position: absolute;top:140px; left: 40px; z-index: 1;' />

    <script type='text/javascript'> 


    swfobject.registerObject('FlashID');
    </script>

    <input type="image" id="nextPageBtn" src="../../../../includes/ph4.gif" style="position: absolute; top: 40px; left: 795px; ">

    </body>

And here is the PHP code to parse it:

$doc = new DOMDocument();
$doc->loadHTMLFile($path);

$doc->encoding = 'UTF-8';
$x = new DOMXPath($doc);
$nextPage = $x->query("//*[@id='nextPageBtn']")->item(0);
if ($nextPage) {
    echo 'found it..';
}

I think the line 'swfobject.registerObject('FlashID')' is generating some kind of error that prevents the element from being found?

  • Your XPath expression looks valid at first glance, and so does the rest of your code. Unable to reproduce: codepad.viper-7.com/RUNGOd - you're probably looking in the wrong place. $doc->encoding='UTF-8'; looks superfluous to me. Commented Dec 19, 2011 at 11:29
  • If you're able to edit the markup of the file you're processing I'd suggest simply giving an ID to the element you want to grab and then getElementById() it. Commented Dec 19, 2011 at 11:33
  • What @GordonM says: If the XHTML has a DTD that specifies the ID attribute, getElementById works. Commented Dec 19, 2011 at 11:43

1 Answer


As written in the comment, your code just works flawlessly. Demo: http://codepad.viper-7.com/RUNGOd

What you consider the source of the problem:

I think the line 'swfobject.registerObject('FlashID')' is generating some kind of error that prevents the element from being found?

That can hardly be the cause, as DOMDocument::loadHTMLFile should deal with all tags (otherwise you would have received errors or warnings while loading the document). Once loading is done, DOMDocument exposes normalized data, so such issues should not occur (unless there is a bug in libxml, the underlying library, which is unlikely for something this general).

So what are the options here? Probably the HTML is not the HTML you think it is. That could be the case if loading the HTML fails. Check for errors while loading:

error_reporting(~0); ini_set('display_errors', 1);
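In addition to displaying warnings, the parser's own messages can be collected and inspected. The following is a minimal sketch, assuming the markup is loaded from a string rather than a file; `loadHtmlWithErrors` and the sample markup are hypothetical and only for illustration, while `libxml_use_internal_errors`, `libxml_get_errors` and `libxml_clear_errors` are standard PHP functions:

```php
<?php
// Hypothetical helper: load HTML and return the messages libxml reported.
function loadHtmlWithErrors(DOMDocument $doc, string $html): array
{
    // Buffer libxml errors instead of emitting them as PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Empty the buffer and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// Sample markup (an assumption, reduced from the question's HTML).
$html = '<body><input type="image" id="nextPageBtn" src="ph4.gif"></body>';

$doc = new DOMDocument();
$messages = loadHtmlWithErrors($doc, $html);

foreach ($messages as $message) {
    echo $message, "\n";
}
```

If the list is empty, the parser had no complaints about the markup, and the cause is more likely to be elsewhere.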

Also verify that the HTML is what you expect after loading:

$doc->loadHTMLFile($path);
echo $doc->saveHTML();

which will output the "source" as DOMDocument sees it.
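Besides reading the serialized output, it may help to list every id attribute the parser actually kept. This is only a sketch: the markup below is a reduced copy of the question's HTML (an assumption, with a shortened image path), and `$ids` is a name chosen for illustration:

```php
<?php
// Reduced sample of the question's markup, including the script block.
$html = <<<'HTML'
<body>
<script type='text/javascript'>
swfobject.registerObject('FlashID');
</script>
<input type="image" id="nextPageBtn" src="ph4.gif">
</body>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);

// Select all id attributes in the parsed document.
$xpath = new DOMXPath($doc);
$ids = [];
foreach ($xpath->query('//@id') as $attribute) {
    $ids[] = $attribute->nodeValue;
}

print_r($ids);
```

If 'nextPageBtn' is missing from the list for your real file, the document that was loaded is not the one you expected.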

Also check your LIBXML version:

printf("LIBXML version: %s\n", LIBXML_DOTTED_VERSION);

LIBXML is the underlying library that PHP's DOMDocument is based on. Depending on the version, there can be bugs and some features may not work. For example, the getElementById function doesn't work with loadHTMLFile/loadHTML with version 2.6.26, but it does with version 2.7.7 (the XPath expression you're using is not affected in either of these two versions).
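Given that version dependence, one way to stay independent of it is to try getElementById first and fall back to the XPath query. A minimal sketch, assuming the id contains no single quote; `findById` is a hypothetical helper, not part of DOMDocument:

```php
<?php
// Hypothetical helper: look up an element by id, falling back to XPath
// when getElementById() returns nothing (as it may on older libxml builds).
function findById(DOMDocument $doc, string $id): ?DOMElement
{
    $element = $doc->getElementById($id);
    if ($element !== null) {
        return $element;
    }

    // Assumption: $id contains no single quote, so it can be embedded as-is.
    $xpath = new DOMXPath($doc);
    $node = $xpath->query("//*[@id='" . $id . "']")->item(0);

    return $node instanceof DOMElement ? $node : null;
}

$doc = new DOMDocument();
$doc->loadHTML('<body><input type="image" id="nextPageBtn" src="ph4.gif"></body>');

$nextPage = findById($doc, 'nextPageBtn');
if ($nextPage !== null) {
    echo 'found it..';
}
```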

If you're running into an encoding issue here (the source file has a different encoding than expected), it's harder to tell from the information you've provided. Internally, DOMDocument's default encoding in PHP is UTF-8, so setting:

 $doc->encoding='UTF-8';

after you've loaded the file looks superfluous to me. You could simply remove it to reduce the code and make it easier to find where the error comes from (as I did in the demo).


1 Comment

Thanks hakre...found out the problem was the utf-8...removed it n everything is fine now!
