I have a problem I can't crack. I searched through tutorials and forum entries, but had no luck doing what I need. Here is my HTML file:

<html>
 <head>**SOMETHING HERE**</head>
 <body>
  <div>
   <table>
    <thead>
  <tr><th>TEXT/NUM IS HERE</th><th>TEXT/NUM IS HERE</th><th>TEXT/NUM IS HERE</th></tr>
    </thead><tbody>**SOMETHING HERE**</tbody></tfoot>**SOMETHING HERE**</tfoot>
   </table>
  </div>
 </body>
</html>

What I need is to go through every th tag inside "thead => tr" and record the text between each pair of th tags into an array.

For this I was planning to use DOMDocument and DOMXPath.

I tried many ways to solve this, but the approach I found most often online was:

$file = "index.html";
$dom = new DOMDocument();
$dom->loadHTMLfile($file);
$thead = $dom->getElementsByTagName('thead');
$thead->parentNode;
$th = $thead->getElementsByTagName('th')
echo $th->nodeValue . "\n";

But I'm still getting many errors and can't find a way to do this. Is there a nice and simple way of doing it, ideally with a foreach over each element inside the parent element?

Thank you.

    getElementsByTagName. Elements. Not element, but elements. It returns a DOMNodeList, as specified by the manual. You need to iterate through this. Commented Dec 4, 2013 at 11:27

3 Answers


Use DOMXPath:

$html = <<<EOL
<html>
    <head>**SOMETHING HERE**</head>
    <body>
        <div>
            <table>
                <thead>
                    <tr>
                        <th>TEXT/NUM IS HERE</th>
                        <th>TEXT/NUM IS HERE</th>
                        <th>TEXT/NUM IS HERE</th>
                    </tr>
                </thead>
                <tbody>**SOMETHING HERE**</tbody>
                <tfoot>**SOMETHING HERE**</tfoot>
            </table>
        </div>
    </body>
</html>
EOL;

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

$nodes = $xpath->query('//table/thead/tr/th');

$data = array();

foreach ($nodes as $node) {
    $data[] = $node->textContent;
}

print_r($data);

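<?php
// Requires the Simple HTML DOM Parser library.
include 'simple_html_dom.php';

// file_get_html() is a function, not a class, so it is called without "new".
$html = file_get_html('file.html');
$array = array();
foreach ($html->find('thead th') as $th) {
    // Read from the loop variable, not from the whole result list.
    $array[] = $th->innertext;
}
?>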

This uses the Simple HTML DOM Parser library (simple_html_dom.php).


If you want to keep it in the same style as what you have (and therefore learn what you did wrong) try this:

$file = "index.html";
$dom = new DOMDocument();
$dom->loadHTMLFile($file);

$oTHeadList = $dom->getElementsByTagName('thead');

foreach( $oTHeadList as $oThisTHead ){

    $oThList = $oThisTHead->getElementsByTagName('th');

    foreach( $oThList as $oThisTh ) {

        echo $oThisTh->nodeValue . "\n";
    }
}

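Basically, "getElementsByTagName" returns a DOMNodeList rather than a single DOMNode, so you have to loop over the list to get to the individual nodes. That is why calls such as $thead->parentNode and $th->nodeValue in your code fail.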
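
Since the question asks for the values in an array rather than printed output, the same nested loop can collect them. This is only a minimal sketch: the inline `$html` string and the `$headers` variable are illustrative assumptions, standing in for the contents of index.html.

```php
<?php
// Illustrative markup; in the question this would be loaded from index.html.
$html = '<html><body><table>'
      . '<thead><tr><th>A</th><th>B</th><th>C</th></tr></thead>'
      . '<tbody><tr><td>1</td><td>2</td><td>3</td></tr></tbody>'
      . '</table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$headers = array();

// getElementsByTagName() returns a DOMNodeList, so iterate over it
// instead of treating it as a single node.
foreach ($dom->getElementsByTagName('thead') as $thead) {
    foreach ($thead->getElementsByTagName('th') as $th) {
        $headers[] = trim($th->nodeValue);
    }
}

print_r($headers);
```

Restricting the inner lookup to each thead keeps any th cells that might appear in tbody or tfoot out of the result.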

Additionally, in your HTML you have a closing tfoot instead of an opening one, and if you test using the html document you provided then the **SOMETHING HERE** inside your head tag will cause warnings to be thrown (as will any other invalid HTML).

If you want to suppress the warnings on loading, you can prefix the call with '@', but it's not a good idea to pepper that symbol around your code.

@$dom->loadHTMLFile($file);
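
An alternative to '@' is to have libxml buffer its parse errors internally, so they can be inspected or discarded without hiding other problems. The following is a sketch under assumptions: the inline `$html` string imitates the stray text in the question's head tag, and the variable names are illustrative.

```php
<?php
// Markup with stray text inside <head>, similar to the question's sample.
$html = '<html><head>**SOMETHING HERE**</head><body><table>'
      . '<thead><tr><th>A</th><th>B</th></tr></thead>'
      . '</table></body></html>';

// Buffer libxml errors internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Fetch whatever the parser reported, then clear the buffer and
// restore the previous error-handling setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$headers = array();
foreach ($dom->getElementsByTagName('th') as $th) {
    $headers[] = $th->nodeValue;
}

printf("%d parse problem(s), %d header(s)\n", count($errors), count($headers));
```

Each entry in `$errors` is a LibXMLError object, so its message and line properties can be logged if the problems are worth tracking down.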
