5

I'm trying to parse HTML loaded with loadHTML, but I'm stuck. I managed to loop through all the <tr> elements in the document, but I don't know how to loop through the <td> elements in each row.

This is what I did so far:

$DOM->loadHTML($url);
$rows= $DOM->getElementsByTagName('tr');

for ($i = 0; $i < $rows->length; $i++) { // loop through rows
    // loop through columns
    ...
}

How can I loop through the columns in each row?

1
  • Easier-to-use wrappers around the DOM methods exist, specifically for looping over element collections. Commented Jan 9, 2013 at 21:10

3 Answers

8

DOMElement also supports getElementsByTagName:

$DOM = new DOMDocument();
$DOM->loadHTMLFile("file path or url");
$rows = $DOM->getElementsByTagName("tr");
for ($i = 0; $i < $rows->length; $i++) {
    $cols = $rows->item($i)->getElementsByTagName("td");
    for ($j = 0; $j < $cols->length; $j++) {
        echo $cols->item($j)->nodeValue, "\t";
        // you can also use DOMElement::textContent
        // echo $cols->item($j)->textContent, "\t";
    }
    echo "\n";
}

2 Comments

I haven't been able to echo the col content inside the loop. I tried echo $cols->item($i)->nodeValue;, could you edit it? I'll take this one if it works as it's easier to implement in my case
I have made minor changes to the code. See if it works. And see if the column is not empty.
2

Use DOMXPath to query out the child column nodes with a relative xpath query, like this:

$xpath = new DOMXPath( $DOM);
$rows= $xpath->query('//table/tr');

foreach( $rows as $row) {
    $cols = $xpath->query( 'td', $row); // Get the <td> elements that are children of this <tr>
    foreach( $cols as $col) {
        echo $col->textContent;
    }
}

Edit: To start at specific rows and stop, keep your own index on the row by changing how you're iterating over the DOMNodeList:

$xpath = new DOMXPath( $DOM);
$rows= $xpath->query('//table/tr');

for( $i = 3, $max = $rows->length - 2; $i < $max; $i++) {
    $row = $rows->item( $i);
    $cols = $xpath->query( 'td', $row);
    foreach( $cols as $col) {
        echo $col->textContent;
    }
}
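
For anyone who wants to try the relative query in isolation, here is a minimal self-contained sketch. The sample markup and the $table and $middle names are made up for illustration, and it assumes the rows sit directly under <table> (no <tbody> in the source), which is what the '//table/tr' query above expects.

```php
<?php
// Hypothetical sample markup; the cell values are made up for illustration.
$html = '<table>'
      . '<tr><td>a1</td><td>a2</td></tr>'
      . '<tr><td>b1</td><td>b2</td></tr>'
      . '<tr><td>c1</td><td>c2</td></tr>'
      . '</table>';

$DOM = new DOMDocument();
$DOM->loadHTML($html); // loadHTML() expects markup; loadHTMLFile() takes a path or URL
$xpath = new DOMXPath($DOM);

$table = [];
foreach ($xpath->query('//table/tr') as $row) {
    $cells = [];
    // The relative query 'td' is evaluated against $row only,
    // so it returns just the cells that are children of this row.
    foreach ($xpath->query('td', $row) as $col) {
        $cells[] = trim($col->textContent);
    }
    $table[] = $cells;
}

// Alternative to keeping your own index: collect everything, then slice.
// Here: skip the first row and drop the last one.
$middle = array_slice($table, 1, count($table) - 2);

print_r($middle);
```

Collecting the cells into a nested array first keeps the row-range logic separate from the DOM traversal, which may be easier to adjust than the loop bounds.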

3 Comments

this works, I just have a problem, how can I start from row 3 and end in totalrows - 2? I was using ($i = 3; $i < $rows->length -2; $i++) before for the loop
@Liso - You can keep those counts yourself, I'll update my answer
@Liso - All $xpath->query() is giving you back is a DOMNodeList, so you can iterate over it just the same as you were before. The benefit is that now, instead of just using getElementsByTagName(), you have much more control over what actually gets put in that DOMNodeList. Try my updated solution, it should work for your requirements.
0

Would re-looping work?

$DOM->loadHTML($url);
$rows= $DOM->getElementsByTagName('tr');
$tds= $DOM->getElementsByTagName('td');

for ($i = 0; $i < $rows->length; $i++) {
    // loop through rows
    for ($j = 0; $j < $tds->length; $j++) {
        // loop through columns

    }
}

EDIT: You will also have to check the parent node to make sure that each td's parent is the tr you are currently in. Something like

if ($tds->item($j)->parentNode->isSameNode($rows->item($i))) {
    // do whatever
}

The exact syntax may need adjusting, but the concept is sound.
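
A runnable sketch of that parent check, assuming a small made-up table (the markup and the $byRow name are hypothetical). It compares each cell's parentNode with the current row using DOMNode::isSameNode:

```php
<?php
// Hypothetical sample markup for illustration only.
$html = '<table>'
      . '<tr><td>a1</td><td>a2</td></tr>'
      . '<tr><td>b1</td></tr>'
      . '</table>';

$DOM = new DOMDocument();
$DOM->loadHTML($html);
$rows = $DOM->getElementsByTagName('tr');
$tds  = $DOM->getElementsByTagName('td');

$byRow = [];
for ($i = 0; $i < $rows->length; $i++) {
    $byRow[$i] = [];
    for ($j = 0; $j < $tds->length; $j++) {
        $td = $tds->item($j);
        // Keep the cell only if its parent is the row we are currently in.
        if ($td->parentNode->isSameNode($rows->item($i))) {
            $byRow[$i][] = $td->textContent;
        }
    }
}

print_r($byRow);
```

Note that this visits every cell once per row, so on large tables calling getElementsByTagName('td') on each row is likely the cheaper option.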
