I recently read about the DOM module in PHP and I'm now trying to use it to parse an HTML document. The page said this was a much better solution than using the preg functions, but I'm having a hard time figuring out how to use it.

The page contains a table with dates and a variable number of events for each date.

First I need to get the text (a date) from a tr with valign="bottom", and then I need all the column values from every tr with valign="top" that follows that tr, up until the next tr with valign="bottom" (the next date). The number of tr elements with column data is unknown; it can be zero or many.

This is what the HTML on the page looks like:

<table>
    <tr valign="bottom">
        <td colspan="4">2009-02-26</td>
    </tr>
    <tr valign="top">
        <td>21:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
    <tr valign="top">
        <td>23:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
    <tr valign="bottom">
        <td colspan="4">2009-02-27</td>
    </tr>
    <tr valign="top">
        <td>06:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
    <tr valign="top">
        <td>10:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
    <tr valign="top">
        <td>13:00</td>
        <td>Column data</td>
        <td>Column data</td>
        <td>Column data</td>
    </tr>
</table>

So far I've been able to get the first two dates (I'm only interested in the first two) but I don't know how to go from here.

The xpath query I use to get the date trs is

$result = $xpath->query('//tr[@valign="bottom"][position()<3]');

Now I need a way to connect all the events for a day to its date, i.e. select all the tds and their column values up until the next date tr.

3 Answers

// Collect libxml warnings about malformed HTML internally instead of printing them.
$oldSetting = libxml_use_internal_errors(true);
libxml_clear_errors();

$html = new DOMDocument();
$html->loadHTMLFile('http://url/table.html');

$xpath = new DOMXPath($html);
$rows = $xpath->query('//table/tr');

foreach ($rows as $row) {
    // Date rows have valign="bottom"; event rows have valign="top".
    $isEventRow = $row->getAttribute('valign') === 'top';

    for ($node = $row->firstChild; $node !== null; $node = $node->nextSibling) {
        if ($isEventRow) {
            print($node->nodeValue);
        } else {
            // Start date rows on a new line.
            print("<br>" . $node->nodeValue);
        }
    }
    print("<br>");
}

// Restore the previous libxml error handling.
libxml_clear_errors();
libxml_use_internal_errors($oldSetting);
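
The loop above prints the cells but does not tie each event row to its date. One possible way to do that, sketched here as an illustration rather than as part of the original answer, is to walk the rows once and file every valign="top" row under the most recent valign="bottom" row. The inline sample markup, the shortened cell values, and the $eventsByDate array are assumptions made for the example.

```php
<?php
// Inline sample standing in for the remote page (assumption: same structure as in the question).
$source = <<<'HTML'
<table>
    <tr valign="bottom"><td colspan="4">2009-02-26</td></tr>
    <tr valign="top"><td>21:00</td><td>A</td><td>B</td><td>C</td></tr>
    <tr valign="top"><td>23:00</td><td>D</td><td>E</td><td>F</td></tr>
    <tr valign="bottom"><td colspan="4">2009-02-27</td></tr>
    <tr valign="top"><td>06:00</td><td>G</td><td>H</td><td>I</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);

// Hypothetical result shape: date => list of rows, each row a list of cell texts.
$eventsByDate = [];
$currentDate = null;

// "//table//tr" is used so the query still matches if a parser inserts <tbody>.
foreach ($xpath->query('//table//tr') as $row) {
    if ($row->getAttribute('valign') === 'bottom') {
        // A date row starts a new group.
        $currentDate = trim($row->textContent);
        $eventsByDate[$currentDate] = [];
    } elseif ($currentDate !== null) {
        // An event row belongs to the most recent date row.
        $cells = [];
        foreach ($xpath->query('td', $row) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $eventsByDate[$currentDate][] = $cells;
    }
}

print_r($eventsByDate);
```

A date with no events simply ends up with an empty array, which covers the "zero or many" case from the question.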

Use the following-sibling axis.

2 Comments

Thanks, but how do you tell xpath to only select siblings up to a node with [valign="bottom"]? If I use following-sibling::tr[@valign="top"] on my selected date it'll return all the following trs when I only want the ones up until the next date tr?
Select all nodes that are following siblings of the current tr[@valign="bottom"] but are not following siblings of the next one. For example, for the first date row: //tr[@valign="bottom"][1]/following-sibling::tr[@valign="top"][not(preceding-sibling::tr[@valign="bottom"][2])]
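
The idea in the comment above can be tried out from PHP roughly as follows. This is a sketch under assumptions, not the commenter's own code: it bounds each group by counting the date rows that precede an event row, which is one way (among several) to express "up to the next date row" in XPath 1.0. The sample markup and the $eventsByDate array are made up for the example, and only the first cell (the time) of each event row is collected to keep it short.

```php
<?php
// Inline sample with three dates (assumption: same structure as in the question).
$source = <<<'HTML'
<table>
    <tr valign="bottom"><td colspan="4">2009-02-26</td></tr>
    <tr valign="top"><td>21:00</td><td>A</td></tr>
    <tr valign="top"><td>23:00</td><td>B</td></tr>
    <tr valign="bottom"><td colspan="4">2009-02-27</td></tr>
    <tr valign="top"><td>06:00</td><td>C</td></tr>
    <tr valign="bottom"><td colspan="4">2009-02-28</td></tr>
    <tr valign="top"><td>09:00</td><td>D</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);

$eventsByDate = [];

// Same query as in the question: only the first two date rows.
$dateRows = $xpath->query('//tr[@valign="bottom"][position() < 3]');

foreach ($dateRows as $dateRow) {
    // Number of date rows up to and including this one.
    $n = (int) $xpath->evaluate('count(preceding-sibling::tr[@valign="bottom"])', $dateRow) + 1;

    // Event rows after this date row that have exactly $n date rows before them,
    // i.e. rows that come before the next date row.
    $eventRows = $xpath->query(
        sprintf(
            'following-sibling::tr[@valign="top"][count(preceding-sibling::tr[@valign="bottom"]) = %d]',
            $n
        ),
        $dateRow
    );

    $times = [];
    foreach ($eventRows as $eventRow) {
        // string(td[1]) yields the text of the first cell in the row.
        $times[] = trim($xpath->evaluate('string(td[1])', $eventRow));
    }
    $eventsByDate[trim($dateRow->textContent)] = $times;
}

print_r($eventsByDate);
```

Passing $dateRow as the second argument makes both expressions relative to that row, so the same query string can be reused for every date.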

Either of these XPath expressions

/table/tr/td[@colspan=4]

or

/table/tr[@valign='bottom']/td

results in a node set containing the date cells.

How do you get the cells between two date rows?

/table/tr/td[not(@colspan=4)][preceding::td[@colspan=4][1]='2009-02-26']
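
As a rough illustration of how the last expression could be run from PHP, the sketch below evaluates it for one date. It is an example under assumptions: the sample markup and variable names are invented, and the path starts with //table instead of /table because DOMDocument::loadHTML wraps the markup in html and body elements, so the table is not the document root.

```php
<?php
// Inline sample (assumption: same structure as in the question, shortened).
$source = <<<'HTML'
<table>
    <tr valign="bottom"><td colspan="4">2009-02-26</td></tr>
    <tr valign="top"><td>21:00</td><td>A</td><td>B</td></tr>
    <tr valign="bottom"><td colspan="4">2009-02-27</td></tr>
    <tr valign="top"><td>06:00</td><td>C</td><td>D</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);

// The date is a fixed literal here; XPath 1.0 has no bound parameters,
// so untrusted input should not be interpolated like this.
$date = '2009-02-26';

// Cells without colspan whose nearest preceding colspan=4 cell holds the given date.
$cells = $xpath->query(sprintf(
    '//table//tr/td[not(@colspan=4)][preceding::td[@colspan=4][1]="%s"]',
    $date
));

$values = [];
foreach ($cells as $cell) {
    $values[] = trim($cell->textContent);
}

print_r($values);
```

This returns a flat list of cells for the date; if the row boundaries matter, select the tr elements instead and read their td children per row.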
