I normally use regex for HTML parsing, but I need help parsing the following table:

            <table class="resultstable" width="100%" align="center">
                <tr>
                    <th width="10">#</th>
                    <th width="10"></th>
                    <th width="100">External Volume</th>
                </tr>                   
                <tr class='odd'>
                        <td align="center">1</td>
                        <td align="left">
                            <a href="#" title="http://xyz.com">http://xyz.com</a>
                            &nbsp;
                        </td>
                        <td align="right">210,779,783<br />(939,265&nbsp;/&nbsp;499,584)</td>
                    </tr>

                     <tr class='even'>
                        <td align="center">2</td>
                        <td align="left">
                            <a href="#" title="http://abc.com">http://abc.com</a>
                            &nbsp;
                        </td>
                        <td align="right">57,450,834<br />(288,915&nbsp;/&nbsp;62,935)</td>
                    </tr>
            </table>

I want to get every domain with its volume (in an array or a variable), for example:

http://xyz.com - 210,779,783

Should I use regex or the HTML DOM in this case? I don't know how to parse a large table. Can you please help? Thanks.

  • You should nearly always use HTML DOM. This case is no different. Commented Mar 30, 2012 at 18:23
  • See this question. You should never parse HTML using a regex. Commented Mar 30, 2012 at 18:25
  • @Truth can you please help me with HTML DOM? I have only used it for simple parsing, not for a big table. Thanks. Commented Mar 30, 2012 at 18:26

1 Answer

Here's an XPath example that happens to parse the HTML from the question.

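<?php
$dom = new DOMDocument();
$dom->loadHTMLFile("./input.html");
$xpath = new DOMXPath($dom);

// Select the rows of the table with class "resultstable".
$trs = $xpath->query("//table[@class='resultstable'][1]/tr");
foreach ($trs as $tr) {
  // The domain is the link text in the second cell.
  // The header row has no <td> cells, so it is skipped here.
  $tdList = $xpath->query("td[2]/a", $tr);
  if ($tdList->length == 0) continue;
  $name = $tdList->item(0)->nodeValue;
  // The volume is the first child node of the third cell (the text before the <br />).
  $tdList = $xpath->query("td[3]", $tr);
  $vol = $tdList->item(0)->childNodes->item(0)->nodeValue;
  echo "name: {$name}, vol: {$vol}\n";
}
?>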
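
The question asks for the results in an array rather than as printed output. Below is a minimal sketch of that variation, assuming the markup is already available as a string. The function name `parseResultsTable` and the `$sample` input are hypothetical and not part of the original answer. It also uses `//tr[td]` instead of `/tr`, on the assumption that rows may sit inside a `<tbody>` in some documents.

```php
<?php
// Hypothetical helper (not from the original answer): maps each domain to its volume.
function parseResultsTable(string $html): array
{
    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $results = [];
    // "//tr[td]" matches only rows that contain <td> cells, so the <th> header row is skipped.
    foreach ($xpath->query("//table[@class='resultstable']//tr[td]") as $tr) {
        $link = $xpath->query("td[2]/a", $tr)->item(0);
        // First text node of the third cell, i.e. the part before the <br />.
        $volume = $xpath->query("td[3]/text()[1]", $tr)->item(0);
        if ($link === null || $volume === null) {
            continue; // Row does not have the expected shape.
        }
        $results[trim($link->nodeValue)] = trim($volume->nodeValue);
    }
    return $results;
}

// Assumed sample input: a trimmed copy of the first data row from the question.
$sample = <<<'HTML'
<table class="resultstable">
  <tr><th>#</th><th></th><th>External Volume</th></tr>
  <tr class='odd'>
    <td>1</td>
    <td><a href="#" title="http://xyz.com">http://xyz.com</a>&nbsp;</td>
    <td>210,779,783<br />(939,265&nbsp;/&nbsp;499,584)</td>
  </tr>
</table>
HTML;

foreach (parseResultsTable($sample) as $domain => $volume) {
    echo "{$domain} - {$volume}\n";
}
```

For the sample row this should print `http://xyz.com - 210,779,783`. Keying the array by domain means a domain that appears twice keeps only its last volume. If duplicates matter, appending `[$domain, $volume]` pairs to a list would be the safer choice.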