2

I have a database table which consists the following format of data in one column.

<table cellspacing="1" cellpadding="0" border="0" width="395">
    <tbody>
        <tr>
            <td valign="top" width="135">
                <p>Calories (kcal)<br>Energy (kj)<br>Fats<br>Carbohydrates<br>Protein<br></p>
            </td>
            <td valign="top">
                <p>178<br>748<br>0 g<br>9.6 g<br>0.1 g<br></p>
            </td>
            <td valign="top" width="135">
                <p>Fiber<br>Sugars<br>Cholesterol<br>Sodium<br>Alcohol<br></p>
            </td>
            <td valign="top">
                <p>0 g<br>-<br>0 mg<br>-<br>26.2 g<br></p>
            </td>
        </tr>
    </tbody>
</table>

I want to make another database which has separate columns for Calories, Fats, Carbohydrates and Protein. To separate this data, I need to fetch data from the old database and parse it like this.

$qry = "SELECT * FROM table";
$res = $mysqli->query($qry);

// new dom object
$dom = new DOMDocument();

while ($row = $res->fetch_assoc()) {

    $html = @$dom->loadHTML($row['columndata']);
    //the table by its tag name
    $tables = $dom->getElementsByTagName('table');
    $rows = $tables->item(0)->getElementsByTagName('tr');

    foreach ($rows as $row)
    {
        $cols = $row->getElementsByTagName('td');
        echo $cols->item(0)->nodeValue.'<br />';
        echo $cols->item(1)->nodeValue.'<br />';
    }
}

This outputs the following:

Calories (kcal)Energy (kj)FatsCarbohydratesProtein  
1787480 g9.6 g0.1 g

I am unable to separate the output string to have correct column values in the new database.

For example, I want to have value 178 in the Calories column, 0 g in the Fats column, etc.

4
  • what do you mean by parse the data based on the HTML tag?? Commented Apr 30, 2013 at 8:31
  • What output would you like? Commented Apr 30, 2013 at 8:32
  • nodeValue removes all the tags, you want something like Javascript's innerHTML. The DOMDocument API doesn't have this, you need to write it. There's a simple get_inner_html function in the comments at php.net/manual/en/class.domelement.php Commented Apr 30, 2013 at 8:36
  • 1
    I modified the question, I added what I need as an example Commented Apr 30, 2013 at 8:39

2 Answers 2

2

Try iterating over the child nodes of the P elements:

foreach ($rows as $row)
{
    $paragraphs = $row->getElementsByTagName('p');
    //ensure that all the text between <br> is in one text node
    $paragraphs->item(0)->normalize();
    foreach($paragraphs->item(0)->childNodes as $node) {
        if ($node->nodeType == XML_TEXT_NODE) {
            echo $node->nodeValue . '<br/>;
        }
    }

}

It is important to call normalize() on the p element, to ensure that the texts between br elements are in one text node each, and not separated, eg <p>Calories (kcal)<br>Energy (kj)<br>...</p> will have text nodes of Calories (kcal) and Energy (kj), not Cal, ories (, kcal) and so on, which they might be without normalizing.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks @SWilk with your help I was able to complete this.
2

If you want to get innerHTML of your td element, you may use following construction:

$tdElement = $row->getElementsByTagName('td')->item(0);
$tdElement->ownerDocument->saveHTML( $tdElement );

It should return you inner html of that node as a string.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.