0

So I am trying to parse an HTML table, pass it to a PHP array, then send to MySQL.

This is what I have so far:

<?php
$url = 'website here';
$html = file_get_contents($url);
libxml_use_internal_errors(true);
// a new dom object
$dom = new domDocument;
 
// load the html into the object
$dom->loadHTML($html); 
 
// discard white space
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
 
//foreach($tables as $table)
//{
//    echo $dom->saveHTML($table);
//}
$a = array();
foreach($tables as $k => $table) {
    foreach($table->getElementsByTagName('tr') as $td) {
        $a['table_' . $k][] = array_values(array_filter(explode(' ', str_replace(array("\n", "\r"), "", $td->nodeValue))));
    }
} 
$cell_1 = $a['table_0'][1][0]; // read first cell [0] of first row of data [1]
$value= preg_replace('/[\$,]/', '', $cell_1); //Remove dollar sign and commas
//$test = $value * 5;
echo $value
?>

After parsing the table in entirety looks like this:

| Header 1   | Header 2    | Header 3   |
|:-----------|------------:|:----------:|
| $20,000    | $50,000     | $50,000    |
| $40,000    | $20,000     | $90,000    |
| $60,000    | $30,000     | $580,000   |
| $80,000    | $10,000     | $410,000   |

....you get the idea. All data is space delimited, although the headers could have multiple words.

Running the following allows me to access the first cell in the first row of actual data:

$cell_1 = $a['table_0'][1][0]; // read first cell [0] of first row of data [1]
$value= preg_replace('/[\$,]/', '', $cell_1); //Remove dollar sign and commas
echo $value

....but this could prove tedious considering some tables have different number of rows (all have the same number of columns, however). Side note: only one table per HTML page.

The question is, how can I simplify this and loop through each cell for each row? Ultimately, the end goal will be the MySQL database having a carbon copy of the table. And each value needs to be stripped of commas etc. so I can perform calculations later.

Edit:

Okay, this is what I have to send it to MySQL (appended to bottom of code above). Any recommendations for improvement would be welcomed. I am getting a deprecation error obviously but I am clueless with PDO.

unset($a['table_0'][0]); // remove table header
$json = json_encode($a['table_0'], JSON_UNESCAPED_UNICODE);
// Make the queries
$arr = json_decode($json, true);

$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$conn = mysql_connect($dbhost, $dbuser, $dbpass);
if(! $conn )
{
  die('Could not connect: ' . mysql_error());
}
echo 'Connected successfully<br />';
$sql = "CREATE TABLE tutorials_tbl( ".
       "row_id INT NOT NULL AUTO_INCREMENT, ".
       "new INT NOT NULL , ".
       "old INT NOT NULL , ".
       "done INT NOT NULL , ".
       "PRIMARY KEY ( row_id )); ";
mysql_select_db( 'tutorial' );
$retval = mysql_query( $sql, $conn );
if(! $retval )
{
  die('Could not create table: ' . mysql_error());
}
echo "Table created successfully\n";

foreach($arr as $item){
$new= $item[0]; // column 1 of data
$old = $item[1]; // column 2 of data
$done = $item[2]; // column 3 of data

mysql_query("INSERT INTO tutorials_tbl (new, old, done) VALUES('$new', '$old', '$done')") or die (mysql_error());
}

mysql_close($conn);
8
  • For each row, use an insert statement at the end of foreach loop Commented Nov 10, 2014 at 1:49
  • Do you want to do calculations via mysql or just to store table? Commented Nov 10, 2014 at 1:53
  • I think some of my calcs will have to be done by PHP so I would be happy just to get it stored for now. Commented Nov 10, 2014 at 2:03
  • Than do your preg_replace when you create array, at the end just do json_encode onto array and store into database. Commented Nov 10, 2014 at 2:07
  • add more foreach loops Commented Nov 10, 2014 at 2:12

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.