I use the following PHP script to parse a table.

It works if each element's opening and closing tags are on the same line, for example:

<td></td>
<td></td>
<td></td>

How can I make it work if the opening tag and the closing tag are on different lines? Like so:

<td></td>
<td>
</td>
<td></td>

PHP script:

function parseTable($html)
{
  // Find the table
  preg_match("/<table.*?>.*?<\/[\s]*table>/s", $html, $table_html);

  // Get title for each row
  preg_match_all("/<th.*?>(.*?)<\/[\s]*th>/", $table_html[0], $matches);
  $row_headers = $matches[1];

  // Iterate each row
  preg_match_all("/<tr.*?>(.*?)<\/[\s]*tr>/s", $table_html[0], $matches);

  $table = array();

  foreach($matches[1] as $row_html)
  {
    preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/", $row_html, $td_matches);
    $row = array();
    for($i=0; $i<count($td_matches[1]); $i++)
    {
      $td = strip_tags(html_entity_decode($td_matches[1][$i]));
      $row[$row_headers[$i]] = $td;
    }

    if(count($row) > 0)
      $table[] = $row;
  }
  return $table;
}
  • Don't use regexes for parsing HTML Commented Dec 13, 2012 at 1:46
  • The s flag will help you; however, regular expressions are probably the worst way to go when it comes to HTML or XML parsing. Commented Dec 13, 2012 at 2:00
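
To illustrate the s flag mentioned in the comment above, here is a minimal sketch that runs the question's td pattern with and without it. The sample $row string is made up for the demonstration. Without s, the dot does not match newlines, so a cell whose tags sit on different lines is skipped.

```php
<?php
// Hypothetical sample row: the second cell spans several lines.
$row = "<td>a</td>\n<td>\nb\n</td>\n<td>c</td>";

// Without s, . stops at newlines, so the multi-line cell is not matched.
preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/", $row, $without);

// With s (PCRE_DOTALL), . also matches newlines.
preg_match_all("/<td.*?>(.*?)<\/[\s]*td>/s", $row, $with);

// Multi-line cells keep their surrounding newlines, so trim them.
$cells = array_map('trim', $with[1]);

print_r($without[1]); // two cells: a, c
print_r($cells);      // three cells: a, b, c
```

In the question's function, the th and td patterns are the ones missing the s flag; the table and tr patterns already have it.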

1 Answer

preg_match is not meant for parsing HTML, because HTML is not a regular language. The best solution is to use an

XML Parser - PHP Doc

Each tool has its own problem to solve, and parsing HTML is not preg_match's.
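
As a rough sketch of the parser approach, the version below uses PHP's DOM extension (DOMDocument and DOMXPath) rather than the XML Parser extension linked above, since DOMDocument::loadHTML tolerates ordinary HTML. The function name parseTableDom is hypothetical, and trimming the cell text is an assumption about the wanted output. It reads the first table only, like the original function.

```php
<?php
// Hypothetical DOM-based replacement for parseTable(); not the asker's code.
function parseTableDom(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $table = $xpath->query('//table')->item(0);
    if ($table === null) {
        return [];
    }

    // Header cells supply the keys for each row.
    $headers = [];
    foreach ($xpath->query('.//th', $table) as $th) {
        $headers[] = trim($th->textContent);
    }

    $rows = [];
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $row = [];
        $i = 0;
        foreach ($xpath->query('./td', $tr) as $td) {
            // Fall back to the numeric index if a header is missing.
            $key = $headers[$i] ?? $i;
            $row[$key] = trim($td->textContent);
            $i++;
        }
        if (count($row) > 0) {
            $rows[] = $row;
        }
    }
    return $rows;
}

$html = "<table><tr><th>Name</th><th>Age</th></tr>"
      . "<tr><td>Ann</td><td>\n30\n</td></tr></table>";
$result = parseTableDom($html);
print_r($result);
```

Line breaks inside a cell no longer matter here, because the parser works on the element tree instead of on lines of text. Note that loadHTML assumes ISO-8859-1 when the document declares no charset, so non-ASCII input may need an explicit encoding declaration.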
