
I am attempting to capture all column contents within HTML tables. I'm very close, but my regex is only capturing the first column of each table. What do I need to do to capture all of the columns?

Here is my regex and HTML: https://regex101.com/r/jA3sS6/1

  • Any reason for not using PHP DOMDocument? Commented Mar 30, 2016 at 19:35
  • stackoverflow.com/questions/1732348/… Build a state machine (or use frz3993's method; it's probably a state machine under the hood). Commented Mar 30, 2016 at 19:36
  • Wow, I wish I'd known about regex101.com a long time ago. Commented Mar 30, 2016 at 20:57

1 Answer


Don't use regular expressions; use an HTML parser instead!

Start with this:

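// $html holds the page's HTML source
$dom = new DOMDocument();
libxml_use_internal_errors(true); // don't emit warnings for malformed HTML
$dom->loadHTML( $html );
libxml_clear_errors();            // discard the collected parse errors
$xpath = new DOMXPath( $dom );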

To retrieve all <td>:

foreach( $dom->getElementsByTagName( 'td' ) as $td )
{
    // nodeValue is the cell's text content, with inner tags stripped
    echo $td->nodeValue . PHP_EOL;
}
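The loop above prints every cell in one flat list, while the question asks for all columns of each table. A minimal sketch, assuming plain `<table>`/`<tr>`/`<td>` markup without nested tables, that groups cell text by table and then by row. The helper name `extractTables` and the sample HTML are hypothetical, for illustration only; `<th>` cells are collected alongside `<td>`.

```php
<?php
// Hypothetical helper: returns one array per table, each holding one array of
// cell texts per row. Assumes no nested tables (nested rows would be mixed in).
function extractTables(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $tables = [];
    foreach ($xpath->query('//table') as $table) {
        $rows = [];
        // './/tr' finds rows whether or not they are wrapped in <tbody>
        foreach ($xpath->query('.//tr', $table) as $tr) {
            $cells = [];
            // Direct <td>/<th> children of this row, in document order
            foreach ($xpath->query('./*[self::td or self::th]', $tr) as $cell) {
                $cells[] = trim($cell->nodeValue);
            }
            $rows[] = $cells;
        }
        $tables[] = $rows;
    }
    return $tables;
}

// Hypothetical sample: two tables with different column counts
$html = '<table><tr><td>a1</td><td>a2</td></tr><tr><td>b1</td><td>b2</td></tr></table>'
      . '<table><tr><td>x</td><td>y</td><td>z</td></tr></table>';

print_r(extractTables($html));
```

Because the parser walks the actual element tree, every column of every row is reached, which avoids the "only the first match per table" behaviour the question describes with a regex.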

To retrieve all <td class="large-text">:

foreach( $xpath->query( '//td[@class="large-text"]' ) as $td )
{
    echo $td->nodeValue . PHP_EOL;
}
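Note that `@class="large-text"` only matches when the attribute is exactly that string, so a cell such as `<td class="large-text bold">` is skipped. If cells can carry several classes, a common XPath 1.0 idiom is to normalize the class list, pad it with spaces, and search for the space-delimited token. A minimal sketch; the sample markup is hypothetical.

```php
<?php
// Hypothetical sample: the second cell carries an extra class.
$html = '<table><tr>'
      . '<td class="large-text">one</td>'
      . '<td class="large-text bold">two</td>'
      . '<td class="small-text">three</td>'
      . '</tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Exact comparison: matches class="large-text" verbatim only.
$exact = $xpath->query('//td[@class="large-text"]');

// Token match: wrap the normalized class list in spaces and look for " large-text ".
$token = $xpath->query(
    '//td[contains(concat(" ", normalize-space(@class), " "), " large-text ")]'
);

echo $exact->length, PHP_EOL; // 1
echo $token->length, PHP_EOL; // 2
foreach ($token as $td) {
    echo $td->nodeValue, PHP_EOL; // one, then two
}
```

The padding keeps `small-text` or a hypothetical `extra-large-text` from matching, which a bare `contains(@class, "large-text")` would not guarantee.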
