Regular Expression - get tables from html string in PHP

Question

I am trying to wrap every table in my content in a special div container, to make the tables usable on mobile. I can't wrap the tables before they are saved in the database of our custom CMS. I did manage to get at the content just before it is printed on the page, so I need to preg_replace all the tables at that point.

This is how I match all tables:

preg_match_all('/(<table[^>]*>(?:.|\n)*<\/table>)/', $aFile['sContent'], $aMatches);

The problem is getting the inner part (?:.|\n)* to match everything inside the tags without also matching the ending tag. Right now the expression matches everything, including the ending tag of the table...

Is there a way to exclude the match for the ending tag?

"Is there a way to exclude the match for the ending tag?" - Use an HTML parser, not regex.
– exussum, Commented Jul 31, 2014 at 8:14
You should use a lazy match. Just try: preg_match_all('/(<table[^>]*>(?:.|\n)*?<\/table>)/', $aFile['sContent'], $aMatches);
– Tim.Tang, Commented Jul 31, 2014 at 8:14
First of all, you should not use regex when it is not needed. Second, have a read here: stackoverflow.com/questions/1732348/… Finally, use hek2mgl's answer.
– Talisin, Commented Jul 31, 2014 at 8:26
Possible duplicate of RegEx match open tags except XHTML self-contained tags
– Brian Tompsett - 汤莱恩, Commented Jun 29, 2017 at 18:42

hek2mgl · Accepted Answer · 2014-07-31 08:28:33Z

9

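You need to perform a non-greedy match: /(<table[^>]*>(?:.|\n)*?<\/table>)/. Note the question mark after the *: it makes the quantifier stop at the first closing </table> instead of running on to the last one.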
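
To make the difference concrete, here is a small sketch that runs both patterns against a sample string with two tables. The sample string and the variable names are made up for illustration and are not from the question.

```php
<?php
// Two sibling tables in one string (illustrative sample input).
$html = "<table><tr><td>A</td></tr></table>\n<p>text</p>\n<table><tr><td>B</td></tr></table>";

// Greedy: `*` runs to the LAST </table>, so both tables end up in one match.
preg_match_all('/<table[^>]*>(?:.|\n)*<\/table>/', $html, $greedy);

// Lazy: `*?` stops at the FIRST </table>, so each table is its own match.
preg_match_all('/<table[^>]*>(?:.|\n)*?<\/table>/', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2
```

Keep in mind that the lazy version still pairs each opening tag with the nearest closing tag, so nested tables would be cut short.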

However, I would use a DOM parser for that:

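$doc = new DOMDocument();
$doc->loadHTML($html); // $html holds the markup, e.g. $aFile['sContent']

$tables = $doc->getElementsByTagName('table');
foreach ($tables as $table) {
    // Serialize each <table> element back to an HTML string.
    $content = $doc->saveHTML($table);
}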

While a DOM parser is already the more convenient way to extract data from HTML documents, it is definitely the better solution if you want to modify the HTML, as you said you do.
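
Since the goal in the question is to wrap each table rather than just extract it, the DOM approach could be extended roughly as follows. This is only a sketch under assumptions: the helper name wrapTables and the class name table-responsive are hypothetical, character-encoding handling is left out, and the content is assumed to be an HTML fragment rather than a full page.

```php
<?php
// Hypothetical helper (not from the answer): wraps each <table> in a <div>.
function wrapTables(string $html, string $class = 'table-responsive'): string
{
    if (trim($html) === '') {
        return $html; // nothing to parse
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot the live node list, because the tree changes inside the loop.
    foreach (iterator_to_array($doc->getElementsByTagName('table')) as $table) {
        $div = $doc->createElement('div');
        $div->setAttribute('class', $class);
        $table->parentNode->replaceChild($div, $table); // put the div where the table was
        $div->appendChild($table);                      // then move the table into it
    }

    // loadHTML() adds <html><body>; return only what is inside <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

$out = wrapTables('<p>Intro</p><table><tr><td>1</td></tr></table><table class="x"><tr><td>2</td></tr></table>');
echo $out, "\n";
```

Unlike the regex, this also copes with nested tables, because each table element is moved as a whole node.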

edited Jul 31, 2014 at 8:28

answered Jul 31, 2014 at 8:15

5 Comments

Talisin Over a year ago

+1 for avoiding regex to parse HTML, which is not a regular language and hence should not be parsed with regular expressions.

Jozze Over a year ago

Thank you! The non-greedy match did the trick! My final regexp: /(?m)(<table[^>]*>(?:.|\n|\r)*?<\/table>)/ I'm not that familiar with the DOM parser, but I'll try to implement that version. If I get it working, I'll use it instead. Thanks a lot :)
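
For readers who stay with the regex route, the wrapping step could be sketched with preg_replace as below. The sample input and the class name table-wrap are assumptions. The sketch uses the `s` modifier so that `.` also matches line breaks, in place of the (?:.|\n|\r) alternation. The (?m) flag only changes how ^ and $ behave, so it has no effect on a pattern without those anchors.

```php
<?php
// Sample input with line breaks inside the first table (illustrative only).
$html = "<p>a</p><table>\n<tr><td>1</td></tr>\n</table><table><tr><td>2</td></tr></table>";

// `s` lets `.` match newlines; `*?` stops at the first closing tag.
// $0 in the replacement stands for the whole match.
$wrapped = preg_replace(
    '~<table\b[^>]*>.*?</table>~s',
    '<div class="table-wrap">$0</div>',
    $html
);

echo $wrapped, "\n";
```

This would mis-wrap nested tables, which is one reason the DOM approach is usually preferred.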

hek2mgl Over a year ago

You are welcome. Just copy the code I've posted; the example is meant to be working code.

Jozze Over a year ago

Doesn't work for me... at least for now. There seem to be some namespace errors. It can't find DOMDocument()... maybe the PHP extension is not installed or something like that. But the regex works for now, and I'll try to change it again when our senior developer comes back. I'll try to remember to post the result here when it's done. Thanks again!

hek2mgl Over a year ago

@Jozze If you are working in a namespace you need to use \DOMDocument. Note the `\`, which refers to the global PHP namespace.
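
A minimal sketch of that namespace fix, assuming the DOM extension is installed. The namespace name App\Content is hypothetical.

```php
<?php
namespace App\Content; // hypothetical namespace, for illustration only

// Inside a namespace, an unqualified `DOMDocument` resolves to
// App\Content\DOMDocument, which does not exist. The leading backslash
// points at the global class provided by PHP's DOM extension.
$doc = new \DOMDocument();
$doc->loadHTML('<table><tr><td>1</td></tr></table>');

echo $doc->getElementsByTagName('table')->length, "\n"; // 1
```

An equivalent option is to add `use DOMDocument;` below the namespace declaration and keep writing `new DOMDocument()`.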

Avinash Raj · Answer · 2014-07-31 08:14:49Z

0

You could use a lookahead if you don't want the match to include the end tag:

preg_match_all('/(<table[^>]*>(?:.|\n)*(?=<\/table>))/', $aFile['sContent'], $aMatches);
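
One caveat worth checking: the lookahead keeps </table> out of the match, but the quantifier in front of it is still greedy. The sketch below compares it with a lazy variant on a made-up sample string. The lazy variant is an addition for comparison and is not part of this answer.

```php
<?php
// Illustrative sample input with two tables.
$html = "<table><tr><td>A</td></tr></table><p>x</p><table><tr><td>B</td></tr></table>";

// Greedy body + lookahead: the closing tag is left out of the match,
// but the match still runs up to the LAST </table> in the string.
preg_match_all('/<table[^>]*>(?:.|\n)*(?=<\/table>)/', $html, $greedy);

// Lazy body + lookahead: one match per table, each without its closing tag.
preg_match_all('/<table[^>]*>(?:.|\n)*?(?=<\/table>)/', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2
```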

answered Jul 31, 2014 at 8:14
