I am trying to wrap all tables in my content in a special div container to make them usable on mobile. I can't wrap the tables before they are saved in the database of the custom CSS. I did manage to get at the content before it is printed on the page, so I need to preg_replace all the tables there.

This is how I get all the tables:

preg_match_all('/(<table[^>]*>(?:.|\n)*<\/table>)/', $aFile['sContent'], $aMatches);

The problem is getting the inner part, (?:.|\n)*, to match everything inside the tags without also running past the closing tag. Right now the expression matches everything up to the last closing table tag, so anything between two tables ends up in a single match.

Is there a way to exclude the match for the ending tag?

4 Comments
  • "Is there a way to exclude the match for the ending tag?" - Use an HTML parser, not a regex. Commented Jul 31, 2014 at 8:14
  • You should use a lazy (non-greedy) match. Just try: preg_match_all('/(<table[^>]*>(?:.|\n)*?<\/table>)/', $aFile['sContent'], $aMatches); Commented Jul 31, 2014 at 8:14
  • First of all, you should not use a regex when it is not needed. Second, have a read here: stackoverflow.com/questions/1732348/… And finally, use hek2mgl's answer. Commented Jul 31, 2014 at 8:26
  • Possible duplicate of RegEx match open tags except XHTML self-contained tags. Commented Jun 29, 2017 at 18:42

2 Answers

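You need to perform a non-greedy (lazy) match: /(<table[^>]*>(?:.|\n)*?<\/table>)/. Note the question mark after the *: it makes the quantifier stop at the first closing </table> tag instead of the last one.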
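To make the difference concrete, here is a minimal sketch. The sample input and the table-wrapper class name are made up for illustration. The last statement shows one way the lazy pattern could feed preg_replace to do the wrapping asked about in the question; it will not cope with nested tables.

```php
<?php
// Hypothetical sample input containing two separate tables.
$html = '<p>intro</p><table><tr><td>A</td></tr></table>'
      . '<p>between</p><table class="x"><tr><td>B</td></tr></table>';

// Greedy: the * runs on to the LAST </table>, so both tables land in one match.
preg_match_all('/<table[^>]*>(?:.|\n)*<\/table>/', $html, $greedy);

// Lazy: the *? stops at the FIRST </table> after each opening tag.
preg_match_all('/<table[^>]*>(?:.|\n)*?<\/table>/', $html, $lazy);

echo count($greedy[0]), "\n"; // 1
echo count($lazy[0]), "\n";   // 2

// Wrapping sketch: $0 is the whole match. The "s" modifier lets "." match
// newlines (replacing the (?:.|\n) workaround); "i" ignores tag-name case.
// "table-wrapper" is an assumed class name.
$wrapped = preg_replace(
    '/<table\b[^>]*>.*?<\/table>/is',
    '<div class="table-wrapper">$0</div>',
    $html
);
```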

However, I would use a DOM parser for that:

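// $html is assumed to hold the HTML string to process.
$doc = new DOMDocument();
$doc->loadHTML($html);

$tables = $doc->getElementsByTagName('table');
foreach ($tables as $table) {
    // Serialize this <table> element, tags included, back to an HTML string.
    $content = $doc->saveHTML($table);
}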

A DOM parser is already the more convenient tool for extracting data from HTML documents, and it is definitely the better solution if you are trying to modify the HTML, as you said you are.
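For the modification itself, a rough sketch of wrapping each table with DOM calls might look like the following. The function name wrapTablesInDiv and the class name table-wrapper are hypothetical, and the extra outer div is only a helper container so the fragment can be serialized without the html/body shell that loadHTML adds.

```php
<?php
/**
 * Wrap every <table> in $html in a <div class="...">.
 * Function name and default class name are hypothetical.
 */
function wrapTablesInDiv(string $html, string $class = 'table-wrapper'): string
{
    $doc = new \DOMDocument();

    // Content fragments are rarely valid documents, so collect parser
    // warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // A single helper container makes it easy to get the fragment back out.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The helper container is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);

    // Copy the live node list before changing the tree.
    $tables = iterator_to_array($doc->getElementsByTagName('table'));
    foreach ($tables as $table) {
        $wrapper = $doc->createElement('div');
        $wrapper->setAttribute('class', $class);
        // Put the wrapper where the table was, then move the table into it.
        $table->parentNode->replaceChild($wrapper, $table);
        $wrapper->appendChild($table);
    }

    // Serialize only the container's children.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling wrapTablesInDiv($aFile['sContent']) would then return the content with each table wrapped. This sketch assumes ASCII or otherwise correctly declared input; content with non-ASCII characters may need an explicit encoding declaration before loadHTML, and a stray closing div in the input would confuse the helper container.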

5 Comments

+1 for avoiding regex when parsing HTML, which is not a regular language and hence should not be parsed with regular expressions.
Thank you! The non-greedy match did the trick! My final regexp: /(?m)(<table[^>]*>(?:.|\n|\r)*?<\/table>)/ I'm not that familiar with the DOM parser, but I'll try to implement this version. If I get it right, I'll use it instead. Thanks a lot :)
You are welcome. Just copy the code I've posted. The example aims to be working code.
Doesn't work for me... at least for now. There seem to be some namespace errors. It can't find DOMDocument(); maybe the PHP extension is not installed or something like that. But the regex works for now, and I'll try to change it again when our senior developer comes back. I'll try to remember to post the result here when it's done. Thanks again!
@Jozze If you are working in a namespace you need to use \DOMDocument. Note the leading `\`, which refers to the global PHP namespace.
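The namespace point can be illustrated with a short sketch. The namespace App\Content is invented for the example; the last line uses extension_loaded to check the other possibility raised above, that the dom extension is missing.

```php
<?php
namespace App\Content; // hypothetical namespace, for illustration only

use DOMDocument; // import the global class once for this file

// Fully qualified: the leading backslash refers to the global namespace.
$a = new \DOMDocument();

// Imported: resolved through the "use" statement above.
$b = new DOMDocument();

// Without either form, PHP would look for App\Content\DOMDocument and fail.
// If the class cannot be found even with "\", the dom extension may be missing:
$domAvailable = \extension_loaded('dom');
```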

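You could use a lookahead if you don't want the end tag to be part of the match. The quantifier still has to be lazy (*?), otherwise the match runs to the last closing tag:

preg_match_all('/(<table[^>]*>(?:.|\n)*?(?=<\/table>))/', $aFile['sContent'], $aMatches);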
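As a small sketch of what a lookahead does here, on made-up input: the lookahead (?=<\/table>) checks that the closing tag follows but does not consume it, so the closing tag stays out of each match. The quantifier in this sketch is lazy so that each table is matched separately.

```php
<?php
// Hypothetical sample input with two tables.
$html = '<table><tr><td>A</td></tr></table><p>x</p>'
      . '<table><tr><td>B</td></tr></table>';

// Lazy quantifier plus lookahead: each match ends right before </table>.
preg_match_all('/<table[^>]*>(?:.|\n)*?(?=<\/table>)/', $html, $m);

// $m[0][0] is '<table><tr><td>A</td></tr>' (no closing tag)
// $m[0][1] is '<table><tr><td>B</td></tr>'
```

For wrapping a table in a container the closing tag usually needs to be part of the match, so the lookahead form is mostly useful when only the inner content is wanted.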
