
The regex solution suggested in "PHP RegEx remove empty paragraph tags"
#<p>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>#

fails on my example string:
<p><br></p><div align="justify"><b>Some Text</b></div><p></p>

and I can't figure out why.
See the live regex here: http://www.phpliveregex.com/p/6ID

  • Works fine for me. preg_replace($re, '', $str); Commented Sep 5, 2014 at 12:54
  • Works for me too: regex101.com/r/qW4dI6/3 Commented Sep 5, 2014 at 12:55
  • Using regex to manipulate DOM elements is not really a good idea; you should use a DOM parser. simplehtmldom.sourceforge.net Commented Sep 5, 2014 at 12:57
  • The # on either end of that regex is a delimiter. PHP Live regex forces the delimiters to be /, which breaks the /?s in the pattern and makes the #s be interpreted as regular characters. As others have posted, this works fine in PHP itself. Commented Sep 5, 2014 at 12:58
  • @Avinash: you added the gm modifiers there to make it work, right? Commented Sep 5, 2014 at 13:02
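For reference, here is a minimal PHP sketch of what the first comment describes. It passes the question's pattern, with its original # delimiters, to preg_replace() and applies it to the example string. The variable names are illustrative only.

```php
<?php
// The question's pattern, delimited by '#', so the '/' characters inside it
// are ordinary characters and need no escaping.
$re  = '#<p>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>#';
$str = '<p><br></p><div align="justify"><b>Some Text</b></div><p></p>';

// Both <p><br></p> and <p></p> match, so both are removed.
$result = preg_replace($re, '', $str);
echo $result, "\n";
// prints: <div align="justify"><b>Some Text</b></div>
```

The failure only appears in a tester that wraps the pattern in its own / delimiters, as the comment about PHP Live Regex explains.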

2 Answers


You really shouldn't set about modifying a DOM using regex. There are DOM parsers to do this kind of thing. It's not even that hard:

$html = '<p><br></p><div align="justify"><b>Some Text</b></div>
<p>foobar</p>
<p></p>';//empty
$dom = new DOMDocument;
$dom->loadHTML($html);
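// getElementsByTagName() returns a live list: copy it into an array first,
// otherwise removing a node makes the loop skip the node right after it
$pars = iterator_to_array($dom->getElementsByTagName('p'));
foreach ($pars as $tag)
{
    // compare with '' so a paragraph containing just "0" is not treated as empty
    if (trim($tag->textContent) === '')
    {
        $tag->parentNode->removeChild($tag);
    }
}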

That's all. You simply select all of the p tags and check whether each one's trimmed text content is empty. If it is, you remove the node by selecting its parent and invoking the DOMNode::removeChild method.
The snippet above removes 2 of the 3 paragraph nodes; the one containing foobar is left as is. I think that's what you are trying to do.

To get the actual DOM fragment after removing the unwanted tags, you can simply do this:

echo trim(
    substr(
        $dom->saveHTML($dom->documentElement),//omit doctype
        12, -14//12 => <html><body> and -14 for </body></html>
    )
);
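A slightly more defensive sketch of the whole procedure follows. It is an alternative to the fixed substr() offsets above, not the answer's original code, and it assumes the fragment is loaded with DOMDocument::loadHTML(), which wraps it in <html><body>. It copies the live node list before removing anything, so consecutive empty paragraphs such as <p></p><p></p> are all removed. It then serializes the children of <body> one at a time instead of cutting characters off the output.

```php
<?php
$html = '<p><br></p><div align="justify"><b>Some Text</b></div><p>foobar</p><p></p>';

$dom = new DOMDocument();
// libxml wraps the fragment in <html><body>...</body></html>
$dom->loadHTML($html);

// Copy the live NodeList into an array before modifying the tree; removing
// nodes from the live list while iterating it shifts the indexes.
$paragraphs = iterator_to_array($dom->getElementsByTagName('p'));
foreach ($paragraphs as $p) {
    if (trim($p->textContent) === '') {
        $p->parentNode->removeChild($p);
    }
}

// Serialize each child of <body> separately, which yields the fragment
// without the <html>/<body> wrapper and without relying on its exact length.
$body = $dom->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $dom->saveHTML($child);
}
echo $fragment, "\n";
```

Passing a node to saveHTML() serializes only that node, so this version does not depend on the wrapper being exactly 12 and 14 characters long.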

proof of concept


2 Comments

Thanks for this. I think it needs a slight tweak to work with multiple empty paragraphs, though, e.g. <p></p><p></p>
@u01jmg3: Have you tested it with that input? The example you give (2 empty paragraph tags) should work fine. The code should pick up both nodes, see that they're empty, and remove them.

In your Live Regex example you were using double delimiters; see http://www.phpliveregex.com/p/6II for a working example. Also, since the predefined delimiter there is /, you need to escape the slashes inside the pattern (as in the example).
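To illustrate the escaping point, here is a minimal sketch that uses / as the delimiter, as PHP Live Regex does. Each literal / in the question's pattern is then written as \/. The example string is the one from the question.

```php
<?php
// Same pattern as in the question, but delimited by '/', so every literal '/'
// inside it is escaped as '\/' (single quotes keep the backslashes intact).
$re  = '/<p>(\s|&nbsp;|<\/?\s?br\s?\/?>)*<\/?p>/';
$str = '<p><br></p><div align="justify"><b>Some Text</b></div><p></p>';

$result = preg_replace($re, '', $str);
echo $result, "\n";
// prints: <div align="justify"><b>Some Text</b></div>
```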

EDIT: In general, though, it's best to follow Jay's suggestion and not use regex for this kind of task.
