PHP preg_replace RegEx remove empty paragraph tags

Question

The regex solution suggested in "PHP RegEx remove empty paragraph tags":
#(\s| |</?\s?br\s?/?>)*</?p>#

fails on my example string:
 <div align="justify">Some Text</div>

and I can't figure out why.
See the live regex here: http://www.phpliveregex.com/p/6ID

Using regex to manipulate DOM elements is not really a good idea; you should use a DOM parser: simplehtmldom.sourceforge.net
– Jay Blanchard, Commented Sep 5, 2014 at 12:57
The # on either end of that regex is a delimiter. PHP Live Regex forces the delimiters to be /, which breaks the /?s in the pattern and makes the #s be interpreted as regular characters. As others have posted, this works fine in PHP itself.
– Dan, Commented Sep 5, 2014 at 12:58
@Avinash: you added the gm modifier there to make it work, right?
– Tom Senner, Commented Sep 5, 2014 at 13:02

Elias Van Ootegem · Accepted Answer · 2014-09-05 13:07:55Z

3

You really shouldn't set about modifying a DOM using regex. There are DOM parsers to do this kind of thing. It's not even that hard:

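$html = '<p><br></p><div align="justify"><b>Some Text</b></div>
<p>foobar</p>
<p></p>'; // the first and last paragraphs are empty
$dom = new DOMDocument;
$dom->loadHTML($html);
// getElementsByTagName() returns a live list, so copy it before removing nodes
$pars = iterator_to_array($dom->getElementsByTagName('p'));
foreach ($pars as $tag)
{
    // strict comparison, so a paragraph containing only "0" is kept
    if (trim($tag->textContent) === '')
    {
        $tag->parentNode->removeChild($tag);
    }
}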

That's all. You simply select all of the p tags, then check whether each one's trimmed text content is empty. If it is, remove the node by selecting its parent and invoking the DOMNode::removeChild method.
The snippet above removes 2 of the 3 paragraph nodes; the one containing foobar is left as is. I think that's what you are trying to do.

To get the actual DOM fragment after removing the empty tags, you can simply do this:

echo trim(
    substr(
        $dom->saveHTML($dom->documentElement), // passing the root node omits the doctype
        12, -14 // skip the 12 characters of <html><body> and the 14 of </body></html>
    )
);
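The offsets 12 and -14 depend on the exact wrapper that saveHTML emits. As a hedged alternative (not part of the answer above), the children of the body element can be serialized one by one, which avoids counting characters. A minimal sketch, assuming the markup ends up under a single body element, as it does after loadHTML; the variable names $body and $fragment are only illustrative:

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<div align="justify"><b>Some Text</b></div><p>foobar</p>');

// loadHTML wraps the input in <html><body>, so grab that body element.
$body = $dom->getElementsByTagName('body')->item(0);

// Serialize each child of <body> on its own; the wrapper tags are never emitted.
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $dom->saveHTML($child);
}

echo trim($fragment), PHP_EOL;
```

The serializer may still insert line breaks between block elements, so the result is best compared after trimming or whitespace normalization.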

proof of concept

edited Sep 5, 2014 at 13:07

answered Sep 5, 2014 at 13:02

Elias Van Ootegem


2 Comments

u01jmg3 Over a year ago

Thanks for this - think it needs to be tweaked slightly to work with multiple empty paragraphs though. e.g. 

Elias Van Ootegem Over a year ago

@u01jmg3: Have you tested it with that input? The example you give (2 empty paragraph tags) should work fine. The code should pick up on both nodes, see that they're empty, and remove them.
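The case discussed in these comments (several empty paragraphs in a row) is easy to check directly. One thing worth keeping in mind is that getElementsByTagName returns a live list, so removing nodes while iterating over it can cause the following node to be skipped. A minimal sketch, assuming it is acceptable to copy the list into an array first; the input string and variable names are made up for illustration:

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<p></p><p></p><p>foobar</p>');

// Copy the live node list into a plain array before removing anything,
// so that removals cannot shift the positions being iterated.
$paragraphs = iterator_to_array($dom->getElementsByTagName('p'));
foreach ($paragraphs as $p) {
    if (trim($p->textContent) === '') {
        $p->parentNode->removeChild($p);
    }
}

// Query again to see what is left in the document.
$remaining = $dom->getElementsByTagName('p');
echo $remaining->length, PHP_EOL;
```

Only the paragraph containing foobar should be left after the loop.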

Paul · Accepted Answer · 2014-09-05 12:59:46Z

-1

In your Live Regex example you were using double delimiters; see http://www.phpliveregex.com/p/6II for a working example. Also, since the predefined delimiter there is /, you need to escape the slashes in the pattern (as done in the example).
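To illustrate the delimiter point in plain PHP, here is a small sketch. The pattern body is a simplified, hypothetical one chosen for the illustration, not the pattern from the question; it only shows that the same body works with # as the delimiter unchanged, and with / as the delimiter once the literal slashes are escaped:

```php
<?php
// Hypothetical, simplified pattern body (not the pattern from the question).
$body = '<p>\s*</p>';

// With # as the delimiter, the slash in </p> is an ordinary character.
$hashDelimited = '#' . $body . '#';

// With / as the delimiter, each literal slash has to be escaped.
$slashDelimited = '/' . str_replace('/', '\/', $body) . '/';

$input = '<p> </p><p>foobar</p>';
$a = preg_replace($hashDelimited, '', $input);
$b = preg_replace($slashDelimited, '', $input);

var_dump($a === $b); // both forms should remove the same empty paragraph
echo $a, PHP_EOL;
```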

EDIT: In general though, it's best to follow Jay's suggestion and not use regex for this kind of task.

answered Sep 5, 2014 at 12:59

Paul

