
I'm trying to remove all the empty <p> tags that CKEditor inserts into a description box, but they vary. The possibilities seem to be:

<p></p>

<p>(WHITESPACE)</p>

<p>&nbsp;</p>

<p><br /></p>

<p>(NEWLINE)&nbsp;</p>

<p>(NEWLINE)<br /><br />(NEWLINE)&nbsp;</p>

In each case there could be any amount of whitespace, &nbsp; and <br /> tags between the opening and closing <p> tags, and a single paragraph could contain a mix of all three.

I'm also not sure about the <br /> tag, from what I've seen it could be <br />, <br/> or <br>.

I've searched SO for a similar answer, but every answer I've found handles only one of these cases, not all of them at once. In simple terms: is there a regular expression I can use to remove all <p> tags from some HTML that don't contain any alphanumeric text or symbols/punctuation?

  • And this is why you don't parse HTML with regexes. Commented Jan 10, 2013 at 15:06
  • Don't use regular expressions to parse HTML. You cannot reliably parse HTML with regular expressions. As soon as the HTML changes from your expectations, your code will be broken. See htmlparsing.com/php.html for examples of how to properly parse HTML with PHP modules. Commented Jan 10, 2013 at 15:14
  • So you really think I should use an HTML parser for a string such as '<p>Text</p><p>&nbsp;</p>'? Seems like overkill, don't you think? Commented Jan 10, 2013 at 15:17
  • This isn't parsing, technically. And if the desired effect is suitably narrow (i.e. if you expect no understanding, just pattern matching), there's nothing wrong with regexing a string. Cthulhu will stay in his box. Commented Jan 10, 2013 at 15:18
  • I'm curious as to why @AndyLester thinks using DOMDocument to parse a 24-character HTML string is a good idea. Commented Jan 10, 2013 at 15:23
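For comparison with the regex answers, here is a minimal sketch of the DOMDocument approach the comments above argue about. It assumes a UTF-8 HTML fragment and PHP's bundled DOM extension; the function name removeEmptyParagraphsDom is hypothetical. A paragraph counts as empty when its text is only whitespace or non-breaking spaces and its only child elements are <br>, which is the definition given in the question.

```php
<?php
// Sketch: remove <p> elements whose only content is whitespace, non-breaking
// spaces and <br> elements. Assumes a UTF-8 fragment and the DOM extension.
function removeEmptyParagraphsDom(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about fragments
    // The XML-declaration prefix tells libxml the input is UTF-8; the <div>
    // wrapper gives the fragment a single container to read back from.
    $doc->loadHTML('<?xml encoding="utf-8"?><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $wrapper = $doc->getElementsByTagName('div')->item(0);

    // Collect first, remove afterwards: getElementsByTagName() returns a live list.
    $empty = [];
    foreach ($wrapper->getElementsByTagName('p') as $p) {
        // textContent decodes &nbsp; to U+00A0, so strip it along with whitespace.
        $text = preg_replace('/[\s\x{00A0}]+/u', '', $p->textContent);
        $onlyBreaks = true;
        foreach ($p->childNodes as $child) {
            if ($child instanceof DOMElement && strtolower($child->nodeName) !== 'br') {
                $onlyBreaks = false; // e.g. an <img> counts as content
                break;
            }
        }
        if ($text === '' && $onlyBreaks) {
            $empty[] = $p;
        }
    }
    foreach ($empty as $p) {
        $p->parentNode->removeChild($p);
    }

    // Serialize only the wrapper's children, dropping the implied <html><body><div>.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeEmptyParagraphsDom('<p>Text</p><p>&nbsp;</p>'), PHP_EOL; // <p>Text</p>
```

It is clearly longer than a one-line regex, which is the trade-off being debated; in exchange, attributes on <p> and the different <br> spellings are handled by the parser rather than by the pattern.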

2 Answers


Well, contrary to my own advice not to parse HTML with regexes, I wrote a regex to do just that:

"#<p>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>#"

This will match properly for:

<p></p>

<p> </p> <!-- ([space]) -->

<p>	</p> <!-- (that's a [tab] character in there) -->

<p>&nbsp;</p>

<p><br /></p>

<p>
&nbsp;</p>

<p>
<br /><br />
&nbsp;</p>

What it does:

# #                --> regex start (delimiter)
# <p>              --> match the opening <p> tag
# (                --> open the group
#   \s             --> match any whitespace character (newline, space, tab)
# |                --> or
#   &nbsp;         --> match &nbsp;
# |                --> or
#   </?\s?br\s?/?> --> match the <br> tag (<br>, <br/> or <br />)
# )*               --> close the group; match zero or more of these elements, in any order
# </?p>            --> match the closing </p> tag ("/" optional)
# #                --> regex end (delimiter)
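To check the pattern against the cases above, here is a minimal PHP sketch using preg_replace(). The helper name removeEmptyParagraphs is hypothetical, and the case list adds the <br/> and <br> spellings the question mentions.

```php
<?php
// Answer 1's pattern. '#' is the delimiter, so the '/' in </p> and <br />
// need no escaping.
const EMPTY_P = '#<p>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>#';

// Hypothetical helper: removes every match of EMPTY_P from $html.
function removeEmptyParagraphs(string $html): string
{
    // preg_replace() returns null on a PCRE error; fall back to the input unchanged.
    return preg_replace(EMPTY_P, '', $html) ?? $html;
}

// The cases from the question (the last two span several lines in the original).
$cases = [
    "<p></p>",
    "<p> </p>",
    "<p>\t</p>",
    "<p>&nbsp;</p>",
    "<p><br /></p>",
    "<p><br/></p>",
    "<p><br></p>",
    "<p>\n&nbsp;</p>",
    "<p>\n<br /><br />\n&nbsp;</p>",
];

foreach ($cases as $case) {
    echo var_export(removeEmptyParagraphs($case), true), PHP_EOL; // '' for every case
}

// Paragraphs with real text are left alone.
echo removeEmptyParagraphs("<p>Text</p><p>&nbsp;</p>"), PHP_EOL; // <p>Text</p>
```

Only the paragraph itself is removed, so any newlines that separated it from its neighbours stay in the output; trim or collapse them afterwards if that matters.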

6 Comments

Two things: use a different delimiter; your regex will break like crazy because you forgot to escape the forward slashes. Additionally, the examples in the post don't seem to have terminating tags. I wrote the regex slightly differently: #<p>(\s+|&nbsp;|<br\s*/?>)*(</p>)(?=<p>)# Of course, you can pepper in \s* for all sorts of whitespace concerns.
@FrankieTheKneeMan: the examples in the post seem to use <p> as terminating tags, so I'll make the / in </p> optional. Other than that, thanks for the suggestions. I made such a "mess" of the <br> part to catch all the "possibilities" I've seen used.
Thanks a lot! It didn't work with / as the delimiter, but worked perfectly once I changed it to #.
Ah, tagged php. I was used to the JS syntax, but somehow I forgot to escape the /'s in there -.- Thanks for accepting my answer!
@Cerbrus I forgot all about code blocks, how stupid of me. Cheers :)

The selected answer is great, but it doesn't work if the <p> tag has attributes, such as an inline style: <p style="font-weight:bold">.

A regex that handles this would be:

#<p[^>]*>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>#
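A minimal PHP sketch of this pattern on the style="text-align:left" case mentioned in the comment below. One caveat, assuming the input may contain other tags: [^>]* lets any characters follow the "p", so the pattern also matches tags such as <pre>, and combined with the optional / in </?p> it can strip real markup. EMPTY_P_STRICT is a hypothetical tightening (not from either answer) that only allows attributes after whitespace.

```php
<?php
// Answer 2's pattern: <p[^>]*> also accepts attributes such as style="...".
const EMPTY_P_ATTRS = '#<p[^>]*>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>#';

// Hypothetical tightened variant: after "<p", require either ">" directly or
// whitespace before any attributes, so <pre> or <param> is not taken for <p>.
const EMPTY_P_STRICT = '#<p(\s[^>]*)?>(\s|&nbsp;|</?\s?br\s?/?>)*</?p>#';

$withStyle = '<p style="text-align:left"></p>';
$inPre     = '<pre><p>Text</p></pre>';

echo var_export(preg_replace(EMPTY_P_ATTRS, '', $withStyle), true), PHP_EOL;  // ''
echo var_export(preg_replace(EMPTY_P_STRICT, '', $withStyle), true), PHP_EOL; // ''

// "<pre><p>" is matched as an "empty paragraph" by the looser pattern.
echo preg_replace(EMPTY_P_ATTRS, '', $inPre), PHP_EOL;  // Text</p></pre>
echo preg_replace(EMPTY_P_STRICT, '', $inPre), PHP_EOL; // <pre><p>Text</p></pre>
```

If CKEditor's output never contains <pre> or similar tags, the simpler pattern above is enough; the stricter one only matters for mixed content.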

1 Comment

Good answer, I just tried this with all the test cases in my original question plus <p style="text-align:left"></p> and it works well
