A php regex to remove white spaces in html [duplicate]

Question

Hi, I have some HTML like this:

<html>
   <head>
     <title>
          Some title
   </title>
</head>
<body>
    <div id="one">         some sample info </div>
</body>
</html>

How can I use preg_replace with a regex to remove the whitespace in this HTML, except for the whitespace inside the text content and inside the tags themselves? I want to get something like this:

<html><head><title>Some title</title></head><body><div id="one">some sample info</div></body></html>

Can anyone help me with this? :)

What if there are <pre> elements?

– Gordon, Commented Feb 1, 2012 at 12:09

Sufian Latif · Accepted Answer · 2012-05-16 06:11:18Z

5

You can replace every match of (?<=>)\s+(?=<)|(?<=>)\s+(?!=<)|(?!<=>)\s+(?=<) with an empty string.

Edit: There's a simpler form: replace (?<=>)\s+|\s+(?=<)

Simply put, this regex matches a run of one or more whitespace characters if it has a > immediately to its left or a < immediately to its right, and each match is replaced with nothing.

It actually has two parts joined by OR (symbol: |), so either one may match:

(?<=>)\s+ - this matches one or more whitespace characters (\s+ in the regex) if they are preceded by a > (in regex: the lookbehind (?<=>)).
\s+(?=<) - this matches one or more whitespace characters if they are followed by a < (in regex: the lookahead (?=<)).
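
To make the two parts concrete, here is a minimal sketch that applies the simpler pattern with preg_replace to the HTML from the question. The function name strip_tag_whitespace is only an illustrative choice, not something from the original answer.

```php
<?php
// Hypothetical helper name; the pattern is the "simpler form" from the answer.
function strip_tag_whitespace(string $html): string
{
    // (?<=>)\s+  whitespace that directly follows a ">"
    // \s+(?=<)   whitespace that directly precedes a "<"
    // preg_replace returns null on error, so fall back to the input.
    return preg_replace('/(?<=>)\s+|\s+(?=<)/', '', $html) ?? $html;
}

$html = "<html>\n   <head>\n     <title>\n          Some title\n   </title>\n</head>\n"
      . "<body>\n    <div id=\"one\">         some sample info </div>\n</body>\n</html>";

echo strip_tag_whitespace($html), "\n";
// Prints:
// <html><head><title>Some title</title></head><body><div id="one">some sample info</div></body></html>
```

Whitespace between words and between attributes is left alone because it touches neither a > on the left nor a < on the right. The pattern does not know about tags, though: it would also strip leading and trailing whitespace inside a <pre> element, and it treats any literal > or < in the text as a tag boundary.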

Learn more about regex.

edited May 16, 2012 at 6:11

answered Feb 1, 2012 at 12:10

Sufian Latif


1 Comment

mickmackusa Over a year ago

This answer is completely unstable and relies on the notion that there are no lingering > or < symbols in any of the textnodes in the html document. I would not recommend this technique to anyone. This is just another case where using regex to do a DOM parser's job is inappropriate. Researchers, please be informed that regex is "DOM-ignorant" -- it doesn't know if it is matching the start/end of a tag or merely something that resembles the start/end of a tag. At the very least, this regex is too primitive to do a consistently good job.
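
For comparison, a DOM-based approach along the lines this comment suggests might look like the sketch below. It is only one possible way to do it, under these assumptions: PHP's bundled DOM extension is available, the input is a complete document with an <html> root, and the text is ASCII or declares its own encoding. The function name strip_whitespace_with_dom is hypothetical, and only <pre> is treated as whitespace-sensitive here, which is a simplification.

```php
<?php
// Hypothetical helper: trims text nodes through the DOM instead of a regex.
function strip_whitespace_with_dom(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Every text node that is not inside a <pre> element.
    $nodes = iterator_to_array($xpath->query('//text()[not(ancestor::pre)]'));
    foreach ($nodes as $node) {
        $trimmed = trim($node->nodeValue);
        if ($trimmed === '') {
            // Whitespace-only node between tags: drop it entirely.
            $node->parentNode->removeChild($node);
        } else {
            // Real content: remove only leading and trailing whitespace.
            $node->nodeValue = $trimmed;
        }
    }
    return trim((string) $doc->saveHTML());
}

$sample = "<html>\n<head>\n<title>\n  Some title\n</title>\n</head>\n<body>\n"
        . "<div id=\"one\">   some sample info </div>\n"
        . "<pre>  keep   this </pre>\n</body>\n</html>";

echo strip_whitespace_with_dom($sample), "\n";
```

Because the parser has already decided what is a tag and what is text, a stray > or < inside a text node is not mistaken for a tag boundary, and the contents of <pre> are left untouched.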
