1

What JavaScript regex would minify the contents of an HTML string? Note that I only want to remove runs of 2 or more spaces and nothing shorter.

I also want to replace single quotation marks (') with double quotation marks (").

This is what I've got so far, although I'm guessing there's a more efficient way of doing this:

var findSpaces = content.match(' ') >= 2;
var findQuotes = content.match(" ' ");

content.replace(findSpaces, "" );

content.replace(findQuotes, ' " ' );

No jQuery please

3
  • I don't think this is a situation where you can "roll your own" and expect to get it right without spending hours and hours... If you KNOW you will ALWAYS operate ONLY on trivially simple HTML, then you might have a chance... Commented Apr 25, 2014 at 5:42
  • Replacing single quotation marks with double quotation marks will break code where double quotation marks are contained in a string. Removing all extra white space without regard to if the space is inside of quotes can also break code. Commented Aug 19, 2014 at 5:30
  • Related - stackoverflow.com/q/44841365/104380 Commented Dec 16, 2019 at 12:52

2 Answers

1

In the example below, the first phase removes all newlines (\r\n) and spaces that sit directly between two HTML tags. The second phase collapses every remaining run of whitespace in the text content to a single space, while the tags themselves (including the spacing between their attributes) are left untouched.

Finally, trim() removes whitespace before and after the resulting string.

// dummy string to minify
var s = `

    <div   value="a"     class="a b"   id="a">
      <div>
        foo   bar  
        <br><br>
        <span>baz</span>   <i>a</i>  
      </div>
    </div>
`

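function minify( s ){
  return s
    // phase 1: drop newlines and spaces that sit directly between two tags
    .replace(/>[\r\n ]+</g, "><")
    // phase 2: keep each tag as-is ($1); collapse any other whitespace run to one space
    .replace(/(<.*?>)|\s+/g, (m, $1) => $1 ? $1 : ' ')
    // strip leading and trailing whitespace
    .trim()
}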

console.log(  minify(s)  )

The above is also available as a gist in my collection
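
As a quick check of what the two phases do, here is a minimal sketch that runs the same function on a shorter input. The names `sample` and `result` are only for illustration. Note two things in the output: the spacing between attributes is preserved, and the whitespace between `</span>` and `<i>` is gone, which changes how inline content renders.

```javascript
// Same two-phase minifier as above, repeated so this snippet runs on its own
function minify(s) {
  return s
    .replace(/>[\r\n ]+</g, "><")
    .replace(/(<.*?>)|\s+/g, (m, tag) => (tag ? tag : " "))
    .trim();
}

// Illustrative input (not from the original answer)
const sample = `
  <div   class="a b"   id="a">
    foo   bar
    <span>baz</span>   <i>a</i>
  </div>
`;

const result = minify(sample);
console.log(result);
// <div   class="a b"   id="a"> foo bar <span>baz</span><i>a</i></div>
```

If the space between inline elements matters for your markup, the first replace could be dropped or limited to block-level templates.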


5 Comments

I'm not sure that removing white spaces between phrasing content elements (aka "inline elements" prior to HTML5) is what is usually needed from code minification. It can significantly change the meaning of the content.
I do it in all my projects where I have HTML templates to inject into the DOM, and it's actually a must-do. Spaces between tags might mess up layout and interfere with CSS, and removing those spaces has never caused me any harm, as long as you know what to apply this to, of course.
So you noticed the potential problem in my comment above, right? ;) BTW, I strongly believe that spaces never interfere with CSS; it's (suboptimal) CSS that might interfere with them and mess up the layout (like using inline-blocks for the horizontal arrangement of blocks, which is what Flexbox is designed for :)
Just an example out of many when minifying content is helpful. This question is one of the most popular on this website. Another popular example here. People keep asking this and getting stuck on such things without knowing the importance of removing spaces between elements that shouldn't be there
Yes, this is the most notorious example of applying the CSS mechanism where spaces are meaningful (inline formatting) to a task where it shouldn't be used (horizontal layout of blocks), unsurprisingly giving unwanted results. So the correct question is "How do I make the layout not depend on source formatting?" and the correct answer is to use a true layout mechanism (most likely Flexbox) instead of faking it with the wrong means. In 2011 this question made sense, but now it should become history. And the silver-bullet illusion that auto-removing spaces might give could cause other problems.
0

var s = `

    <div   value="a"     class="a b"   id="a">
      <div>
        foo bar  
        <br><br>
        <span>baz</span>   <i>a</i>  
      </div>
    </div>
`

console.log(
  s.replace(/\s{2,}/g, ' ').replace(/\'/g, '"')
)

This should do the job for you.
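
The quote replacement above converts every apostrophe, including those in text content and in attribute values that already contain a double quote. A more conservative sketch is shown below, assuming tags never contain a literal `>` inside an attribute value; the function name `requoteAttributes` is made up for this example. It only touches single-quoted attribute values inside tags and skips any value that contains a double quote.

```javascript
// Hypothetical helper: swap '...' for "..." on attribute values only.
// Assumption: no attribute value contains a literal ">" character.
function requoteAttributes(html) {
  return html.replace(/<[^>]*>/g, (tag) =>
    // inside one tag: name='value' becomes name="value",
    // but only when the value has no quote characters of its own
    tag.replace(/(\s[\w:-]+\s*=\s*)'([^'"]*)'/g, '$1"$2"')
  );
}

const input = `<a href='x' title='it"s'>don't</a>`;
const requoted = requoteAttributes(input);
console.log(requoted);
// <a href="x" title='it"s'>don't</a>
```

This is still regex-based, so it is only a reasonable fit for simple, trusted markup; a real HTML parser is safer for anything else.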

5 Comments

What if there are escaped single quotes? What if there are single quotes that aren't used in HTML attributes? What if there are preformatted pieces of content that contain multiple spaces and need to be preserved?
Then it will also match the single quote. Do you not want to match escaped single quotes?
The OP has made little effort to define his actual requirements, but he did mention that he wants to "minify HTML", which would indicate that all these possibilities need to be considered.
This works pretty well. Although what about when I'm placing "\s{2,}" inside a variable to use later? i.e var = "\s{2,}/g"
It's not good enough. As you can see from the example, if an HTML string has attributes with more than a single space between them, the whole string will become invalid because all the spaces will be removed
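
On the question above about keeping the pattern in a variable: a regex literal's flags cannot be part of a string, so one option is the `RegExp` constructor, which takes the pattern and the flags as separate arguments. A minimal sketch follows; the variable names are only illustrative.

```javascript
// The backslash must be doubled inside a string literal,
// otherwise "\s" is read as a plain "s"
const pattern = "\\s{2,}";
const flags = "g";

// Build the regex from the two strings
const collapseSpaces = new RegExp(pattern, flags);

const collapsed = "a   b \n\n c".replace(collapseSpaces, " ");
console.log(collapsed); // "a b c"
```

Alternatively, the literal itself can be stored directly, as in `var re = /\s{2,}/g`, which avoids the double escaping.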
