12

Does anyone know a good Java library (or a single method) that can strip extra whitespace (line breaks, tabs, etc.) from an HTML file, so that the file basically gets turned into one line?

Thanks.

UPDATE: Looks like there is no library that does that so I created my own open source project for solving this task: http://code.google.com/p/htmlcompressor/

5 Answers

25

Looks like there is no library that does that so I created my own open source project for solving this task, maybe someone will find it helpful: http://code.google.com/p/htmlcompressor/


6

Personally, I just enabled HTTP compression in the server and I leave my HTML readable.

But for what you want, you could just use String.replaceAll() with a regex that matches what you have specified. Off the top of my head, something like:

small=large.replaceAll("\\s{2,}"," ");
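
As a rough, self-contained sketch of the same idea (the class and method names here are made up for illustration and do not come from any library), the pattern can be compiled once and reused:

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of any library named in this thread.
class WhitespaceCollapser {
    // Two or more consecutive whitespace characters (spaces, tabs, line breaks).
    private static final Pattern EXTRA_WHITESPACE = Pattern.compile("\\s{2,}");

    // Replaces every run of two or more whitespace characters with one space.
    static String collapse(String html) {
        return EXTRA_WHITESPACE.matcher(html).replaceAll(" ");
    }

    public static void main(String[] args) {
        String large = "<table border=1>\n    <tr>\n\t<td>Hello   World</td>\n    </tr>\n</table>";
        System.out.println(collapse(large));
    }
}
```

Note that a lone line break (such as the one before `</table>` above) is a single whitespace character, so `\\s{2,}` leaves it in place. Using `\\s+` instead would replace it with a space as well.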

8 Comments

The only problem is that if you have a string that contains spaces, then those spaces will be erased as well. It will also break a lot of HTML formatting. For example, "<table border=1.." would turn out as "<tableborder=1.." and an HTML parser will choke on that. :P
@Suroot no, it's fine. It replaces multiple spaces with just one.
@sblundy but "Hello World" will become "Hello World" which isn't what you want if "Hello World" is what is supposed to be displayed.
@Suroot Browsers convert multiple spaces to a single space. For example, your two "Hello Worlds" look the same. If you want multiple spaces, you need to use &nbsp;.
Of course, if you rely on multiple spaces for formatting inside a <pre> tag, this will be fubared.
3

Be careful with that. Text inside pre and textarea elements will be damaged. In addition, every statement of inline JavaScript inside script elements will have to end with a semicolon (;). Lastly, if you wrap inline JavaScript in HTML comments (to avoid buggy behavior in some old browsers), putting everything on one line will end up commenting out the whole inline script.
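
One way to sidestep these problems, sketched here under simplifying assumptions, is to copy pre, textarea and script elements through untouched and collapse whitespace only in the text between them. The class name is hypothetical, and the regex is not a real HTML parser: it assumes the protected elements are well formed and not nested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; a regex is not a full HTML parser.
class SafeWhitespaceCollapser {
    // Whole <pre>, <textarea> and <script> elements; DOTALL lets '.' span line breaks.
    private static final Pattern PROTECTED = Pattern.compile(
            "<(pre|textarea|script)\\b[^>]*>.*?</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String collapse(String html) {
        StringBuilder out = new StringBuilder();
        Matcher m = PROTECTED.matcher(html);
        int last = 0;
        while (m.find()) {
            // Collapse the text before the protected element, then copy the element verbatim.
            out.append(WHITESPACE.matcher(html.substring(last, m.start())).replaceAll(" "));
            out.append(m.group());
            last = m.end();
        }
        // Collapse whatever follows the last protected element.
        out.append(WHITESPACE.matcher(html.substring(last)).replaceAll(" "));
        return out.toString();
    }
}
```

Because script elements are copied as they are, the line breaks inside them survive, so neither the semicolon issue nor the HTML-comment issue arises in this sketch.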

Why do you want to do that? If you want to decrease the download size of the HTML, then all you need is a GZIP filter.


1

Assuming the desire is to make the HTML smaller to optimize the bytes sent over the network, why not have the HTTP server do the work? Read here.
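
To get a feel for what server-side compression does with whitespace-heavy markup, a small sketch like the following can compare sizes using the JDK's own java.util.zip.GZIPOutputStream. The class name and sample markup are invented for illustration, and the numbers will depend entirely on your real pages.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only.
class GzipSizeCheck {
    // Returns the number of bytes the text occupies after GZIP compression.
    static int gzippedSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("<table>\n");
        for (int i = 0; i < 200; i++) {
            sb.append("    <tr>\n        <td>row ").append(i).append("</td>\n    </tr>\n");
        }
        String readable = sb.append("</table>\n").toString();
        String collapsed = readable.replaceAll("\\s+", " ");

        System.out.println("readable:  raw=" + readable.length() + " gzip=" + gzippedSize(readable));
        System.out.println("collapsed: raw=" + collapsed.length() + " gzip=" + gzippedSize(collapsed));
    }
}
```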

Will this work? Not free unfortunately.

3 Comments

Already using it. I still would like to have a compression though.
Does it have to be Java? Does it have to be free?
There's no point at all in whitespace collapsing your HTML if you are applying HTTP compression - the end result will be so close as to not matter for the size of data across the wire. WS collapsing just adds another pre-deployment step.
0
input.replaceAll("\\s+", " ");

will convert any run of whitespace into a single space

2 Comments

but it will also replace any single space with a single space, won't it? Which is wasted cycles.
Of course, if you rely on multiple spaces for formatting inside a <pre> tag, this will be fubared.
