Encoding problem between jQuery and Java

Question

My encoding is set to ISO-8859-1.

I'm making an AJAX call using jQuery.ajax to a servlet. The URL (after it has been serialized by jQuery) ends up looking like this:

https://myurl.com/countryAndProvinceCodeServlet?action=getProvinces&label=%C3%85land+Islands

The actual label value is Åland Islands. When this comes to the servlet, the value that I receive is:

Ã\u0085land Islands

But this is not what I want. I'd like it to get decoded to Åland Islands. I've tried many things (setting scriptCharset, trying to convert the string using getBytes()), but nothing seems to work.

Look at this page and use the unescape function towards the bottom: w3.org/International/O-URL-code.html
– Romain Hippeau, Commented Oct 26, 2010 at 21:55

bobince · Accepted Answer · 2010-10-26 22:03:49Z

6

It is an unfortunate part of the Servlet specification that the encoding used to decode query parameters is not settable by servlets themselves. Instead it is left as a configuration matter for the server.

This makes deployment of internationalised web sites an enormous pain, especially because the default encoding chosen by the Servlet spec is not the most-likely-to-be-useful UTF-8, but ISO-8859-1. (Actual ISO-8859-1, not even Windows code page 1252, which is the encoding browsers will really submit when told to use ISO-8859-1!)
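To make the mismatch concrete, here is a minimal sketch that decodes the query value from the question both ways using the standard java.net.URLDecoder. The class name DecodeMismatch and the helper decodeAs are made up for illustration; a real container does this decoding internally, so this only imitates the effect.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeMismatch {
    // Hypothetical helper: percent-decodes a value using the named charset.
    static String decodeAs(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%C3%85land+Islands"; // value as sent by jQuery (UTF-8)

        // Servlet-spec default: bytes C3 85 become two characters, U+00C3 and U+0085.
        String asLatin1 = decodeAs(encoded, "ISO-8859-1");
        // What the client intended: bytes C3 85 become the single character U+00C5.
        String asUtf8 = decodeAs(encoded, "UTF-8");

        System.out.println(asLatin1.equals("\u00C3\u0085land Islands")); // prints true
        System.out.println(asUtf8.equals("\u00C5land Islands"));         // prints true
    }
}
```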

So how to reconfigure this is a server problem. For Tomcat, it means setting the URIEncoding attribute on the Connector element in server.xml.

The alternative approach, if you don't have access to the server config, is to take each submitted parameter name/value and re-encode them. Luckily ISO-8859-1 preserves every byte submitted as a Unicode code point of the same number, so to convert the string as if it had been interpreted properly as UTF-8 in the first place, you can simply encode each String to a byte array using ISO-8859-1, and then decode the bytes back to a String using UTF-8. Of course if someone then re-configures the server to use UTF-8 you've got a problem...
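A minimal sketch of that round trip, assuming the container decoded the parameter as ISO-8859-1. The class name ReencodeParameter and the helper latin1ToUtf8 are invented for illustration; in a servlet the input would come from request.getParameter(...).

```java
import java.nio.charset.StandardCharsets;

public class ReencodeParameter {
    // Hypothetical helper: undoes an ISO-8859-1 mis-decoding of UTF-8 bytes.
    static String latin1ToUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null; // parameter was absent
        }
        // ISO-8859-1 maps each char 0..255 back to the byte of the same value.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Reinterpret those original bytes as the UTF-8 the client really sent.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the servlet in the question received for the label parameter.
        String received = "\u00C3\u0085land Islands";
        String fixed = latin1ToUtf8(received);
        System.out.println(fixed.equals("\u00C5land Islands")); // prints true
    }
}
```

As the caveat above says, this is only safe while the server really is decoding as ISO-8859-1; applied to a string that was already decoded as UTF-8, it would corrupt the non-ASCII characters.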

answered Oct 26, 2010 at 22:03

bobince


5 Comments

BalusC Over a year ago

Note that the inability to configure the encoding for query parameters through the Servlet API only applies to GET query parameters (in the URL), not to POST parameters (in the request body). Also note that the only browser which actually sends CP1252 is MSIE, not the others. For the rest, great answer as always :)

Vivin Paliath Over a year ago

Great answer. I also came to somewhat the same conclusion after reading wiki.apache.org/tomcat/FAQ/CharacterEncoding. I have updated my question as well -- it also looks like jQuery might be to blame (not url-encoding properly?)

bobince Over a year ago

@BalusC: actually all browsers have sent cp1252 for quite some time, even on non-Windows platforms. Some other ISO-8859-family encodings also mutate to their Windows equivalents. HTML5 is finally standardising this unfortunate wart. @Vivin: no, %C3%85 is the correct way to send a UTF-8-encoded Å; the JavaScript encodeURIComponent function used by jQuery always chooses UTF-8 because it's the only sensible encoding to use in a modern site. It's just a pity Servlet's default doesn't agree.

Vivin Paliath Over a year ago

Yes, just figured that out after some experimentation (escape vs encodeURIComponent)

bobince Over a year ago

Yeah, escape/unescape is a bit naughty and should usually be avoided. Apart from its use of ISO-8859-1 to URL-encode, and the non-standard handling of non-ISO-8859-1 characters, it fails to encode the + character, which can lead to unexpected spaces.

BalusC · Accepted Answer · 2010-10-26 22:29:07Z

4

Bobince already went into detail, so I'll skip that part. If you really have no control over the container-managed URI encoding, your best bet is to take the URI encoding into your own hands. You can obtain the raw GET query string in servlets via HttpServletRequest#getQueryString(). Then it's a matter of splitting it and URL-decoding the parts as UTF-8 yourself, using the usual String methods and URLDecoder#decode().

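// Assumes java.net.URLDecoder is imported; decode(String, String) declares UnsupportedEncodingException.
for (String parameter : request.getQueryString().split("&")) {
    String[] pair = parameter.split("=");
    String name = URLDecoder.decode(pair[0], "UTF-8");   // parameter name, decoded as UTF-8
    String value = URLDecoder.decode(pair[1], "UTF-8");  // parameter value, decoded as UTF-8
    // ...
}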

Needless to say, keep in mind that this isn't a solution, but a workaround.

answered Oct 26, 2010 at 22:29

BalusC


1 Comment

bobince Over a year ago

+1 I've done this as a last resort before. You'll want to check that pair has the expected length, though, to avoid a non-a=b-format value in the query string causing an exception. (Ideally, splitting on only the first = may be a good idea too.)
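One possible way to fold those two suggestions into the loop, as a sketch rather than a definitive parser: split on the first = only, treat a missing value as an empty string, and tolerate a null query string. The class name QueryStringParser and the method parseUtf8 are hypothetical; in a servlet the argument would be request.getQueryString().

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryStringParser {
    // Hypothetical helper: parses a raw query string, decoding names and values as UTF-8.
    static Map<String, List<String>> parseUtf8(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params; // getQueryString() returns null when there is no query string
        }
        try {
            for (String parameter : rawQuery.split("&")) {
                if (parameter.isEmpty()) {
                    continue; // skip empty segments such as "a=1&&b=2"
                }
                String[] pair = parameter.split("=", 2); // split on the first '=' only
                String name = URLDecoder.decode(pair[0], "UTF-8");
                String value = pair.length > 1 ? URLDecoder.decode(pair[1], "UTF-8") : "";
                params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is required on every JVM", e);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, List<String>> p =
                parseUtf8("action=getProvinces&label=%C3%85land+Islands&flag&x=a=b");
        System.out.println(p.get("label").get(0).equals("\u00C5land Islands")); // prints true
        System.out.println(p.get("flag").get(0).isEmpty());                     // prints true
        System.out.println(p.get("x").get(0));                                  // prints a=b
    }
}
```

Values are collected in a list per name because the same parameter name may legitimately appear more than once in a query string.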
