3

My encoding is set to ISO-8859-1.

I'm making an AJAX call using jQuery.ajax to a servlet. The URL (after it has been serialized by jQuery) ends up looking like this:

https://myurl.com/countryAndProvinceCodeServlet?action=getProvinces&label=%C3%85land+Islands

The actual label value is Åland Islands. When this comes to the servlet, the value that I receive is:

Ã\u0085land Islands

But this is not what I want. I'd like it to get decoded to Åland Islands. I've tried many things (setting scriptCharset, trying to convert the string using getBytes()), but nothing seems to work.

3
  • Is there a reason you're not using UTF-8 throughout? Commented Oct 26, 2010 at 21:54
  • look at this page and use the unescape function towards the bottom w3.org/International/O-URL-code.html Commented Oct 26, 2010 at 21:55
  • @prodigitalson Yes, it's beyond my control unfortunately :( Commented Oct 26, 2010 at 21:56

2 Answers

6

It is an unfortunate part of the Servlet specification that the encoding used to decode query parameters is not settable by servlets themselves. Instead it is left as a configuration matter for the server.

This makes deployment of internationalised web sites an enormous pain, especially because the default encoding chosen by the Servlet spec is not the most-likely-to-be-useful UTF-8, but ISO-8859-1. (Actual ISO-8859-1, not even Windows code page 1252, which is the encoding browsers will really submit when told to use ISO-8859-1!)

So how to reconfigure this is a server problem. For Tomcat, it requires some fiddling with the server.xml (setting URIEncoding="UTF-8" on the <Connector> element).

The alternative approach, if you don't have access to the server config, is to take each submitted parameter name/value and re-encode them. Luckily ISO-8859-1 preserves every byte submitted as a Unicode code point of the same number, so to convert the string as if it had been interpreted properly as UTF-8 in the first place, you can simply encode each String to a byte array using ISO-8859-1, and then decode the bytes back to a String using UTF-8. Of course if someone then re-configures the server to use UTF-8 you've got a problem...
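The re-encoding trick described above can be sketched as follows. This is a minimal illustration, not a drop-in fix: the class name ReencodeDemo and the method reinterpretAsUtf8 are made up for the example, and it assumes the container really did decode the parameter as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

final class ReencodeDemo {

    // Hypothetical helper: undo an ISO-8859-1 mis-decoding of UTF-8 bytes.
    // Assumes the container decoded the raw bytes as ISO-8859-1, so every
    // char in the string maps back to exactly one original byte.
    static String reinterpretAsUtf8(String misdecoded) {
        if (misdecoded == null) {
            return null;
        }
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the servlet in the question receives for %C3%85land+Islands.
        String received = "\u00C3\u0085land Islands";
        String fixed = reinterpretAsUtf8(received);
        // U+00C5 is the letter A with ring above.
        System.out.println(fixed.equals("\u00C5land Islands"));
    }
}
```

In a servlet this would be applied to each value returned by request.getParameter(...). As noted above, it silently corrupts values if the server is later switched to UTF-8, so it may be worth guarding behind a configuration flag.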


5 Comments

Note that the inability to configure the encoding for query parameters using the Servlet API only applies to GET query parameters (in the URL), not to POST query parameters (in the request body). Also note that the only browser which actually sends CP1252 is MSIE, not the others. For the rest, great answer as always :)
Great answer. I also came to somewhat the same conclusion after reading wiki.apache.org/tomcat/FAQ/CharacterEncoding. I have updated my question as well -- it also looks like jQuery might be to blame (not url-encoding properly?)
@BalusC: actually all browsers have sent cp1252 for quite some time, even on non-Windows platforms. Some other ISO-8859-family encodings also mutate to their Windows equivalents. HTML5 is finally standardising this unfortunate wart. @Vivin: no, %C3%85 is the correct way to send a UTF-8-encoded Å; the JavaScript encodeURIComponent function used by jQuery always chooses UTF-8 because it's the only sensible encoding to use in a modern site. It's just a pity Servlet's default doesn't agree.
Yes, just figured that out after some experimentation (escape vs encodeURIComponent)
Yeah, escape/unescape is a bit naughty and should usually be avoided. Apart from its use of ISO-8859-1 to URL-encode, and the non-standard handling of non-ISO-8859-1 characters, it fails to encode the + character, which can lead to unexpected spaces.
4

Bobince already went into detail, so I'll skip that part. If you really have no control over the container-managed URI encoding, your best bet is to take the URI decoding into your own hands. You can obtain the raw GET query string in servlets with HttpServletRequest#getQueryString(). Then it's a matter of splitting it into parameters and URL-decoding them as UTF-8 yourself, using the usual String methods and URLDecoder#decode().

// Requires: import java.net.URLDecoder;
for (String parameter : request.getQueryString().split("&")) {
    // Assumes every parameter has the form name=value.
    String[] pair = parameter.split("=");
    String name = URLDecoder.decode(pair[0], "UTF-8");
    String value = URLDecoder.decode(pair[1], "UTF-8");
    // ...
}

Needless to say, keep in mind that this isn't a solution, but a workaround.
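A somewhat more defensive variant of the loop above might look like the sketch below. The class QueryStringParser and its parse method are hypothetical names for illustration. The sketch assumes that a missing query string should yield an empty map and that a parameter without = should get an empty value. Malformed percent-escapes still make URLDecoder throw IllegalArgumentException.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class QueryStringParser {

    // Hypothetical helper: decode a raw query string as UTF-8.
    static Map<String, List<String>> parse(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params; // getQueryString() returns null when there is no query
        }
        try {
            for (String parameter : rawQuery.split("&")) {
                if (parameter.isEmpty()) {
                    continue; // skip empty segments such as in "a=1&&b=2"
                }
                // Split on the first '=' only, so values may contain '='.
                String[] pair = parameter.split("=", 2);
                String name = URLDecoder.decode(pair[0], "UTF-8");
                String value = pair.length > 1 ? URLDecoder.decode(pair[1], "UTF-8") : "";
                params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        return params;
    }
}
```

In a servlet it would be called as QueryStringParser.parse(request.getQueryString()). Returning a list per name keeps repeated parameters instead of overwriting them.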

1 Comment

+1 I've done this as a last resort before. You'll want to check that pair has the expected length, though, to avoid a non-a=b-format value in the query string causing an exception. (Ideally, splitting on only the first = may be a good idea too.)
