Unicode strings from Java to Javascript via JSON

Question

In a Java servlet I'm doing:

protected void handleRequests(HttpServletRequest request, HttpServletResponse response) {

  PrintWriter pw = response.getWriter();

  /*...*/

  Vector<String> ret = new Vector<>();
  for(...) {
    ret.add(">žd¿ [?²„·ÜðÈ ‘");
  }

  /*JSONArray*/ responseArray.put(responseArray.length(), ret);

  /*...*/

  pw.println(responseArray);

  pw.close();
}

In the web page's client-side JavaScript I make an XMLHttpRequest, and the reply comes back incorrect. It looks like: >?d¿ [\u001a?²\u201e·ÜðÈ \u2018

(for the above >žd¿ [?²„·ÜðÈ ‘ input)

Then I tried this in the servlet:

ret.add(URLEncoder.encode(">žd¿ [?²„·ÜðÈ ‘", "UTF-8"));

and I get:

%3E%C5%BEd%C2%BF%C2%A0%5B%7F%1A%3F%C2%B2%E2%80%9E%C2%B7%C3%9C%C3%B0%C3%88%C2%A0%E2%80%98

in javascript, then I apply:

unescape(reply.replace(/\+/g,' ')) (the replace is there because + signs are not converted to spaces)

which nets me:

>Å¾dÂ¿Â [?Â²â??Â·Ã?Ã°Ã?Â â

What am I doing wrong?

(Some other questions tell me the servlet should send the response as UTF-8. But when do I encode to UTF-8: before placing the string inside a JSON object (I use org.json), or after (by calling .toString on the JSON response array and converting that to UTF-8 before PrintWriter.println)?)

P.S. Not all of this code is mine. I inherited it, and I lack some of the theoretical background.

Edit: doing a decodeURIComponent(reply).replace(/\+/g,' ') in JavaScript seems to do the trick. But I could not find what the difference is between URLEncoder.encode and decodeURIComponent. Is the +/space the only mismatch?

You shouldn't have to URL encode the string at all. You're not using it in a URL, right?
– Pointy, Commented Jul 24, 2014 at 15:07
No, I'm displaying it only. If I don't URL encode it, I get, as shown, >?d¿ [\u001a?²\u201e·ÜðÈ \u2018. After a JSON.parse I get >?d¿ [?²„·ÜðÈ ‘ which is close but not quite...
– Adrian 3873, Commented Jul 24, 2014 at 15:16
Make sure that your HTTP response has the right "Content-Type" header too - it has to include "charset=UTF-8"
– Pointy, Commented Jul 24, 2014 at 15:22
@Pointy response.setCharacterEncoding( "UTF-8" ); did the trick. Thanks! If you add it as a reply I'll accept it.
– Adrian 3873, Commented Jul 24, 2014 at 15:31
Ha ha I'm on a roll today; this is the second UTF-8 issue that's come up here :)
– Pointy, Commented Jul 24, 2014 at 15:32
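
For anyone wondering why response.setCharacterEncoding("UTF-8") fixes it: the call has to come before response.getWriter(), because the writer's encoding is fixed once the writer has been obtained, and the servlet default is ISO-8859-1. The sketch below is not servlet code. It only imitates the two cases with a plain PrintWriter over a byte buffer, and the names CharsetDemo and writeWith are made up for illustration. It shows why ž arrives as ? under ISO-8859-1 but survives under UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetDemo {
    // Hypothetical helper: writes text through a PrintWriter that encodes
    // with the given charset and returns the bytes that would be sent.
    public static byte[] writeWith(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter pw = new PrintWriter(new OutputStreamWriter(out, charset));
        pw.print(text);
        pw.flush();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u017E\u201E"; // the characters ž and „ from the question

        // Neither character exists in ISO-8859-1, so the writer substitutes '?' (byte 63).
        byte[] latin1 = writeWith(text, StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(latin1)); // [63, 63]

        // UTF-8 can represent both: 2 bytes for ž plus 3 bytes for „.
        byte[] utf8 = writeWith(text, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 5
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

The characters that showed up as \u201e and \u2018 in the raw reply were presumably escaped to plain ASCII by the JSON library before they reached the writer, which would explain why they survived while ž did not.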

user2033671 · Accepted Answer · 2014-07-24 15:04:47Z


decodeURIComponent gives the expected result:

decodeURIComponent("%3E%C5%BEd%C2%BF%C2%A0%5B%7F%1A%3F%C2%B2%E2%80%9E%C2%B7%C3%9C%C3%B0%C3%88%C2%A0%E2%80%98");
">žd¿ [?²„·ÜðÈ ‘"

1 Comment

Adrian 3873 Over a year ago

Seems so. But I still need to do .replace(/\+/g,' ') afterwards, so decodeURIComponent does not fully decode the output of URLEncoder.encode. Are there other differences besides +/space?
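
On the +/space question: as far as I know, that is the only mismatch when decoding. URLEncoder.encode produces form encoding (application/x-www-form-urlencoded), where a space becomes +, while decodeURIComponent decodes every %XX sequence but leaves + untouched. One way to avoid the client-side fix-up is to turn + into %20 on the server. Below is a minimal sketch of that idea. The class and method names are hypothetical, and the approach assumes URL encoding is wanted at all, which the comments on the question suggest it is not once the response charset is set.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class UriComponentEncoder {
    // Hypothetical helper: percent-encodes text so that JavaScript's
    // decodeURIComponent can decode it without a separate '+' replacement.
    public static String encodeForDecodeURIComponent(String text) {
        try {
            // URLEncoder writes a space as '+'. A literal '+' in the input is
            // already written as %2B, so every '+' left in the output is a space.
            return URLEncoder.encode(text, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeForDecodeURIComponent("a b+c \u017E")); // a%20b%2Bc%20%C5%BE
    }
}
```

If the replacement stays on the JavaScript side instead, it is safer to replace + before decoding rather than after. Decoding first turns %2B into a literal +, which the later replace would then wrongly convert to a space.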
