1

I have to pass a Unicode string to a JSONObject.

JSONObject json = new JSONObject("{\"One\":\"\\ud83c\\udf45\\ud83c\\udf46\"}");
json.put("Two", "\ud83c\udf45\ud83c\udf46");
System.out.println(json.toString());

But I get this:

{"One":"🍅🍆","Two":"🍅🍆"}

I want this:

{"One":"\ud83c\udf45\ud83c\udf46","Two":"\ud83c\udf45\ud83c\udf46"}
5
  • Have you tried escaping your string? json.put("Two", "\\ud83c\\udf45\\ud83c\\udf46"); Commented Jun 8, 2015 at 6:52
  • I have this: {"One":"🍅🍆","Two":"\\ud83c\\udf45\\ud83c\\udf46"} Commented Jun 8, 2015 at 6:57
  • @LutzHorn: read the JSON spec, Section 9... Commented Jun 8, 2015 at 21:11
  • @LutzHorn: If the code point is in the Basic Multilingual Plane (U+0000 through U+FFFF), then it may be represented as a six-character sequence: a reverse solidus, followed by the lowercase letter u, followed by four hexadecimal digits that encode the code point... To escape a code point that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E". Commented Jun 8, 2015 at 21:14
  • 1
    @LutzHorn: and my comments were to tell you that they ARE valid Unicode characters. They are encoded as UTF-16 surrogate pairs, per JSON spec section 9 that I quoted. \ud83c\udf45 represents U+1F345 TOMATO and \ud83c\udf46 represents U+1F346 AUBERGINE. Commented Jun 9, 2015 at 14:40

2 Answers

3

The system is working as designed. You are just not taking into account that JSON does not require most Unicode characters to be written in \uXXXX format. The quotation mark and the reverse solidus must be escaped (typically as \" and \\), and control characters <= 0x1F must be escaped (\uXXXX always works, and a few also have short forms such as \n and \t). Any other character may be written in \uXXXX format but is not required to be. The characters you have shown do not fall into those ranges, which is why toString() is not encoding them in \uXXXX format.
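The escaping rule described above can be sketched in a few lines. This is only an illustration of the minimum the JSON grammar demands, not the code of any particular library; the class name MinimalJsonQuote and the method quote are made up for this example.

```java
final class MinimalJsonQuote {
    /** Quotes a string using only the escapes that JSON requires. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '"' || ch == '\\') {
                // Reserved characters: prefix with a backslash.
                sb.append('\\').append(ch);
            } else if (ch <= 0x1F) {
                // Control characters: four-digit hex escape.
                sb.append(String.format("\\u%04x", (int) ch));
            } else {
                // Everything else, emoji included, may be emitted as-is.
                sb.append(ch);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // The tomato emoji is passed through untouched.
        System.out.println(quote("say \"hi\" \ud83c\udf45"));
    }
}
```

A serializer that follows only this minimum never has a reason to turn the tomato or aubergine characters back into escapes.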

When you call new JSONObject(String), it decodes the input string into actual Unicode strings, as if you had done this instead:

JSONObject json = new JSONObject();
json.put("One", "\ud83c\udf45\ud83c\udf46");

That is perfectly fine: you want the JSONObject to hold unescaped Unicode data internally.
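To see why the escaped and unescaped spellings are the same data, here is a small check using only the JDK. The class name SurrogatePairDemo and the helper fromCodePoint are hypothetical names for this sketch.

```java
final class SurrogatePairDemo {
    /** Builds a String from one code point; supplementary code points become a surrogate pair. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String tomato = fromCodePoint(0x1F345); // U+1F345 TOMATO

        // The escaped Java literal is the very same two-char string.
        System.out.println(tomato.equals("\ud83c\udf45"));
        // Number of UTF-16 code units versus number of code points.
        System.out.println(tomato.length());
        System.out.println(tomato.codePointCount(0, tomato.length()));
    }
}
```

Once the string is in memory there is no trace of how it was spelled in the source or in the JSON input, so the serializer cannot know that escapes were used originally.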

Where you are getting tripped up is that JSONObject.toString() does not format your particular Unicode characters in \uXXXX format. The output is perfectly valid JSON, but it is not how you want the characters to be formatted (why do you want them formatted this way?).

A look at the source for Java's JSONStringer class (which implements JSONObject.toString()) reveals that it only formats non-reserved control characters <= 0x1F in \uXXXX format; other non-reserved characters are formatted as-is. This conforms to the JSON specification.

To do what you are asking for, call JSONObject.toString() first so that reserved and ASCII characters are formatted normally, then manually escape the remaining non-ASCII characters, e.g.:

JSONObject json = new JSONObject("{\"One\":\"\\ud83c\\udf45\\ud83c\\udf46\"}");
// decodes as if json.put("One", "\ud83c\udf45\ud83c\udf46")
// or json.put("One", "🍅🍆") were called directly ...

json.put("Two", "\ud83c\udf45\ud83c\udf46");
// same as calling json.put("Two", "🍅🍆") ...

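String s = json.toString();
StringBuilder sb = new StringBuilder();
// Walk the serialized JSON one UTF-16 code unit at a time.
for (int i = 0; i < s.length(); ++i)
{
    char ch = s.charAt(i);
    if (ch >= 0x7F)
        // DEL and anything non-ASCII, including each half of a surrogate pair
        sb.append(String.format("\\u%04x", (int) ch));
    else
        sb.append(ch);
}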

System.out.println(sb.toString());
// outputs '{"One":"\ud83c\udf45\ud83c\udf46","Two":"\ud83c\udf45\ud83c\udf46"}' as expected ...
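If you need this in more than one place, the loop can be factored into a helper. The version below is a sketch that uses only the JDK, so it runs without the JSON library; the names NonAsciiEscaper and escapeNonAscii are invented here, and the hard-coded string stands in for the result of json.toString(). Running it over the whole serialized text is safe because valid JSON can only contain non-ASCII characters inside string values.

```java
final class NonAsciiEscaper {
    /**
     * Rewrites every UTF-16 code unit at or above 0x7F as a backslash-u escape
     * with four lowercase hex digits; plain ASCII passes through unchanged.
     */
    static String escapeNonAscii(String json) {
        StringBuilder sb = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char ch = json.charAt(i);
            if (ch >= 0x7F) {
                sb.append(String.format("\\u%04x", (int) ch));
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the text returned by json.toString().
        String serialized = "{\"One\":\"\ud83c\udf45\ud83c\udf46\"}";
        System.out.println(escapeNonAscii(serialized));
    }
}
```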

-1

One way of doing this is:

json.put("Two", "\\u" + "d83c" + "\\u" + "df45" + ...);

This will print the string literal \ud83c\udf45 when you try to print the JSON.

2 Comments

That is no different than using json.put("Two", "\\ud83c\\udf45...");, as the concatenation occurs before put() is called.
This tells the library to insert the literal text \ud83c, in which the \ must itself be escaped in JSON.
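The point made in these two comments can be checked with plain Java. This sketch covers only the first character of the answer's example (the rest is elided there), and the class and method names are hypothetical.

```java
final class LiteralBackslashDemo {
    /** Builds the text the way the answer does, from separate pieces. */
    static String concatenated() {
        return "\\u" + "d83c" + "\\u" + "df45";
    }

    /** The same text written as one literal with escaped backslashes. */
    static String singleLiteral() {
        return "\\ud83c\\udf45";
    }

    public static void main(String[] args) {
        // Both spellings produce the identical string before put() ever sees it.
        System.out.println(concatenated().equals(singleLiteral()));
        // That string is 12 ASCII characters, starting with a real backslash ...
        System.out.println(concatenated().length());
        // ... whereas the actual tomato character is 2 UTF-16 code units.
        System.out.println("\ud83c\udf45".length());
    }
}
```

Because the value handed to put() contains real backslash characters, a JSON serializer has to escape each of them, which is why the output shows a doubled backslash.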
