
For some reason, I have to decode a string into a Chinese character, like this: “\u961c”. I believe this string is the UTF-8 form of “阜”.

I know how to decode a byte[] into Unicode characters, but is there an easy way to decode a String into Unicode characters?

By the way, when I call “阜”.getBytes(), I get -100, -104, -23. Does that mean

1001110 10010100 11101001 in binary?

But I think the Unicode value \u961c should be 1001 0110 0001 1100 in binary, or something like that,

and its UTF-8 form should be 11101001 10011000 10011100 in binary.

  • 阜 (U+961C) is \u961C in UTF-16 but E9 98 9C in UTF-8. (Commented Mar 11, 2016 at 3:23)
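The byte values in the comment above can be checked with a short sketch. The class and helper names (FuBytesDemo, toHex, toBinary) are made up for illustration; only String.getBytes, String.codePointAt and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FuBytesDemo {
    // Formats each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            // b & 0xFF turns the signed byte into its unsigned value (0..255).
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Formats each byte as eight binary digits, separated by spaces.
    static String toBinary(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            String bits = Integer.toBinaryString(b & 0xFF);
            // Left-pad with zeros so every byte shows all 8 bits.
            sb.append("00000000".substring(bits.length())).append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u961c"; // the single character U+961C
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        System.out.println(Integer.toHexString(s.codePointAt(0))); // 961c
        System.out.println(toHex(utf8));                           // E9 98 9C
        System.out.println(toBinary(utf8));                        // 11101001 10011000 10011100
        System.out.println(Arrays.toString(utf8));                 // [-23, -104, -100]
    }
}
```

The negative numbers appear because Java's byte type is signed: 0xE9 is 233 unsigned, which is printed as 233 - 256 = -23, and likewise for the other two bytes.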

1 Answer


In Java, there is no method that re-encodes a String object into another String. (Strictly speaking, a String does have an encoding, but it is always UTF-16.)

The only way is to encode to a byte[]. So if you need UTF-8 data, then you need a byte[]. If you have a String that contains unexpected data, then the problem lies at some earlier point where binary data was incorrectly converted to a String (i.e. with the wrong encoding).

This will work, but it produces bytes rather than a String:

Charset.forName("UTF-8").encode(myString)
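As a rough illustration of the point about wrong encodings, the sketch below encodes a String to UTF-8 bytes and decodes them back, once with the matching charset and once with a mismatched one. The class and method names are hypothetical, and ISO-8859-1 is chosen only as an example of a wrong charset.

```java
import java.nio.charset.StandardCharsets;

final class RoundTripDemo {
    // Encodes text to UTF-8 bytes.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back to a String using the matching charset.
    static String decodeUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes with a mismatched charset (ISO-8859-1),
    // which maps every byte to one char and so yields garbage text.
    static String decodeWrong(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u961c";
        byte[] utf8 = encodeUtf8(original);

        String ok = decodeUtf8(utf8);
        String broken = decodeWrong(utf8);

        System.out.println(ok.equals(original)); // true
        System.out.println(broken.length());     // 3
    }
}
```

A String that looks like the second result usually means the bytes were decoded with the wrong charset somewhere upstream, so that is where the fix belongs.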

1 Comment

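Charset.encode() returns a ByteBuffer. To get a byte[] from that, you would have to call Charset.forName("UTF-8").encode(myString).array() (note that array() returns the buffer's whole backing array, which is not guaranteed to be exactly as long as the encoded data). Otherwise, use myString.getBytes("UTF-8") or myString.getBytes(StandardCharsets.UTF_8) instead.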
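One way to copy exactly the encoded bytes out of the ByteBuffer, without depending on the size of its backing array, might look like the sketch below. The class and helper names (ByteBufferDemo, toExactBytes) are hypothetical; the calls used are Charset.encode, ByteBuffer.remaining, ByteBuffer.duplicate and ByteBuffer.get.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteBufferDemo {
    // Copies only the bytes between the buffer's position and limit.
    // duplicate() is used so the caller's buffer position is left unchanged.
    static byte[] toExactBytes(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        String s = "\u961c";

        ByteBuffer buffer = StandardCharsets.UTF_8.encode(s);
        byte[] viaBuffer = toExactBytes(buffer);

        // The simpler route when all that is needed is a byte[].
        byte[] direct = s.getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(viaBuffer, direct)); // true
    }
}
```

If nothing else needs the ByteBuffer, getBytes(StandardCharsets.UTF_8) is the shorter option and avoids the checked UnsupportedEncodingException that getBytes("UTF-8") declares.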
