Is it possible to construct a String in Java from invalid code points?

Is there any way that str.getBytes("utf8") in Java can return an invalid UTF-8 encoding?

The context is that I want to serialize a String as an array of UTF-8 encoded bytes and then deserialize those bytes into the same String.

I want to determine whether my (de)serialization code should first check that the array of bytes is a valid UTF-8 encoding.

Thank you.

    Consider a string consisting only of a low surrogate. Commented Nov 1, 2013 at 0:50

1 Answer

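You can use the CharsetEncoder and CharsetDecoder classes in java.nio.charset to control precisely how characters and bytes are translated back and forth. In particular, the onMalformedInput() and onUnmappableCharacter() methods, available on both classes, let you define how those conditions are handled: report an error, ignore the offending input, or substitute a replacement. (The String(byte[], String charsetName) constructor leaves its behaviour unspecified when the bytes are not valid in the given charset, and the String(byte[], Charset) overload silently substitutes the charset's default replacement string.)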
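As a rough illustration of that approach, here is a minimal sketch of a strict UTF-8 round trip. The class name StrictUtf8 and its encode/decode helpers are made up for this example and are not part of the JDK; only the java.nio.charset calls are real API. Both directions use CodingErrorAction.REPORT, so an unpaired surrogate on the way out, or an invalid byte sequence on the way in, should raise a CharacterCodingException instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
final class StrictUtf8 {

    // Encodes s as UTF-8 and throws if s contains an unpaired surrogate,
    // rather than substituting a replacement byte as String.getBytes does.
    static byte[] encode(String s) throws CharacterCodingException {
        ByteBuffer out = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    // Decodes bytes as UTF-8 and throws if they are not a valid encoding,
    // rather than inserting replacement characters.
    static String decode(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws Exception {
        String ok = "h\u00e9llo";
        System.out.println("round trip equal: " + decode(encode(ok)).equals(ok));

        // A String holding only a low surrogate, as suggested in the comment
        // on the question.
        String lone = "\uDC00";
        String lossy = new String(lone.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.UTF_8);
        System.out.println("getBytes round trip equal: " + lossy.equals(lone));

        try {
            encode(lone);
            System.out.println("strict encode accepted the lone surrogate");
        } catch (CharacterCodingException e) {
            System.out.println("strict encode rejected the lone surrogate: " + e);
        }
    }
}
```

With this arrangement the serializer refuses Strings that cannot be represented faithfully, and the deserializer validates the bytes as a side effect of decoding, so a separate validity check beforehand should not be needed.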
