10

Is there a way to change the encoding used by the String(byte[]) constructor?

In my own code I use String(byte[],String) to specify the encoding but I am using an external library that I cannot change.

String src = "with accents: é à";
byte[] bytes = src.getBytes("UTF-8");
System.out.println("UTF-8 decoded: "+new String(bytes,"UTF-8"));
System.out.println("Default decoded: "+new String(bytes));

The output for this is:

UTF-8 decoded: with accents: é à
Default decoded: with accents: Ã© Ã

I have tried changing the system property file.encoding but it does not work.
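To make the difference reproducible regardless of the machine's default, here is a minimal sketch that decodes the same UTF-8 bytes with two explicit charsets. It assumes the platform default in the question behaves like ISO-8859-1, which is a guess; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes bytes with an explicit charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String src = "with accents: \u00e9 \u00e0";
        byte[] bytes = src.getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);
        // Assumption: stands in for a single-byte platform default.
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1);

        // Compare by content and length so the console encoding does not matter.
        System.out.println("UTF-8 round trip intact: " + asUtf8.equals(src));
        System.out.println("ISO-8859-1 round trip intact: " + asLatin1.equals(src));
        System.out.println("chars: " + src.length() + ", bytes: " + bytes.length
                + ", ISO-8859-1 decoded chars: " + asLatin1.length());
    }
}
```

Each accented character takes two bytes in UTF-8, so a single-byte charset turns it into two characters. new String(bytes) does the same thing with whatever Charset.defaultCharset() returns.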

3 Answers

7

You need to change the locale before launching the JVM; see:

Java, bug ID 4163515

Some places seem to imply you can do this by setting the file.encoding variable when launching the JVM, such as

java -Dfile.encoding=UTF-8 ...

...but I haven't tried this myself. The safest way is to set an environment variable in the operating system.
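A quick way to check whether a launch option took effect is to print what the JVM picked up at startup. This is only a sketch; the class name EncodingReport is made up, and sun.jnu.encoding is an implementation-specific property that may be absent on some JVMs. As far as I know, on JDK 18 and later the default charset is UTF-8 regardless of locale unless it is overridden (JEP 400), so results differ by Java version.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {
    // Collects the encoding-related settings the running JVM reports.
    static Map<String, String> report() {
        Map<String, String> settings = new LinkedHashMap<>();
        // The charset actually used by new String(byte[]) and String.getBytes().
        settings.put("defaultCharset", Charset.defaultCharset().name());
        // String.valueOf turns a missing property into the text "null".
        settings.put("file.encoding", String.valueOf(System.getProperty("file.encoding")));
        settings.put("sun.jnu.encoding", String.valueOf(System.getProperty("sun.jnu.encoding")));
        return settings;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : report().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Run it once plainly and once as java -Dfile.encoding=UTF-8 EncodingReport, then compare the defaultCharset line.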


5 Comments

Has anyone tried the -Dfile.encoding approach? It would be great to be able to do this in a platform-agnostic way.
@MattPassell We use the following args when launching the JVM to ensure that we're specifying UTF-8 properly everywhere: -Dfile.encoding=ISO646-US -Dsun.jnu.encoding=ISO646-US and it appears to work fine.
Thanks for the response. Am I missing something? I just Googled for ISO646-US and found out it's an official name for ASCII. How does that help make sure you're using UTF-8?
@MattPassell it doesn't ensure, but it makes it blatantly obvious that we're not specifying the encoding explicitly during development since the character set is so limited
thanks! For me, this solution worked by adding this JVM parameter when launching tomcat.
1

Quoted from defaultCharset()

The default charset is determined during virtual-machine startup and typically depends upon the locale and charset of the underlying operating system.

In most OSes you can set the charset using an environment variable.
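For POSIX-style systems, the variables that usually matter are LC_ALL, LC_CTYPE and LANG, consulted in that order. The sketch below only reports which of them is set for the current process. It assumes a POSIX-like environment (Windows uses code-page settings instead), and the class and method names are hypothetical.

```java
import java.util.Map;

public class LocaleEnv {
    // POSIX precedence for the character-type locale: LC_ALL, then LC_CTYPE, then LANG.
    private static final String[] PRECEDENCE = {"LC_ALL", "LC_CTYPE", "LANG"};

    // Returns "NAME=value" for the first non-empty variable, or null if none is set.
    static String effectiveCtype(Map<String, String> env) {
        for (String name : PRECEDENCE) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return name + "=" + value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // System.getenv() exposes the environment the JVM was started with.
        System.out.println("Effective locale variable: " + effectiveCtype(System.getenv()));
    }
}
```

Setting something like LC_ALL=en_US.UTF-8 before starting the JVM is the usual approach on Linux, but the available locale names depend on the system.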

1 Comment

Not really the answer I hoped for (I would have liked to be able to do it dynamically). Giving a sample of how to change the encoding for major OSes would be great. Thanks
1

I think you want this: System.setProperty("file.encoding", "UTF-8");

It solved some problems, but I still have others. The characters "í" and "Í" don't convert correctly if the OS uses ISO-8859-1; only setting the JVM option at startup fixed that. Now the only remaining problem is the Java console in the NetBeans IDE, which garbles special characters.
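A possible explanation for the mixed results: on the OpenJDK-based runtimes I know of, Charset.defaultCharset() is resolved once and cached, so changing file.encoding afterwards does not affect new String(byte[]). The sketch below checks this on the running JVM; the class and method names are made up, and other JVMs might behave differently.

```java
import java.nio.charset.Charset;

public class RuntimePropertyDemo {
    // Returns true if the default charset is unchanged after file.encoding is rewritten at runtime.
    static boolean defaultSurvivesPropertyChange() {
        Charset before = Charset.defaultCharset();
        String original = System.getProperty("file.encoding");
        // Pick a value that differs from the current default.
        String other = "UTF-8".equals(before.name()) ? "ISO-8859-1" : "UTF-8";
        try {
            System.setProperty("file.encoding", other);
            return Charset.defaultCharset().equals(before);
        } finally {
            // Restore the property so the check leaves no side effects.
            if (original != null) {
                System.setProperty("file.encoding", original);
            } else {
                System.clearProperty("file.encoding");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Default charset unchanged after setProperty: "
                + defaultSurvivesPropertyChange());
    }
}
```

If this prints true, code that reads the system property directly may see the new value while the String constructors keep using the startup charset.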
