1

I am trying to determine if an in-house method will decode a byte array correctly given different encodings. The following code is how I approached generating data to encode.

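import java.io.UnsupportedEncodingException;

public class Encoding {

  // Byte values from 0x00 to 0xFF (list elided here).
  static byte[] VALUES = {(byte) 0x00, ..... (byte) 0xFF};
  static String[] ENCODING = {"Windows-1252","ISO-8859-1"};

  public static void main(String[] args) throws UnsupportedEncodingException {

    for(String encode : ENCODING) {
      for(byte value : VALUES) {
        byte[] inputByte = new byte[]{value};
        // Decode the single byte using the encoding under test.
        String input = new String(inputByte, encode);
        // No-argument getBytes(): this is the call the question is about.
        String houseInput = houseMethod(input.getBytes());
      }
    }
  }
}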

My question is: when the call to the house method is made, which encoding do the bytes passed to it use? It is my understanding that when Java stores a String, it converts it to UTF-16. So when I pass input.getBytes(), am I sending UTF-16 encoded bytes, or bytes in the encoding I specified when I created the new String? I am guessing that it is UTF-16, but I am not sure. Should the call to the house method be the following instead?

houseMethod(input.getBytes(encode))
  • Bytes have no encoding; characters and strings have one Commented Sep 10, 2015 at 15:02
  • @fantaghirocco wrong. See docs.oracle.com/javase/tutorial/i18n/text/string.html defaultBytes vs utf8Bytes Commented Sep 10, 2015 at 15:04
  • @fantaghirocco No, characters and strings in Java don't have an encoding. The encoding is what you need to convert between characters/strings and bytes. Commented Sep 10, 2015 at 15:06
  • @Laurentiu ok I got it, thanks!!! :) Commented Sep 10, 2015 at 15:08

2 Answers

4

See String.getBytes():

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.

You are well advised to use the String.getBytes(Charset) method instead and explicitly specify the desired encoding.
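
As a minimal sketch of that advice (the class and method names are made up for illustration, and the asker's houseMethod is not shown in the question, so it is left out), the following decodes each byte value and re-encodes it with the same explicit Charset, then counts how many bytes fail to survive the round trip:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetRoundTrip {

    // Decode one byte with the given charset, then encode the result
    // back to bytes with the same charset.
    static byte[] roundTrip(byte value, Charset charset) {
        String decoded = new String(new byte[] {value}, charset);
        return decoded.getBytes(charset);
    }

    // True if the byte comes back unchanged after decode + encode.
    static boolean survives(byte value, Charset charset) {
        return Arrays.equals(new byte[] {value}, roundTrip(value, charset));
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            Charset.forName("windows-1252"),
            StandardCharsets.ISO_8859_1
        };
        for (Charset charset : charsets) {
            int lost = 0;
            for (int i = 0x00; i <= 0xFF; i++) {
                if (!survives((byte) i, charset)) {
                    lost++;
                }
            }
            System.out.println(charset.name() + ": " + lost
                    + " byte value(s) did not round-trip");
        }
    }
}
```

Because both directions name the charset explicitly, the result no longer depends on the machine the test runs on. In the original loop, the same idea would be input.getBytes(encode), as the question already suspects.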

2

As per the Java documentation for String.getBytes():

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.

So the bytes that the in-house method receives depend on which OS you are running, as well as on your locale settings.

On the other hand, String.getBytes(encoding) ensures you get the bytes in the encoding you pass as the parameter.
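
To see the difference concretely, here is a small illustrative sketch (the class and helper names are hypothetical) that prints the bytes produced for the single character U+00E9 by the no-argument getBytes() and by a few explicit charsets. The first two lines of output vary by machine; the rest do not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetDemo {

    // Format a byte array as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // Check that getBytes() gives the same bytes as passing the
    // platform default charset explicitly.
    static boolean noArgMatchesDefault(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        String s = "\u00E9"; // e with acute accent, written as an escape

        // Machine-dependent: depends on the platform default charset.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("getBytes():      " + hex(s.getBytes()));

        // Machine-independent: the charset is named explicitly.
        System.out.println("ISO-8859-1:      " + hex(s.getBytes(StandardCharsets.ISO_8859_1))); // E9
        System.out.println("UTF-8:           " + hex(s.getBytes(StandardCharsets.UTF_8)));      // C3 A9
        System.out.println("UTF-16BE:        " + hex(s.getBytes(StandardCharsets.UTF_16BE)));   // 00 E9

        System.out.println("no-arg matches default: " + noArgMatchesDefault(s));
    }
}
```

Note that the no-argument call returns neither the internal UTF-16 form nor the encoding used to build the String: a String does not remember which charset it was decoded from.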
