
I have code which computes the SHA-256 hash of a String, and noticed that I was getting different hashes from Android and Oracle Java 7 for the same string. My hashing code converts the String into byte[] with:

byte[] data = stringData.getBytes("UTF-16");

With UTF-16 encoding, I get different results from Oracle Java and Android Java. This is the string I was hashing:

// Test Code:
String toHash = "testdata";
System.out.println("Hash: " + DataHash.getHashString(toHash));

And got these hashes with UTF-16:

Hash: a1112a0363a59097a701e38398e1fdfef3049358aee81b77ecaad2924a426bc5 [Oracle Java 7]
Hash: 811b723aee07c7a52456fc57a5683e73649075a373d341f7257bf73575111ba3 [Android 2.2]

However, with UTF-8, I get the same hash with both JREs:

Hash: 810ff2fb242a5dee4220f2cb0e6a519891fb67f2f828a6cab4ef8894633b1f50 [Oracle Java 7]
Hash: 810ff2fb242a5dee4220f2cb0e6a519891fb67f2f828a6cab4ef8894633b1f50 [Android 2.2]

Is there some kind of endian-ness issue going on which is causing the different results on the different platforms? How should I really be preparing a String to be hashed in a platform independent way?

EDIT: Whoops, the answer is rather obvious once you read about UTF-16 a bit more. There are two versions of UTF-16 (big-endian and little-endian). You just need to specify which version getBytes() should use, and the hashes are the same. Pick one of:

  • UTF-16LE
  • UTF-16BE
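The fix above can be sketched as a small hashing helper (`DataHash.getHashString` here is a hypothetical reconstruction of the method named in the test code, not the original implementation). Encoding with an explicit byte order, UTF-16BE in this sketch, makes every JRE produce the same bytes and therefore the same digest:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DataHash {

    // Encode with an explicit byte order so every JRE produces identical
    // bytes (UTF-16BE writes big-endian with no BOM), then hash those bytes.
    public static String getHashString(String input) throws NoSuchAlgorithmException {
        byte[] data = input.getBytes(StandardCharsets.UTF_16BE);
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);

        // Convert the raw digest bytes to lowercase hex.
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("Hash: " + getHashString("testdata"));
    }
}
```

The same approach works with `StandardCharsets.UTF_16LE`; the only requirement is that both platforms agree on one of the two.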

2 Answers


According to the Oracle Java documentation:

When decoding, the UTF-16 charset interprets a byte-order mark to indicate the byte order of the stream but defaults to big-endian if there is no byte-order mark; when encoding, it uses big-endian byte order and writes a big-endian byte-order mark.

That means plain UTF-16 should always encode as Big Endian in Oracle Java.

Then from Android Java documentation:

Charset            Encoder writes
UTF-16BE           BE, no BOM
UTF-16LE           LE, no BOM
UTF-16             BE, with BE BOM

So there is a bug either in one of the implementations or in the documentation. Both should encode plain UTF-16 as big-endian and write a big-endian BOM, so there shouldn't be any difference.

In general you should prefer UTF-16BE/LE over UTF-16, but in this case it seems to be a bug.
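A quick way to see what a given JRE actually emits is to print the encoded bytes directly. This sketch shows the three charsets side by side; per the documentation quoted above, plain UTF-16 should start with the big-endian BOM bytes 0xFE 0xFF (-2, -1 as signed bytes), while the explicit variants write no BOM:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Bytes {
    public static void main(String[] args) {
        String s = "testdata";

        // Plain UTF-16: the Oracle docs say big-endian with a BE BOM (0xFE 0xFF).
        System.out.println("UTF-16:   " + Arrays.toString(s.getBytes(StandardCharsets.UTF_16)));

        // The explicit variants write no BOM at all.
        System.out.println("UTF-16BE: " + Arrays.toString(s.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("UTF-16LE: " + Arrays.toString(s.getBytes(StandardCharsets.UTF_16LE)));
    }
}
```

Running this on each platform makes the byte-order and BOM discrepancy visible immediately, which is exactly what the comments below did.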


3 Comments

Ahh, interesting. It does look like Android (2.2 at least) is doing little-endian conversion: Oracle Java 7: UTF-16: [-2, -1, 0, 116, 0, 101, 0, 115, 0, 116, 0, 100, 0, 97, 0, 116, 0, 97] Android Java 2.2: UTF-16: [-1, -2, 116, 0, 101, 0, 115, 0, 116, 0, 100, 0, 97, 0, 116, 0, 97, 0]
@TajMorton -1, -2, 116, 0.. is Little Endian, with LE BOM. Is that from Android? Anyway, it clearly contradicts with Android documentation.
Sorry, my formatting got destroyed and I accidentally posted before I was ready. Oracle Java 7 gave [-2, -1, 0, 116] with "UTF-16", whereas Android 2.2 gave [-2, -1, 116, 0]. So yes, it does look like it's producing LE with a LE BOM.

Show your hashing code; it is probably doing something wrong. The result of hashing is a byte[], so there is no need to round-trip it through a String in the first place. For converting a binary hash value to a String, use Base64 or hex encoding.
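A minimal sketch of that suggestion: hash the bytes once, then render the raw digest either as Base64 or as hex (the hex form matches the output shown in the question for UTF-8 input):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class HashEncoding {
    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Hash the UTF-8 bytes of the test string once.
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest("testdata".getBytes(StandardCharsets.UTF_8));

        // Base64: a compact, printable representation of the raw digest bytes.
        System.out.println("Base64: " + Base64.getEncoder().encodeToString(digest));

        // Hex: the representation used in the question's output.
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        System.out.println("Hex:    " + hex);
    }
}
```

Either encoding is deterministic across platforms because it operates on the digest bytes themselves, never on a re-decoded String.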

