I have code which computes the SHA-256 hash of a String, and noticed that I was getting different hashes from Android and Oracle Java 7 for the same string. My hashing code converts the String into byte[] with:
byte[] data = stringData.getBytes("UTF-16");
With UTF-16 encoding, I get different results from Oracle Java and Android Java. This is the string I was hashing:
// Test Code:
String toHash = "testdata";
System.out.println("Hash: " + DataHash.getHashString(toHash));
I get these hashes with UTF-16:
Hash: a1112a0363a59097a701e38398e1fdfef3049358aee81b77ecaad2924a426bc5 [Oracle Java 7]
Hash: 811b723aee07c7a52456fc57a5683e73649075a373d341f7257bf73575111ba3 [Android 2.2]
However, with UTF-8, I get the same hash with both JREs:
Hash: 810ff2fb242a5dee4220f2cb0e6a519891fb67f2f828a6cab4ef8894633b1f50 [Oracle Java 7]
Hash: 810ff2fb242a5dee4220f2cb0e6a519891fb67f2f828a6cab4ef8894633b1f50 [Android 2.2]
Is there some kind of endianness issue causing the different results on the two platforms? How should I prepare a String for hashing in a platform-independent way?
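One way to check the endianness theory is to print the raw bytes each charset name produces, before any hashing happens. This is only a diagnostic sketch: the class EncodingDump and its helper methods are hypothetical names made up for illustration and are not part of my DataHash code.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical diagnostic helper, not part of the original hashing code.
final class EncodingDump {

    // Hex-encode a byte array, two lowercase digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Encode the text with the named charset and return the bytes as hex.
    static String dump(String text, String charsetName) {
        try {
            return toHex(text.getBytes(charsetName));
        } catch (UnsupportedEncodingException e) {
            // The charset names used in main are expected to be available.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"};
        for (String name : names) {
            System.out.println(name + ": " + dump("testdata", name));
        }
    }
}
```

Running this on both runtimes and comparing the "UTF-16" line should show whether the byte order, or an extra byte-order mark at the start, is what differs.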
EDIT: Whoops, the answer is rather obvious once you read a bit more about UTF-16. There are two byte orders for UTF-16 (big-endian and little-endian), and the plain "UTF-16" charset name evidently does not produce the same byte sequence on both runtimes. You just need to tell getBytes() explicitly which byte order to use, and the hashes are the same. Pick one of:
- UTF-16LE
- UTF-16BE
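A minimal sketch of the fix, assuming the hashing goes through java.security.MessageDigest. The class StableHash and the method sha256Hex are hypothetical stand-ins for my DataHash.getHashString, whose body isn't shown here. Per the java.nio.charset.Charset documentation, UTF-16LE and UTF-16BE do not write a byte-order mark when encoding, so the bytes fed to the digest are fully determined by the charset name.

```java
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for DataHash; names are for illustration only.
final class StableHash {

    // SHA-256 of the text encoded with an explicitly named charset, as lowercase hex.
    static String sha256Hex(String text, String charsetName) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.getBytes(charsetName));
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An explicit byte order removes the platform-dependent choice.
        System.out.println("Hash: " + sha256Hex("testdata", "UTF-16LE"));
    }
}
```

UTF-16LE and UTF-16BE give different hashes from each other, so whichever one is picked has to be used everywhere the hash is computed or compared. UTF-8 works just as well, since it has no byte-order variants.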