As far as I know, in String.getBytes(charset), the charset argument means the method returns the bytes of the string encoded in the given charset.
In new String(bytes, charset), the second argument, charset, means the method decodes bytes using the given charset and returns the decoded result.
Based on the above, as I understand it, the charset arguments of the two methods must be the same for new String(bytes, charset) to return the proper string. (I suspect this is where my understanding breaks down.)
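To illustrate that understanding, here is a minimal round trip with matching charsets (just a sketch of the behavior I expect):

import java.nio.charset.StandardCharsets;

public class RoundTrip {
    public static void main(String[] args) {
        String original = "테스트";
        // Encode and decode with the same charset: the string survives intact.
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(original)); // true
    }
}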
I have an incorrectly decoded string, and I tested the following code with it:
import java.io.UnsupportedEncodingException;

public class CharsetTest {
    public static void main(String[] args) {
        String originalStr = "Å×½ºÆ®"; // garbled form of 테스트
        String[] charSet = {"utf-8", "euc-kr", "ksc5601", "iso-8859-1", "x-windows-949"};
        // Try every encode/decode charset pair.
        for (int i = 0; i < charSet.length; i++) {
            for (int j = 0; j < charSet.length; j++) {
                try {
                    System.out.println("[" + charSet[i] + "," + charSet[j] + "] = "
                            + new String(originalStr.getBytes(charSet[i]), charSet[j]));
                } catch (UnsupportedEncodingException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
The output is:
[utf-8,utf-8] = Å×½ºÆ®
[utf-8,euc-kr] = ��쩍쨘�짰
[utf-8,ksc5601] = ��쩍쨘�짰
[utf-8,iso-8859-1] = Å×½ºÆ®
[utf-8,x-windows-949] = 횇횞쩍쨘횈짰
[euc-kr,utf-8] = ?����������
[euc-kr,euc-kr] = ?×½ºÆ®
[euc-kr,ksc5601] = ?×½ºÆ®
[euc-kr,iso-8859-1] = ?¡¿¨ö¨¬¨¡¢ç
[euc-kr,x-windows-949] = ?×½ºÆ®
[ksc5601,utf-8] = ?����������
[ksc5601,euc-kr] = ?×½ºÆ®
[ksc5601,ksc5601] = ?×½ºÆ®
[ksc5601,iso-8859-1] = ?¡¿¨ö¨¬¨¡¢ç
[ksc5601,x-windows-949] = ?×½ºÆ®
[iso-8859-1,utf-8] = ��Ʈ
[iso-8859-1,euc-kr] = 테스트
[iso-8859-1,ksc5601] = 테스트
[iso-8859-1,iso-8859-1] = Å×½ºÆ®
[iso-8859-1,x-windows-949] = 테스트
[x-windows-949,utf-8] = ?����������
[x-windows-949,euc-kr] = ?×½ºÆ®
[x-windows-949,ksc5601] = ?×½ºÆ®
[x-windows-949,iso-8859-1] = ?¡¿¨ö¨¬¨¡¢ç
[x-windows-949,x-windows-949] = ?×½ºÆ®
As you can see, I found a way to recover the original string:
[iso-8859-1,euc-kr] = 테스트
[iso-8859-1,ksc5601] = 테스트
[iso-8859-1,x-windows-949] = 테스트
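In fact, a single pair is enough to reproduce the recovery. Here is a minimal sketch (the literal below is the garbled string from above):

import java.io.UnsupportedEncodingException;

public class Recover {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String garbled = "Å×½ºÆ®"; // the incorrectly decoded string
        // Re-encode as ISO-8859-1, then decode the resulting bytes as EUC-KR.
        String recovered = new String(garbled.getBytes("iso-8859-1"), "euc-kr");
        System.out.println(recovered); // prints 테스트
    }
}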
How is this possible? How can a string be encoded with one character set and decoded correctly with a different one?