Java string results different to C#

Question

I'm reading a socket stream and converting the byte array to a single string in both Java and C#, but the results are different.

C# code:

string text = Encoding.Default.GetString(ms.ToArray());

Java code:

String text = new String(data);

During my research, one potential issue I found was that C#'s default encoding is UTF-32 while Java's is UTF-8, and that C# uses little-endian byte order while Java uses big-endian. So the solution would be to specify the charset in Java as UTF-32LE, but even then the result is entirely different from C#, and most, if not all, of the string is a sequence of � characters.
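A likely reason for the � characters: UTF-32 always uses exactly four bytes per character. If the bytes were actually written as UTF-8 (or any single-byte encoding), decoding them as UTF-32LE glues every four bytes into one 32-bit value. For ordinary text, that value is usually above U+10FFFF, which is not a valid code point, so the decoder emits a replacement character instead. The minimal Java sketch below assumes the sender wrote plain ASCII text. `Utf32Mismatch` and `decode` are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf32Mismatch {
    // Decodes the same bytes with a caller-chosen charset.
    static String decode(byte[] data, String charsetName) {
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Four bytes as an ASCII/UTF-8 sender would produce them: 41 42 43 44.
        byte[] data = "ABCD".getBytes(StandardCharsets.US_ASCII);

        // UTF-8 maps each of these bytes back to one character.
        System.out.println(decode(data, "UTF-8")); // ABCD

        // UTF-32LE reads all four bytes as ONE value, 0x44434241.
        // That is above U+10FFFF, so it is not a valid code point and
        // the decoder substitutes a replacement character.
        String wrong = decode(data, "UTF-32LE");
        System.out.println(wrong.length() + " char(s) from " + data.length + " bytes");
    }
}
```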

For extra context on my approach: in Java I'm using a ByteArrayOutputStream to store data from a DataInputStream, and in C# I'm using a MemoryStream to store data from a NetworkStream.
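For comparison, here is a minimal Java sketch of the same pipeline: read everything into a ByteArrayOutputStream, then decode once with a named charset. It assumes the sender closes the connection when it is done, so that read returns -1 at the end. If the connection stays open, you would need some framing (such as a length prefix) instead. `ReadAllAndDecode` and `readAll` are illustrative names, a ByteArrayInputStream stands in for the socket, and UTF-8 is only a placeholder for whatever encoding the sender actually uses. A DataInputStream is not required for raw bytes, and on Java 9+ InputStream.readAllBytes() performs the same copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadAllAndDecode {
    // Copies everything up to end-of-stream into memory, the Java analogue
    // of copying a NetworkStream into a MemoryStream.
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // A ByteArrayInputStream stands in for socket.getInputStream() here.
        InputStream in = new ByteArrayInputStream("h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        byte[] data = readAll(in);

        // Decode with the charset the sender used (UTF-8 is an assumption here),
        // not with new String(data), which uses the JVM's default charset.
        String text = new String(data, StandardCharsets.UTF_8);
        System.out.println(text.equals("h\u00e9llo")); // true
    }
}
```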

Java's default encoding is whatever your platform uses by default. It's intended for students who want a "simple" way to be compatible with the local platform. If you want UTF-8, you have to specify it: docs.oracle.com/javase/8/docs/api/java/lang/…
– markspace, Commented Jun 4, 2020 at 22:52
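To illustrate the comment: new String(byte[]) decodes with Charset.defaultCharset(), which before JDK 18 depended on the OS and locale. JDK 18 (JEP 400) made UTF-8 the default. The sketch below decodes the same four bytes with two explicit charsets. The byte values are an assumed example of how a Windows-1252 or ISO-8859-1 sender would encode "café"; those two encodings agree on every byte outside 0x80 to 0x9F. `DefaultVsExplicit` and `DATA` are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultVsExplicit {
    // "café" as a Windows-1252 / ISO-8859-1 sender would encode it: é is the single byte 0xE9.
    static final byte[] DATA = {0x63, 0x61, 0x66, (byte) 0xE9};

    public static void main(String[] args) {
        // new String(DATA) would silently use this JVM's default charset.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Naming the charset makes the result identical on every machine.
        String latin1 = new String(DATA, StandardCharsets.ISO_8859_1);
        // 0xE9 on its own is not valid UTF-8, so the last character becomes a replacement character.
        String utf8 = new String(DATA, StandardCharsets.UTF_8);

        System.out.println(latin1.equals("caf\u00e9")); // true
        System.out.println(utf8.equals("caf\u00e9"));   // false
    }
}
```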
Are you sure Encoding.Default is UTF-32? That seems like a strange default, since Windows systems don't normally use it. Perhaps you should print out Encoding.Default and/or examine the actual bytes in data on the Java side?
– VGR, Commented Jun 5, 2020 at 3:10
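A small hex dump on the Java side follows this suggestion and makes the two programs directly comparable. For reference, in .NET Framework Encoding.Default is the system's active ANSI code page (often Windows-1252 on Western European systems), while on .NET Core and .NET 5+ it is UTF-8. It is not UTF-32 in either case. On the C# side, BitConverter.ToString(ms.ToArray()) prints the bytes in a similar form, separated by hyphens. `HexDump` and `toHex` are illustrative names; on Java 17+, java.util.HexFormat can do the same formatting.

```java
import java.nio.charset.Charset;

public class HexDump {
    // Formats bytes as space-separated, two-digit uppercase hex for byte-by-byte comparison.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The JVM's default charset is what new String(data) would have used.
        System.out.println("Java default charset: " + Charset.defaultCharset());

        byte[] data = {0x48, 0x69, 0x00, (byte) 0xE9}; // sample bytes, not real socket data
        System.out.println(toHex(data)); // 48 69 00 E9
    }
}
```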

Jozef Izso · Accepted Answer · 2020-06-04 22:35:41Z


Because the bytes in the stream were produced with a specific encoding, you must explicitly specify that same encoding in both your C# and your Java code.

They use different default encodings.

To make the byte stream interoperable, you must agree on a single encoding for converting strings to bytes, or transmit the name of the encoding somewhere in the stream.
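One way to exchange the encoding type in the stream is to prefix each message with the charset name. The layout used here (1-byte name length, ASCII name, 4-byte big-endian payload length, payload) is a hypothetical example, not a standard protocol, and `CharsetFraming`, `encodeFrame` and `decodeFrame` are illustrative names. A C# peer would have to use the same layout. Note that C#'s BinaryReader.ReadInt32 reads little-endian, whereas Java's DataOutputStream writes big-endian, so the length bytes would need reversing on that side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFraming {
    // Hypothetical frame layout, not a standard protocol:
    // [1 byte: name length][charset name, ASCII][4 bytes: payload length, big-endian][payload]
    static byte[] encodeFrame(String text, Charset cs) {
        byte[] name = cs.name().getBytes(StandardCharsets.US_ASCII);
        byte[] payload = text.getBytes(cs);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeByte(name.length);
            out.write(name);
            out.writeInt(payload.length); // DataOutputStream always writes big-endian
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads one frame and decodes the payload with the charset the sender named.
    static String decodeFrame(InputStream raw) {
        DataInputStream in = new DataInputStream(raw);
        try {
            byte[] name = new byte[in.readUnsignedByte()];
            in.readFully(name);
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            Charset cs = Charset.forName(new String(name, StandardCharsets.US_ASCII));
            return new String(payload, cs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] frame = encodeFrame("caf\u00e9", StandardCharsets.UTF_8);
        String received = decodeFrame(new ByteArrayInputStream(frame));
        System.out.println(received.equals("caf\u00e9")); // true
    }
}
```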


Comments

Intel Over a year ago

I'm trying to configure Java to produce the same results as C#. Someone stated that UTF-32 was C#'s default and that it uses little-endian byte order, so I tried new String(bytes, "UTF-32LE"), but it still produces entirely different results from C#.

Michael Over a year ago

How different? Is it just a big-endian vs. little-endian difference, or is nothing comparable?

Intel Over a year ago

@Michael If I use UTF-8 in Java, the result is similar to some extent, but the entire string doesn't match, and parts of the response are unreadable sequences of �. If I use the suggested UTF-32LE, nothing is comparable: the entire string is � along with variations such as 䀀Ā, and nothing is readable.

Michael Over a year ago

Try looking at the byte arrays and comparing them. I'm not sure if this is the problem, but do you know the difference between big-endian and little-endian?

Intel Over a year ago

@Michael I'm not familiar with big-endian and little-endian, but isn't it about the order in which the bytes are stored? Big-endian and little-endian are essentially the reverse of one another.
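That is the right idea. Endianness only affects values that occupy more than one byte, such as a UTF-16 or UTF-32 code unit or a binary integer, and it swaps the order of those bytes. UTF-8 and single-byte code pages are byte-oriented, so byte order cannot explain mismatches between them. Here is a minimal Java sketch showing both cases; `EndianDemo`, `encode` and `intBytes` are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.util.Arrays;

public class EndianDemo {
    // Encodes a string with the named charset (UTF-32BE/UTF-32LE write no BOM).
    static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }

    // Serializes a 32-bit int in the requested byte order.
    static byte[] intBytes(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    public static void main(String[] args) {
        // Same character 'A' (U+0041), same four bytes, opposite order.
        System.out.println(Arrays.toString(encode("A", "UTF-32BE"))); // [0, 0, 0, 65]
        System.out.println(Arrays.toString(encode("A", "UTF-32LE"))); // [65, 0, 0, 0]

        // The same applies to binary integers such as a length prefix: 258 = 0x00000102.
        System.out.println(Arrays.toString(intBytes(258, ByteOrder.BIG_ENDIAN)));    // [0, 0, 1, 2]
        System.out.println(Arrays.toString(intBytes(258, ByteOrder.LITTLE_ENDIAN))); // [2, 1, 0, 0]

        // UTF-8 has no byte order: 'A' is a single byte either way.
        System.out.println(Arrays.toString(encode("A", "UTF-8"))); // [65]
    }
}
```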
