
I'm reading a socket stream and converting the byte array to a single string in both Java and C#, but the results are different.

C# code:

string text = Encoding.Default.GetString(ms.ToArray());

Java code:

String text = new String(data);

One potential issue I found while researching was that C#'s default encoding is UTF-32 while Java's is UTF-8, and that C# uses little endian while Java uses big endian. So the solution would be to set the charset in Java to UTF-32LE, but even then the result is entirely different from C#'s, and most, if not all, of the string consists of unreadable characters.
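To see why UTF-32LE produces nothing readable here, the sketch below (class and method names are hypothetical) encodes a string as UTF-8 and decodes the same bytes with two different charsets. UTF-32LE consumes four bytes per character, so printable ASCII/UTF-8 text turns into code points outside the valid Unicode range, which Java replaces with U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class WrongCharset {
    // Encodes text with one charset and decodes the resulting bytes with another,
    // mimicking a sender and receiver that disagree about the encoding.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        // new String(byte[], Charset) replaces malformed input with U+FFFD.
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        Charset utf32le = Charset.forName("UTF-32LE");
        String text = "Hello, world";

        // Matching charsets: the text survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // UTF-8 bytes read as UTF-32LE: each group of 4 bytes becomes one code point,
        // and for printable ASCII text these fall outside the valid Unicode range.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, utf32le));
    }
}
```

The point is that the decoding charset must match the one used to produce the bytes; picking a different one (and a different byte order) only changes which garbage comes out.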

As extra information about my approach: in Java I'm using a ByteArrayOutputStream to collect the data from a DataInputStream, and in C# I'm using a MemoryStream to collect the data from a NetworkStream.
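A minimal Java sketch of that approach, assuming the peer closes the connection when it has finished sending (so end of stream marks the end of the message) and assuming, for illustration only, that the sender used UTF-8: collect all bytes first, then decode once with an explicitly named charset. The names ReadAll, readAllBytes and readText are illustrative; on Java 9+ the built-in InputStream.readAllBytes() does the same collection step.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReadAll {
    // Reads until end of stream and returns the raw bytes, without decoding them.
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    // Decodes only after all bytes are collected, so a multi-byte character
    // split across two reads is not corrupted.
    static String readText(InputStream in, Charset charset) throws IOException {
        return new String(readAllBytes(in), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the socket: bytes of "Gruesse" with umlaut and sharp s, in UTF-8.
        byte[] sent = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(sent))) {
            System.out.println(readText(in, StandardCharsets.UTF_8));
        }
    }
}
```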

  • Java's default encoding is whatever your platform's default charset is. It's intended for students who want a "simple" way to be compatible with a local platform. If you want UTF-8, you have to specify it: docs.oracle.com/javase/8/docs/api/java/lang/… Commented Jun 4, 2020 at 22:52
  • Are you sure Encoding.Default is UTF-32? That seems like a strange default, since no Windows system uses that normally. Perhaps you should print out Encoding.Default, and/or examine the actual bytes in data on the Java side? Commented Jun 5, 2020 at 3:10
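Following the second comment's suggestion, a small diagnostic for the Java side: print the JVM's default charset and dump the received bytes in hex so they can be compared byte for byte with the C# side (where Encoding.Default.WebName and BitConverter.ToString(bytes) report the equivalent information). The helper name toHex and the sample bytes are placeholders, not part of the original code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class InspectBytes {
    // Formats bytes as space-separated uppercase hex, e.g. {0x48, 0x69} -> "48 69".
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The charset that new String(byte[]) falls back to on this JVM.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Stand-in for the bytes received from the socket.
        byte[] data = "Hi \u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(data)); // 48 69 20 C3 A9
    }
}
```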

1 Answer


The bytes in the stream were produced with one specific encoding, so you must explicitly specify that same encoding when decoding them in both your C# and your Java code.

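They use different default encodings: Java's new String(byte[]) decodes with the platform's default charset (UTF-8 from Java 18 onward), while C#'s Encoding.Default is the system's active ANSI code page on .NET Framework and UTF-8 on .NET Core / .NET 5+.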
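A sketch of decoding with an explicitly named charset in Java so that it reproduces the C# result. It assumes, purely for illustration, that Encoding.Default on the C# machine resolves to windows-1252 (common for .NET Framework on Western Windows systems); SERVER_CHARSET is a hypothetical name, and the value should be replaced with whatever the C# side actually reports.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharset {
    // Assumption for illustration: on the C# machine Encoding.Default resolves to
    // windows-1252. Replace this with whatever the C# side actually reports.
    static final Charset SERVER_CHARSET = Charset.forName("windows-1252");

    // Decodes with an explicitly named charset instead of the JVM default.
    static String decode(byte[] data) {
        return new String(data, SERVER_CHARSET);
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, encoded in windows-1252 (0xE9 is the e-acute).
        byte[] data = { 0x63, 0x61, 0x66, (byte) 0xE9 };

        System.out.println(decode(data).equals("caf\u00e9")); // true

        // Same bytes with the wrong charset: 0xE9 on its own is not valid UTF-8.
        System.out.println(new String(data, StandardCharsets.UTF_8));
    }
}
```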

To make the byte stream interoperable, both sides must agree on one encoding for converting strings to bytes, or the sender must transmit the encoding name somewhere in the stream.
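One way to exchange the encoding inside the stream is a small header in front of each message. The framing below is entirely hypothetical (not a standard protocol): a one-byte name length, the charset name in ASCII, a 4-byte payload length, then the payload. DataOutputStream.writeInt writes big-endian, so a C# peer would need to write the length in big-endian order as well, since BinaryWriter.Write(int) writes little-endian.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical message layout (not a standard protocol):
// [1 byte: name length][charset name, ASCII][4 bytes: payload length, big-endian][payload]
final class TaggedMessage {
    static void write(DataOutputStream out, String text, Charset charset) throws IOException {
        byte[] name = charset.name().getBytes(StandardCharsets.US_ASCII);
        byte[] payload = text.getBytes(charset);
        out.writeByte(name.length);   // charset names are short enough for one byte in practice
        out.write(name);
        out.writeInt(payload.length); // DataOutputStream writes ints big-endian
        out.write(payload);
        out.flush();
    }

    static String read(DataInputStream in) throws IOException {
        byte[] name = new byte[in.readUnsignedByte()];
        in.readFully(name);
        Charset charset = Charset.forName(new String(name, StandardCharsets.US_ASCII));
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        // The receiver decodes with the charset the sender declared, not its own default.
        return new String(payload, charset);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        write(new DataOutputStream(wire), "Gr\u00fc\u00dfe", StandardCharsets.UTF_16LE);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(read(in).equals("Gr\u00fc\u00dfe")); // true
    }
}
```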


7 Comments

I'm trying to configure Java to produce the same result as C#. Someone said that UTF-32 was C#'s default and that it uses little endian, so I tried new String(bytes, "UTF-32LE"), but the output is still entirely different from C#'s.
How different? Just little endian and big endian different or is nothing comparable?
@Michael If I use UTF-8 in Java, the result is similar to some extent, but the entire string doesn't match and parts of the response are left as unreadable sequences of garbage characters. If I use the suggested UTF-32LE, nothing is comparable: the entire string is garbage, with some other variations such as 䀀Ā, but nothing readable.
Try looking at the byte arrays and comparing them. I'm not sure if this is the problem, but do you know the difference between big endian and little endian?
@Michael I'm not familiar with big endian and little endian, but isn't it about the order in which the bytes are stored? Big endian and little endian are essentially the reverse of one another.
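To illustrate the endianness point from the last two comments: byte order only matters for units larger than one byte, such as UTF-16/UTF-32 code units or binary integers; UTF-8 text is a plain byte sequence with no byte order. The sketch below (class name hypothetical) shows the same character in UTF-32BE and UTF-32LE, the same int in both orders, and how bytes 00 40 00 00 decode under UTF-32LE to U+4000 (䀀), one of the characters reported above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;

final class Endianness {
    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset be = Charset.forName("UTF-32BE");
        Charset le = Charset.forName("UTF-32LE");

        // The same character as the same four bytes, in opposite order.
        System.out.println(hex("A".getBytes(be))); // 00 00 00 41
        System.out.println(hex("A".getBytes(le))); // 41 00 00 00

        // Binary integers have the same issue: ByteBuffer defaults to big-endian
        // but can be switched to little-endian.
        System.out.println(hex(ByteBuffer.allocate(4).putInt(1).array())); // 00 00 00 01
        System.out.println(hex(ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(1).array())); // 01 00 00 00

        // Bytes 00 40 00 00 read as UTF-32LE form code point U+4000.
        System.out.println(new String(new byte[] {0x00, 0x40, 0x00, 0x00}, le).equals("\u4000")); // true
    }
}
```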
