Encoding problem between C# TCP server and Java TCP Client

Question

I'm facing an encoding issue that I haven't been able to find a proper solution for.

I have a C# TCP server, running as a Windows service, which receives and responds with XML. The problem appears when the output contains special characters, such as Spanish characters with accents (á, é, í and others).

The server response is encoded as UTF-8, and the Java client reads it as UTF-8. But when I print the output, the character is completely different.

This problem only happens in the Java client (a C# TCP client works as expected).

The following snippet of the server code reproduces the encoding issue.

C# Server:

    // Encode the test character as UTF-8 ("á" becomes the bytes 0xC3 0xA1)
    byte[] destBytes = System.Text.Encoding.UTF8.GetBytes("á");
    try
    {
        clientStream.Write(destBytes, 0, destBytes.Length);
        clientStream.Flush();
    }
    catch (Exception ex)
    {
        LogErrorMessage("Error in SendResponseToClient: Detail::", ex);
    }

Java Client:

    // Connect with a 20-second timeout
    socket.connect(new InetSocketAddress(param.getServerIp(), param.getPort()), 20000);
    InputStream sockInp = socket.getInputStream();
    // Decode the incoming bytes explicitly as UTF-8
    InputStreamReader streamReader = new InputStreamReader(sockInp, Charset.forName("UTF-8"));
    BufferedReader sockReader = new BufferedReader(streamReader);
    String tmp = null;
    while ((tmp = sockReader.readLine()) != null) {
        // System.out encodes with its own charset, not necessarily UTF-8
        System.out.println(tmp);
    }
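To check whether the socket and the UTF-8 decoding are really at fault, one option is a self-contained loopback test. This is a sketch, not the original setup: it assumes a throwaway Java `ServerSocket` on localhost standing in for the C# service, and the names `Utf8LoopbackCheck` and `roundTrip` are hypothetical. It sends the same two UTF-8 bytes, reads them with the same `InputStreamReader`/`BufferedReader` combination as the client above, and prints the code point rather than the glyph so the console's own encoding cannot distort the result.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class Utf8LoopbackCheck {

    // Sends the UTF-8 bytes of `text` from a throwaway local server thread and
    // reads them back through a UTF-8 InputStreamReader, as the client does.
    static String roundTrip(String text) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread sender = new Thread(() -> {
                try (Socket peer = server.accept();
                     OutputStream out = peer.getOutputStream()) {
                    out.write(text.getBytes(StandardCharsets.UTF_8)); // 0xC3 0xA1 for "á"
                    out.flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            StringBuilder received = new StringBuilder();
            try (Socket socket = new Socket(loopback, server.getLocalPort());
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) { // ends when the sender closes
                    received.append(line);
                }
            }
            sender.join();
            return received.toString();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException("loopback test failed", e);
        }
    }

    public static void main(String[] args) {
        String s = roundTrip("\u00e1");
        // Print the code point, not the glyph, so the console charset cannot interfere
        System.out.printf("received U+%04X%n", s.codePointAt(0)); // received U+00E1
    }
}
```

If this prints `received U+00E1`, the bytes arrived and were decoded correctly, and the remaining problem is in how the string is displayed.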

For this simple test, the output shown is:

ß

I did some testing by printing out the byte[] in each language. In C#, á is output as: 195, 161

In Java, the byte[] read prints as: -61, -95

Could this have to do with Java's byte type being signed and C#'s being unsigned?
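On the signed/unsigned question: the two dumps are the same bytes viewed differently. Java's `byte` is signed (-128 to 127) while C#'s `byte` is unsigned (0 to 255), so 0xC3 shows up as -61 in Java and as 195 in C#. A minimal sketch (the class name `SignedBytes` is just for illustration) that converts with `& 0xFF` and decodes the signed values back:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SignedBytes {

    // Java's byte is signed (-128..127); masking with 0xFF gives the
    // unsigned value (0..255) that C# prints for its unsigned byte.
    static int[] unsigned(byte[] bytes) {
        int[] result = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            result[i] = bytes[i] & 0xFF;
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00e1".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(utf8));           // [-61, -95]
        System.out.println(Arrays.toString(unsigned(utf8))); // [195, 161]

        // The signed values decode back to the same character
        String decoded = new String(new byte[]{-61, -95}, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\u00e1"));        // true
    }
}
```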

Any feedback is greatly appreciated.

Not an answer, but a data point anyway: Python does decode the C# version as you intended: print ''.join(chr(x) for x in [195, 161]).decode('utf-8') -> á. The Java one apparently is not valid UTF-8 if I try to preserve that order.
– viraptor, commented Aug 28, 2011 at 0:59
I made a mistake in the example above (I have already edited it). In Java, the byte[] prints as: -61, -95, which is a valid UTF-8 sequence. The problem seems to lie in the OS (Windows) itself. I don't know what odd setting it has that makes it print the wrong character.
– jcgarciam, commented Aug 28, 2011 at 14:48

Yahia · Accepted Answer · 2011-08-28 00:35:10Z

1

To me this seems like an endianness problem... you can check that by reversing the bytes in Java before printing the string...

which would usually be solved by including a BOM... see http://de.wikipedia.org/wiki/Byte_Order_Mark

answered Aug 28, 2011 at 0:35

Yahia



6 Comments

jcgarciam Over a year ago

I'm under the same impression, after reading about endianness in C# and Java.

viraptor Over a year ago

If it's UTF-8, then a BOM is not needed and will not change anything. UTF-8 encoding always has the same representation on little- and big-endian machines. (unicode.org/faq/utf_bom.html#bom5)
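A small sketch of the byte-order point, using only the standard `java.nio.charset` API (the class `ByteOrderDemo` and its `hex` helper are hypothetical names): UTF-8 yields the same byte sequence everywhere, whereas UTF-16 has two possible byte orders, which is what a BOM disambiguates. It also shows that reversing the UTF-8 bytes, as the answer suggests trying, produces invalid UTF-8 rather than á.

```java
import java.nio.charset.StandardCharsets;

public class ByteOrderDemo {

    // Hex-formats a byte array, e.g. {0xC3, 0xA1} -> "C3 A1"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00e1";
        // UTF-8 is a sequence of single bytes, so there is no byte order to choose
        System.out.println("UTF-8:    " + hex(s.getBytes(StandardCharsets.UTF_8)));    // C3 A1
        // UTF-16 uses 16-bit code units, so the byte order matters
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 E1
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE))); // E1 00

        // Reversed UTF-8 (A1 C3) is malformed: 0xA1 is a continuation byte and
        // cannot start a sequence, so the decoder substitutes U+FFFD.
        String reversed = new String(new byte[]{(byte) 0xA1, (byte) 0xC3}, StandardCharsets.UTF_8);
        System.out.println(reversed.equals(s)); // false
    }
}
```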

jcgarciam Over a year ago

I think the problem may be in the OS where the server is running. A simple Java program that should print á prints the weird character there as well, while on another OS (Linux) it prints the expected character correctly. So I have ruled out the socket and the end-to-end encoding as the cause.

Yahia Over a year ago

If the OS has some unusual settings, this could happen :-(

jcgarciam Over a year ago

Any suggestion on where I should look in the OS settings? Regional Settings?
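One plausible explanation for exactly this symptom, offered as an assumption rather than a diagnosis of that machine: in the Java versions of that era, `System.out` encoded text with the JVM's default charset, which on a Western-European Windows install is typically windows-1252, where á is the single byte 0xE1. The Windows console, however, usually interprets bytes using the OEM code page (437 or 850), and in both of those code pages 0xE1 is ß. The sketch below simulates that mismatch and shows one way to write UTF-8 explicitly. The class and method names are hypothetical, and it assumes the JDK provides the `IBM437` charset (part of the extended charsets, not one Java is required to support).

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class ConsoleMismatch {

    // Simulates a JVM that encodes `text` with `jvmCharset` writing to a
    // console that decodes the same bytes with `consoleCharset`.
    static String asSeenOnConsole(String text, String jvmCharset, String consoleCharset) {
        byte[] written = text.getBytes(Charset.forName(jvmCharset));
        return new String(written, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // The charset Java falls back to when none is specified
        System.out.println("default charset: " + Charset.defaultCharset());

        // "á" is 0xE1 in windows-1252; a code page 437 console shows 0xE1 as "ß"
        String shown = asSeenOnConsole("\u00e1", "windows-1252", "IBM437");
        System.out.println(shown.equals("\u00df")); // true

        // Writing UTF-8 explicitly; the console must also be switched to
        // UTF-8 (for example with "chcp 65001") to display it correctly.
        PrintStream utf8Out = new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
        utf8Out.println("\u00e1");
    }
}
```

On the Windows side, running `chcp` shows the console's active code page, and the "language for non-Unicode programs" setting under Regional Settings determines the default ANSI and OEM code pages.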


Community · Accepted Answer · 2017-05-23 12:33:26Z

0

Are you sure that's not a Unicode character you are attempting to encode to bytes as UTF-8 data?

I found the question below useful for testing whether the data in that string is correct UTF-8 before you send it.

How to test an application for correct encoding (e.g. UTF-8)
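The linked question's approach is not reproduced here; as a Java-side alternative, a strict check is possible with the standard `CharsetDecoder`: configured to `REPORT` malformed input instead of substituting U+FFFD, it throws on invalid UTF-8. A minimal sketch (the class `Utf8Validator` and method `isValidUtf8` are hypothetical names), applied to the byte sequences discussed in this thread:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Validator {

    // Returns true only if `data` is well-formed UTF-8. The decoder is told to
    // REPORT errors instead of silently replacing them with U+FFFD.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf8(new byte[]{(byte) 0xC3, (byte) 0xA1})); // true: UTF-8 "á"
        System.out.println(isValidUtf8(new byte[]{(byte) 0xE1}));             // false: windows-1252 "á"
        System.out.println(isValidUtf8(new byte[]{(byte) 0xA1, (byte) 0xC3})); // false: reversed bytes
    }
}
```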

edited May 23, 2017 at 12:33

Community Bot

answered Aug 28, 2011 at 0:44

Brandon Langley


1 Comment

jcgarciam Over a year ago

I'm not quite understanding your statement. In my example above, I'm getting the UTF-8 byte[] of just á to test the encoding.
