I'm creating a class and converting it to XML.

The problem is that when I convert the XML string into bytes, ASCII.GetBytes returns a byte array (ascArray) with an extra character at the beginning.

It's always a ? character, so the XML starts like this:

?<?xml version="1.0" encoding="utf-8"?>

Why is this happening?

This is the code:

  WorkItem p = new WorkItem();

  // Fill the class with whatever need to be sent to client
  OneItem posts1 = new OneItem();
  posts1.id = "id 1";
  posts1.username = "hasse";
  posts1.message = "hej again";
  posts1.time = "time1";
  p.setPost(posts1);

  OneItem posts2 = new OneItem();
  posts2.id = "id 2";
  posts2.username = "bella";
  posts2.message = "hej again again";
  posts2.time = "time2";
  p.setPost(posts2);

  // convert the class WorkItem to xml
  MemoryStream memoryStream = new MemoryStream();
  XmlSerializer xs = new XmlSerializer(typeof(WorkItem));
  XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
  xs.Serialize(xmlTextWriter, p);

  // send the xml version of WorkItem to client
  byte[] data = memoryStream.ToArray();
  clientStream.Write(data, 0, data.Length);
  Console.WriteLine(" send.." + data);
  clientStream.Close();
  • What does the input XmlizedString look like if you set a breakpoint at Byte[] ascArray= Encoding.ASCII.GetBytes(XmlizedString); Commented Jun 14, 2012 at 5:30
  • That's strange. It also has the leading ?, but it's only visible if I copy and paste the content of XmlizedString. Commented Jun 14, 2012 at 5:39
  • So why is UTF8ByteArrayToString adding an extra byte? Commented Jun 14, 2012 at 5:59
  • @Erik: It's not. You should look at the byte array that is passed to UTF8ByteArray. I'm pretty sure you'll find it starts with 0xEF, 0xBB, 0xBF, which is the UTF-8 representation of the byte order mark. That's being decoded by UTF8ByteArrayToString into a single character (not byte) - but that character can't be represented in ASCII. Fundamentally, you're applying a lossy transformation here. Commented Jun 14, 2012 at 6:03

1 Answer

I strongly suspect that the data starts with a byte order mark, which can't be represented in ASCII.
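To illustrate the suspected mechanism, here is a minimal sketch (the class and method names are hypothetical, not from the question). It decodes a UTF-8 byte array that starts with the byte order mark and then re-encodes the result as ASCII, which is roughly what the original code appears to have been doing:

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Decodes UTF-8 bytes (possibly starting with a BOM) and re-encodes them as ASCII.
    public static byte[] Utf8ToAscii(byte[] utf8Bytes)
    {
        // GetString does not strip the BOM; it becomes the single character U+FEFF.
        string text = Encoding.UTF8.GetString(utf8Bytes);

        // ASCII cannot represent U+FEFF, so the encoder substitutes '?' (0x3F).
        return Encoding.ASCII.GetBytes(text);
    }

    public static void Main()
    {
        // 0xEF 0xBB 0xBF is the UTF-8 byte order mark, followed by "<a/>".
        byte[] withBom = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'a', (byte)'/', (byte)'>' };

        byte[] ascii = Utf8ToAscii(withBom);

        // Prints 3F-3C-61-2F-3E: the three BOM bytes collapsed into one '?'.
        Console.WriteLine(BitConverter.ToString(ascii));
    }
}
```

The three BOM bytes turn into one character on decoding and then into a single ? on encoding, which would match the symptom in the question.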

It's not clear why you're doing what you're doing in the first place, particularly around the MemoryStream. Why are you creating a UTF-8 encoded byte array, then decoding that to a string (and we don't know what UTF8ByteArrayToString does), then converting it back to a byte array? Why not just write the byte array straight to the client to start with? If you need the data as a string, I'd use a subclass of StringWriter which advertises that it uses UTF-8 as the encoding. If you don't need it as a string, just stick to the byte array.
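A minimal sketch of the StringWriter idea, assuming the data really is needed as a string: Utf8StringWriter and DemoItem are hypothetical names used only for illustration (DemoItem stands in for WorkItem). StringWriter reports UTF-16 by default, so the XML declaration would otherwise say encoding="utf-16".

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
public sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding => Encoding.UTF8;
}

// Stand-in for the WorkItem class from the question.
public class DemoItem
{
    public string id;
    public string message;
}

public static class StringWriterDemo
{
    // Serializes the item to an XML string whose declaration says utf-8.
    public static string ToXml(DemoItem item)
    {
        var serializer = new XmlSerializer(typeof(DemoItem));
        using (var writer = new Utf8StringWriter())
        {
            serializer.Serialize(writer, item);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string xml = ToXml(new DemoItem { id = "id 1", message = "hej again" });
        Console.WriteLine(xml);
    }
}
```

No bytes are involved at this point, so there is no byte order mark in the string. The bytes only come into play when the string is later encoded for sending, and that encoding should then be UTF-8 to match the declaration.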

Note that even aside from this first character, the fact that you've got an XML document encoded in UTF-8 means there may well be other non-ASCII characters in the string. Why are you using ASCII at all here?

EDIT: Just to be clear, you're fundamentally applying a lossy transformation, and doing it needlessly. Even if you want a local copy of the data, you should have something like this:

// Removed bad try/catch block - don't just catch Exception, and don't
// just swallow exceptions
MemoryStream memoryStream = new MemoryStream();
XmlSerializer xs = new XmlSerializer(typeof(WorkItem));
XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
xs.Serialize(xmlTextWriter, p);

// Removed pointless conversion to/from string
// Removed pointless BinaryWriter (just use the stream)

// An alternative would be memoryStream.WriteTo(clientStream);
byte[] data = memoryStream.ToArray();
clientStream.Write(data, 0, data.Length);
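// Log the byte count; concatenating the array itself would print "System.Byte[]"
Console.WriteLine(" send.. " + data.Length + " bytes");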

// Removed Close calls - you should use "using" statements to dispose of
// streams automatically.
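If the leading bytes do turn out to be the byte order mark and the receiving side cannot cope with it, one common option (not part of the code above, so treat it as an assumption about what suits this client) is to pass a UTF8Encoding constructed with false, which omits the BOM. A minimal sketch, with SampleItem as a hypothetical stand-in for WorkItem:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Stand-in for the WorkItem class from the question.
public class SampleItem
{
    public string username;
    public string message;
}

public static class NoBomDemo
{
    // Serializes the item to UTF-8 encoded XML without a leading byte order mark.
    public static byte[] SerializeWithoutBom(SampleItem item)
    {
        using (var memoryStream = new MemoryStream())
        {
            // false = do not emit the UTF-8 identifier (the BOM).
            var utf8NoBom = new UTF8Encoding(false);
            var serializer = new XmlSerializer(typeof(SampleItem));

            using (var xmlWriter = new XmlTextWriter(memoryStream, utf8NoBom))
            {
                serializer.Serialize(xmlWriter, item);
                xmlWriter.Flush();
                return memoryStream.ToArray();
            }
        }
    }

    public static void Main()
    {
        byte[] data = SerializeWithoutBom(
            new SampleItem { username = "hasse", message = "hej again" });

        // The first byte is now '<' (0x3C) rather than 0xEF.
        Console.WriteLine("first byte: 0x" + data[0].ToString("X2"));
    }
}
```

The XML declaration still says encoding="utf-8", so a conforming parser on the other end can decode the document correctly without the mark.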

5 Comments

I want to send the class to a Java client. That's why I first convert the class to XML and then to bytes.
@Erik: But you're converting it directly to bytes, in a MemoryStream. Why are you then converting it to a string and then back to bytes, using the wrong encoding? Why are you even using a MemoryStream instead of creating the XmlTextWriter straight from clientStream?
@Erik: I've edited my answer with better code, including a bunch of comments explaining what I've done.
Now I get even more bad characters at the beginning: ï»¿<?xml version="1.0" encoding="utf-8"?><Work... Updating the question with the new code. Maybe the reason is that in the Java client I read it like this: String XMlString = dataInputStream.readLine();
Those three "characters" are definitely the UTF-8 byte order mark
