I am seeing an unexpected character (?) in the output of Encoding.ASCII.GetBytes method.

So I am doing the following:

var stringBytes = Encoding.ASCII.GetBytes(myString);

Where myString is:

{
  "$id": "1",
  "Note": "<p><span style=\"font-family: &quot;Courier New&quot;;\">aaaa</span> 
  <br></p>"
}

Now right after if I do:

var myString1 = System.Text.Encoding.Default.GetString(stringBytes);

Then myString1 is returned as:

{
  "$id": "1",
  "Note": "<p><span style=\"font-family: &quot;Courier New&quot;;\">? 
   aaaa</span><br></p>"
}

Note how the aaaa is transformed to ?aaaa in the last operation?

Can someone please tell me what I am missing here? Thank you.

  • Why are you using Encoding.Default to decode a string encoded with Encoding.ASCII? Even if your system did default to Encoding.ASCII for Encoding.Default, it seems like a bad idea in general. Note that on .NET Core, Encoding.Default is always Encoding.UTF8. Commented Apr 12, 2019 at 1:31
  • Thanks @John, yes, you are right. I missed that. I will fix it, but that didn't fix the above problem. I believe Alexi's solution is a possible fix. Cheers. Commented Apr 12, 2019 at 1:38

1 Answer

This is the expected behavior of ASCII encoding when it encounters a character outside the 0-127 range, as in your case: the encoder replaces it with "?". To fix it, either switch to UTF-8 (which supports all characters) or manually encode every character outside 0-127 into something that works for you (for JSON you can use a hex escape with the "\u" prefix, e.g. "\ufeff").
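As a minimal sketch of both options, assuming the input is the fragment ">" followed by U+FEFF and "aaaa": the code below round-trips the text through ASCII and through UTF-8, then escapes non-ASCII characters in the "\u" form. The class name EncodingDemo and the helper EscapeNonAscii are hypothetical, introduced only for illustration.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: rewrites every UTF-16 code unit above 127
    // as a JSON-style \uXXXX escape, leaving ASCII characters untouched.
    public static string EscapeNonAscii(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c > 127)
                sb.Append("\\u").Append(((int)c).ToString("x4"));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    // Encodes and decodes with the same encoding.
    public static string RoundTrip(Encoding encoding, string s)
    {
        return encoding.GetString(encoding.GetBytes(s));
    }

    public static void Main()
    {
        // Assumed input: an invisible U+FEFF sits between '>' and "aaaa".
        string text = ">\uFEFFaaaa";

        // ASCII cannot represent U+FEFF, so it is replaced with '?'.
        Console.WriteLine(RoundTrip(Encoding.ASCII, text));         // >?aaaa

        // UTF-8 can represent it, so the round trip is lossless.
        Console.WriteLine(RoundTrip(Encoding.UTF8, text) == text);  // True

        // After escaping, the text is pure ASCII and survives ASCII encoding.
        Console.WriteLine(EscapeNonAscii(text));                    // >\ufeffaaaa
    }
}
```

Using the same encoding on both sides is what matters here; the escaping route is only needed if the bytes really must stay within ASCII.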

For some reason the string "aaaa" starts with a BOM (U+FEFF, 0xFEFF), which you can't see, but it is there, and ASCII encoding has to convert it to "?". To see the character code, take a piece of the string and print one of its characters as hex:

  ((int)(">\uFEFFaaaa"[1])).ToString("x")  // gives "feff"; the string has length 6 because of the invisible U+FEFF

Note that a BOM (byte order mark) in the middle of text is usually a bug; in this case the code that constructs the HTML is likely concatenating files or doing something similar. Guidance from Unicode.org: What should I do with U+FEFF in the middle of a file?
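Until the source of the stray character is found, one possible workaround is to drop U+FEFF before encoding. This is only a sketch, assuming the character carries no meaning in this HTML; the class name BomCleaner and the helper StripBom are hypothetical.

```csharp
using System;
using System.Text;

public static class BomCleaner
{
    // Hypothetical helper: removes every U+FEFF, wherever it appears.
    // string.Replace(string, string) performs an ordinal search.
    public static string StripBom(string s)
    {
        return s.Replace("\uFEFF", string.Empty);
    }

    public static void Main()
    {
        // Assumed input: editor output with an invisible U+FEFF before "aaaa".
        string note = "<span>\uFEFFaaaa</span>";
        string cleaned = StripBom(note);

        Console.WriteLine(cleaned);  // <span>aaaa</span>

        // With the character gone, an ASCII round trip no longer changes the text.
        string roundTripped = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(cleaned));
        Console.WriteLine(roundTripped == cleaned);  // True
    }
}
```

This hides the symptom and does not fix the cause; the code that produces the HTML is still worth checking.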

Thanks to Klaus Gütter for the link to BOM FAQ and Tom Blodget for highlighting issues with BOM in the middle of a text.

5 Comments

Alexi, thank you very much. Yes now it makes sense. I used UTF8 to both encode and decode and it is working as I expect it. Cheers.
JSON is required to be encoded with UTF-8 for inter-system communication.
@Stackedup A BOM should not be allowed to make it into a text datatype. A BOM is metadata not text.
@TomBlodget thank you for highlighting that. That is going to be difficult to figure out. I am using the Summernote editor, so I am reading its HTML content (.summernote('code')) and passing it to the server. So the bug could be in Summernote.
@TomBlodget Relevant section in BOM FAQ: What should I do with U+FEFF in the middle of a file?
