How do I encode a Binary blob as Unicode blob?

Question

I'm trying to store a gzip-compressed serialized object in one of Active Directory's "Extension Attributes" (more info here). This field is a Unicode string, according to its oMSyntax of 64.

What is the most efficient way to store a binary blob as Unicode? Once I get this down, the rest is a piece of cake.

Timwi · Accepted Answer · 2010-09-16 00:25:32Z

4

There are, of course, many ways of reliably packing an arbitrary byte array into Unicode characters, but none of them are very efficient. It is very unfortunate that Active Directory would choose to use Unicode for data that is not textual in nature. It’s like using a string to represent a 32-bit integer, or like using Nutella to write a love letter.

My recommendation would be to “play it safe” and use an ASCII-based encoding such as Base64. I recommend this because .NET already has a built-in implementation:

var base64Encoded = Convert.ToBase64String(byteArray);

var original = Convert.FromBase64String(base64Encoded);
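As a minimal sketch of the two calls above in a complete program (the class name, helper names, and sample payload are made up for illustration, and the payload merely stands in for real gzip output), the round trip and the size cost can be checked like this:

```csharp
using System;
using System.Linq;

static class Base64RoundTripDemo
{
    // Hypothetical helper: true when bytes survive bytes -> Base64 string -> bytes.
    public static bool RoundTrips(byte[] byteArray)
    {
        string base64Encoded = Convert.ToBase64String(byteArray);
        byte[] original = Convert.FromBase64String(base64Encoded);
        return original.SequenceEqual(byteArray);
    }

    // Base64 emits 4 characters for every 3 input bytes, rounded up to a full group.
    public static int ExpectedLength(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    static void Main()
    {
        // Assumed sample payload: every possible byte value exactly once.
        byte[] byteArray = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();

        Console.WriteLine(RoundTrips(byteArray));
        Console.WriteLine(Convert.ToBase64String(byteArray).Length == ExpectedLength(byteArray.Length));
    }
}
```

Each Base64 character carries only 6 bits of payload, which is the inefficiency referred to here; the resulting string is plain ASCII, so it should be safe in any Unicode string field.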

In theory you could come up with an encoding that is more efficient than this by making use of more of the Unicode character set. However, in order to do so reliably, you would need to know quite a bit about Unicode.


1 Comment

makerofthings7 Over a year ago

Just to be fair to MSFT, there are other binary properties that I could use but the client wants me to use "extension attributes" which are Unicode. There are Byte[] in other spots too. I like Nutella love letters. +1

Venemo · Accepted Answer · 2010-09-16 00:47:28Z

1

Normally, this would be the way to convert between bytes and Unicode text:

// string from bytes
string text = System.Text.Encoding.Unicode.GetString(bytes);

// bytes from string (GetBytes takes the string, not the byte array)
byte[] decoded = System.Text.Encoding.Unicode.GetBytes(text);

EDIT:
But since not every possible byte sequence is a valid Unicode string, you should use a method that can create a string from an arbitrary byte sequence:

// string from bytes
string base64Encoded = Convert.ToBase64String(byteArray);

// bytes from string
byte[] original = Convert.FromBase64String(base64Encoded);

(Thanks to @Timwi who pointed this out!)

edited Sep 16, 2010 at 0:47

answered Sep 15, 2010 at 23:25


9 Comments

makerofthings7 Over a year ago

Thanks! I'm trying to keep my brain sharp while I'm on painkillers from my motorcycle injury. I think I should have known this. Just perfect

Timwi Over a year ago

This answer is completely wrong. If you use this, you will lose data. Encoding.Unicode encapsulates UTF-16, and not all byte arrays are valid UTF-16. Consider arrays with an odd number of bytes, or byte sequences with lone surrogates, for example. Neither is valid UTF-16, and either would generate a string that doesn’t turn back into the original byte array.
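A small sketch of the data loss described in this comment, assuming the default Encoding.Unicode instance (which substitutes a replacement character for invalid input rather than throwing); the class and method names are hypothetical:

```csharp
using System;
using System.Linq;
using System.Text;

static class Utf16RoundTripDemo
{
    // Hypothetical helper: true when bytes survive bytes -> string -> bytes
    // through Encoding.Unicode (UTF-16, little-endian).
    public static bool SurvivesUtf16(byte[] bytes)
    {
        string text = Encoding.Unicode.GetString(bytes);
        return Encoding.Unicode.GetBytes(text).SequenceEqual(bytes);
    }

    static void Main()
    {
        // Valid UTF-16: the letter "A".
        Console.WriteLine(SurvivesUtf16(new byte[] { 0x41, 0x00 }));
        // Odd number of bytes: the trailing byte cannot form a code unit.
        Console.WriteLine(SurvivesUtf16(new byte[] { 0x41, 0x00, 0x42 }));
        // Lone high surrogate (U+D800) with no low surrogate after it.
        Console.WriteLine(SurvivesUtf16(new byte[] { 0x00, 0xD8 }));
    }
}
```

Only the first input should come back unchanged; in the other two cases the invalid part is swapped for a replacement character when decoding, so the original bytes cannot be recovered.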

Timwi Over a year ago

@Venemo: No, of course not: half of all bytes are not valid ASCII characters! The encodings in System.Text.Encoding are meant to encode text, as the name implies. You should use an encoding that is designed for arbitrary byte data. Base64 is an example of that.

Timwi Over a year ago

@Venemo: Then you are looking at a code table that doesn’t represent ASCII. Just run Encoding.ASCII.GetString(new byte[] { 63 }) and then Encoding.ASCII.GetString(new byte[] { 129 }) (hint: you get the same answer for both). You are perhaps looking at one that represents Latin-1 (ISO-8859-1) or Windows-1252. However, even in those, not all 256 possible values have a valid character. The non-Unicode encodings turn several possible byte values into question marks.

Timwi Over a year ago

@Venemo: Well that website is wrong. It shows the Windows-1252 character set, not ASCII.
