3

I'm trying to store a Gzip serialized object into Active Directory's "Extension Attribute", more info here. This field is a Unicode string according to its oMSyntax of 64.

What is the most efficient way to store a binary blob as Unicode? Once I get this down, the rest is a piece of cake.

2 Answers

4

There are, of course, many ways of reliably packing an arbitrary byte array into Unicode characters, but none of them are very efficient. It is very unfortunate that Active Directory would choose to use Unicode for data that is not textual in nature. It’s like using a string to represent a 32-bit integer, or like using Nutella to write a love letter.

My recommendation would be to “play it safe” and use an ASCII-based encoding such as base64. I recommend this because .NET already has a built-in implementation:

var base64Encoded = Convert.ToBase64String(byteArray);

var original = Convert.FromBase64String(base64Encoded);
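Since the question is about a Gzip-compressed payload, here is a minimal sketch of the full round trip, assuming the object has already been serialized to a byte array. The names `BlobText`, `Pack`, and `Unpack` are hypothetical and only for illustration; if your data is already gzipped, the `Convert` calls alone are enough. Keep in mind that base64 produces 4 characters for every 3 bytes, so the stored string is about a third larger than the compressed blob.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of .NET or Active Directory.
public static class BlobText
{
    // Gzip-compress the payload, then encode the result as base64 text.
    public static string Pack(byte[] payload)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the buffer usable after the gzip stream is disposed.
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, true))
            {
                gzip.Write(payload, 0, payload.Length);
            }
            return Convert.ToBase64String(buffer.ToArray());
        }
    }

    // Decode the base64 text, then decompress back to the original bytes.
    public static byte[] Unpack(string text)
    {
        using (var input = new MemoryStream(Convert.FromBase64String(text)))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The string returned by `Pack` contains only ASCII letters, digits, `+`, `/`, and `=`, so it should be safe to store in a Unicode string attribute; `Unpack` reverses both steps.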

In theory you could come up with an encoding that is more efficient than this by making use of more of the Unicode character set. However, in order to do so reliably, you would need to know quite a bit about Unicode.

1 Comment

Just to be fair to MSFT, there are other binary properties that I could use but the client wants me to use "extension attributes" which are Unicode. There are Byte[] in other spots too. I like Nutella love letters. +1
1

Normally, this would be the way to convert between bytes and Unicode text:

// string from bytes
System.Text.Encoding.Unicode.GetString(bytes);

// bytes from string
System.Text.Encoding.Unicode.GetBytes(text);

EDIT:
But since not every possible byte sequence is a valid Unicode string, you should use a method that can create a string from an arbitrary byte sequence:

// string from bytes
Convert.ToBase64String(byteArray);

// bytes from string
Convert.FromBase64String(base64Encoded);

(Thanks to @Timwi who pointed this out!)
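To see why the first approach is unsafe for arbitrary bytes, here is a small sketch that checks whether a byte array survives a round trip through each encoding. The class and method names (`RoundTripCheck`, `SurvivesUtf16`, `SurvivesBase64`) are made up for this example.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper for comparing the two approaches.
public static class RoundTripCheck
{
    // True if bytes -> UTF-16 string -> bytes gives back the same array.
    public static bool SurvivesUtf16(byte[] bytes)
    {
        string text = Encoding.Unicode.GetString(bytes);
        return Encoding.Unicode.GetBytes(text).SequenceEqual(bytes);
    }

    // True if bytes -> base64 string -> bytes gives back the same array.
    public static bool SurvivesBase64(byte[] bytes)
    {
        string text = Convert.ToBase64String(bytes);
        return Convert.FromBase64String(text).SequenceEqual(bytes);
    }
}
```

With the default `Encoding.Unicode`, `SurvivesUtf16(new byte[] { 0x00, 0xD8 })` (a lone surrogate) and `SurvivesUtf16(new byte[] { 0x41, 0x00, 0x42 })` (an odd number of bytes) should both return `false`, because the invalid parts are replaced during decoding. `SurvivesBase64` returns `true` for the same inputs.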

9 Comments

Thanks! I'm trying to keep my brain sharp while I'm on painkillers from my motorcycle injury. I think I should have known this. Just perfect
This answer is completely wrong. If you use this, you will lose data. Encoding.Unicode encapsulates UTF-16, and not all byte arrays are valid UTF-16. Consider arrays with odd numbers of bytes, or byte sequences with lone surrogates, for example. Neither is valid UTF-16, and either would generate a string that doesn’t turn back into the original byte array.
@Venemo: No, of course not: half of all bytes are not valid ASCII characters! The encodings in System.Text.Encoding are meant to encode text, as the name implies. You should use an encoding that is designed for arbitrary byte data. Base64 is an example of that.
@Venemo: Then you are looking at a code table that doesn’t represent ASCII. Just run Encoding.ASCII.GetString(new byte[] { 63 }) and then Encoding.ASCII.GetString(new byte[] { 129 }) (hint: you get the same answer for both). You are looking at one that perhaps represents Latin-1 (ISO-8859-1) or Windows-1252. However, even in those not all 256 possible values have a valid character. The non-Unicode encodings turn several possible byte values into question marks.
@Venemo: Well that website is wrong. It shows the Windows-1252 character set, not ASCII.