
I found this little snippet to transform a string into an array of bytes:

    public byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

And this one to transform an array of bytes into a string:

    public string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }

But I notice that the first one returns an array twice as big as the initial string (because sizeof(char) = 2), and every other slot in my array is 0.

Example:

string = TEST
bytes[] = { 84, 0, 69, 0, 83, 0, 84, 0 };

I'm using this function to send packets in UDP, so I need my packets to be the smallest possible.

Why is the array twice as big? How do I fix it?

3 Answers


.NET uses UTF-16 encoding to store `string` and `char` values, which means each character is encoded with 2 bytes. This is detailed in Character Encoding in the .NET Framework:

UTF-16 encoding is used by the common language runtime to represent Char and String values, and it is used by the Windows operating system to represent WCHAR values.

So you should expect to get 2 bytes for every character in your string.

If you want only 1 byte per character, you have to use a different encoding. For this input, ASCII encoding will work:

public byte[] GetBytes(string str)
{
    return System.Text.Encoding.ASCII.GetBytes(str);
}

Calling this with the input "TEST" returns { 84, 69, 83, 84 }.
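For the reverse direction, the same encoding's `GetString` converts the bytes back. A minimal round-trip sketch (note that `Encoding.ASCII` silently replaces characters above 0x7F with `?`, so this is only safe for ASCII-range input):

```csharp
using System;
using System.Text;

class AsciiRoundTrip
{
    static void Main()
    {
        // One byte per character, no zero padding
        byte[] bytes = Encoding.ASCII.GetBytes("TEST");
        Console.WriteLine(string.Join(", ", bytes)); // 84, 69, 83, 84

        // Decode with the same encoding to recover the original string
        string text = Encoding.ASCII.GetString(bytes);
        Console.WriteLine(text); // TEST
    }
}
```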


2 Comments

@AK_ sure but you never know what OP's going to be using this for. If he/she really needs a 1-to-1 correspondence between chars and bytes, UTF-8 won't work either. I showed ASCII in my answer because I figured it's the encoding OP's most likely to be familiar with.
UTF-8 is compatible with ASCII; that is, characters from 0x00 to 0x7F are represented the same in both...
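To illustrate the comment above: for characters in the 0x00–0x7F range, ASCII and UTF-8 produce identical bytes. A quick check (sketch):

```csharp
using System;
using System.Linq;
using System.Text;

class AsciiUtf8Overlap
{
    static void Main()
    {
        string s = "TEST"; // every character is <= 0x7F
        byte[] ascii = Encoding.ASCII.GetBytes(s);
        byte[] utf8 = Encoding.UTF8.GetBytes(s);

        // For ASCII-range text the two encodings agree byte for byte
        Console.WriteLine(ascii.SequenceEqual(utf8)); // True
    }
}
```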

To get bytes for a string use:

Encoding.UTF8.GetBytes()

http://msdn.microsoft.com/en-us/library/system.text.encoding.getbytes(v=vs.110).aspx

To go back to string use:

Encoding.UTF8.GetString()

http://msdn.microsoft.com/en-us/library/744y86tc(v=vs.110).aspx
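Put together, a minimal UTF-8 round-trip sketch:

```csharp
using System;
using System.Text;

class Utf8RoundTrip
{
    static void Main()
    {
        string original = "TEST";

        // Encode: 1 byte per ASCII-range character, more for other characters
        byte[] bytes = Encoding.UTF8.GetBytes(original);

        // Decode back with the same encoding
        string decoded = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(decoded == original); // True
    }
}
```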



In C#, char is a 16-bit datatype because .NET uses Unicode UTF-16 encoding natively.

If your text is entirely ASCII data, then you can use ASCIIEncoding.GetBytes to convert your string to bytes using the ASCII encoding.

It's probably better to use UTF8Encoding.GetBytes to convert to bytes using the UTF8 encoding. This supports the entire Unicode character set, not just ASCII, but encodes it in a way that doesn't include all of those zero bytes the way UTF16 does.
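A quick comparison of the two encodings' output sizes (sketch; `Encoding.Unicode` is .NET's name for UTF-16):

```csharp
using System;
using System.Text;

class EncodingSizes
{
    static void Main()
    {
        string s = "TEST";

        // UTF-16: 2 bytes per char, every other byte is 0 for ASCII-range text
        Console.WriteLine(Encoding.Unicode.GetBytes(s).Length); // 8

        // UTF-8: 1 byte per ASCII-range char, no zero padding
        Console.WriteLine(Encoding.UTF8.GetBytes(s).Length); // 4
    }
}
```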

There's also The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) if you need to learn more about character encodings.

