
I found this little snippet to transform a string into an array of bytes:

    public byte[] GetBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }

And this one to transform an array of bytes into a string:

    public string GetString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }

But I notice that the first one returns an array twice as big as the initial string (because sizeof(char) = 2), and every other slot in my array is 0.

Example:

string = TEST
bytes[] = { 84, 0, 69, 0, 83, 0, 84, 0 };

I'm using this function to send packets in UDP, so I need my packets to be the smallest possible.

Why is the array twice as big? How do I fix it?

3 Answers


.NET uses UTF-16 encoding to store `string` and `char` values, which means each character is encoded with 2 bytes. This is detailed in Character Encoding in the .NET Framework:

UTF-16 encoding is used by the common language runtime to represent Char and String values, and it is used by the Windows operating system to represent WCHAR values.

So you should expect to get 2 bytes for every character in your string.

If you want only 1 byte per character, you have to use a different encoding. For this input, ASCII encoding will work:

public byte[] GetBytes(string str)
{
    return System.Text.Encoding.ASCII.GetBytes(str);
}

Calling this with the input "TEST" returns { 84, 69, 83, 84 }.
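For the reverse direction, the same encoding's `GetString` converts the bytes back. A minimal round-trip sketch (note that `Encoding.ASCII` silently replaces characters above 0x7F with `?`, so this is only safe for ASCII-range input):

```csharp
using System;
using System.Text;

class AsciiRoundTrip
{
    static void Main()
    {
        // One byte per character, no zero padding
        byte[] bytes = Encoding.ASCII.GetBytes("TEST");
        Console.WriteLine(string.Join(", ", bytes)); // 84, 69, 83, 84

        // Decode with the same encoding to recover the original string
        string text = Encoding.ASCII.GetString(bytes);
        Console.WriteLine(text); // TEST
    }
}
```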


2 Comments

@AK_ sure but you never know what OP's going to be using this for. If he/she really needs a 1-to-1 correspondence between chars and bytes, UTF-8 won't work either. I showed ASCII in my answer because I figured it's the encoding OP's most likely to be familiar with.
UTF-8 is compatible with ASCII; that is, characters from 0x00 to 0x7F are represented the same in both...
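To illustrate the comment above: for characters in the 0x00–0x7F range, ASCII and UTF-8 produce identical bytes. A quick check (sketch):

```csharp
using System;
using System.Linq;
using System.Text;

class AsciiUtf8Overlap
{
    static void Main()
    {
        string s = "TEST"; // every character is <= 0x7F
        byte[] ascii = Encoding.ASCII.GetBytes(s);
        byte[] utf8 = Encoding.UTF8.GetBytes(s);

        // For ASCII-range text the two encodings agree byte for byte
        Console.WriteLine(ascii.SequenceEqual(utf8)); // True
    }
}
```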

To get bytes for a string use:

Encoding.UTF8.GetBytes()

http://msdn.microsoft.com/en-us/library/system.text.encoding.getbytes(v=vs.110).aspx

To go back to string use:

Encoding.UTF8.GetString()

http://msdn.microsoft.com/en-us/library/744y86tc(v=vs.110).aspx
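Put together, a minimal UTF-8 round-trip sketch:

```csharp
using System;
using System.Text;

class Utf8RoundTrip
{
    static void Main()
    {
        string original = "TEST";

        // Encode: 1 byte per ASCII-range character, more for other characters
        byte[] bytes = Encoding.UTF8.GetBytes(original);

        // Decode back with the same encoding
        string decoded = Encoding.UTF8.GetString(bytes);
        Console.WriteLine(decoded == original); // True
    }
}
```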



In C#, char is a 16-bit datatype because .NET uses Unicode UTF-16 encoding natively.

If your text is entirely ASCII data, then you can use ASCIIEncoding.GetBytes to convert your string to bytes using the ASCII encoding.

It's probably better to use UTF8Encoding.GetBytes to convert to bytes using the UTF8 encoding. This supports the entire Unicode character set, not just ASCII, but encodes it in a way that doesn't include all of those zero bytes the way UTF16 does.
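A quick comparison of the two encodings' output sizes (sketch; `Encoding.Unicode` is .NET's name for UTF-16):

```csharp
using System;
using System.Text;

class EncodingSizes
{
    static void Main()
    {
        string s = "TEST";

        // UTF-16: 2 bytes per char, every other byte is 0 for ASCII-range text
        Console.WriteLine(Encoding.Unicode.GetBytes(s).Length); // 8

        // UTF-8: 1 byte per ASCII-range char, no zero padding
        Console.WriteLine(Encoding.UTF8.GetBytes(s).Length); // 4
    }
}
```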

There's also The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) if you need to learn more about character encodings.

