1

I need to write an integer to a byte array such that leading zeros are omitted and the bytes are written in big endian order.

Example:

int    original = 0x00123456;

byte[] encoded  = Encode(original);  //  == new byte[] { 0x12, 0x34, 0x56 };

int    decoded  = Decode(encoded);   //  == 0x123456

My Decode method:

private static int Decode(byte[] buffer, int index, int length)
{
    int result = 0;
    while (length > 0)
    {
        result = (result << 8) | buffer[index];
        index++;
        length--;
    }
    return result;
}

I'm struggling to come up with an Encode method that doesn't require a temporary buffer or reverses the bytes after writing them in little endian order. Can anyone help?

private static int Encode(int value, byte[] buffer, int index)
{

}
4
  • Why does Encode() take a byte[] argument? It should return one. Commented Jul 4, 2010 at 20:38
  • @Hans Passant: It takes a byte[] to write to at the specified index. I want to avoid unnecessary byte[] allocations. Commented Jul 4, 2010 at 20:49
  • Okay, makes sense. But how is the code that eventually reads this byte[] supposed to know how many significant bytes it should read? Commented Jul 4, 2010 at 20:58
  • @Hans Passant: Encode returns the number of bytes written. This value is stored along the encoded integer, so I can pass the length to Decode on decoding. Commented Jul 4, 2010 at 21:07

3 Answers 3

3

As per OP's request, here is a version without loops for a 32-bit number:

private static int Encode(int value, byte[] buffer, int index)
{
    byte temp;
    bool leading = true;

    temp = (value >> 24) & 0xFF;
    if (temp > 0) {
      buffer[index++] = temp;
      leading = false;
    }

    temp = (value >> 16) & 0xFF;
    if (temp > 0 || leading == false) {
      buffer[index++] = temp;
      leading = false;
    }

    temp = (value >> 8) & 0xFF;
    if (temp > 0 || leading == false) {
      buffer[index++] = temp;
      leading = false;
    }

    temp = value & 0xFF;
    buffer[index++] = temp;

    return index;
}

Version using a loop for 32-bit numbers:

private static int Encode(int value, byte[] buffer, int index)
{
    int length = 0;

    for (int i = 3; i >= 0; i++) {
      byte temp = (byte)(value >> (8 * i));
      if (temp > 0 || length > 0) {
        buffer[index++] = temp;
        length++;
      }
    }

    return length;
}

Note that this version doesn't write anything if the input is just 0.

Sign up to request clarification or add additional context in comments.

6 Comments

-1. This is wrong. It doesn't exclude leading zeros. It excludes every byte whose value is zero, instead. Try it with 0x00120056 and you'll see the result is not correct.
Sorry, I overlooked that part. It's fixed now.
@casablanca: I like this. Please do me a small favour and turn this back into a loop (the shift is just 8*i). Can you also change it so it returns the number of bytes written?
@dtb, putting this in a loop will mean that you have a conditional branch in the loop. Once you have the first non-zero byte you can blindly pack the rest of the bytes without the conditional branch.
@Chris Taylor: I'm looking for a solution that is simple, readable, maintainable, not necessarily one that takes a few nanoseconds less than other solution. If you see two loops, you need to figure out what each loop's doing, while if there's only one loop, you know immediately that this loop simply writes some bytes to the byte[] in some way.
|
3
private static int Encode(int value, byte[] buffer, int index)
{
    int length = 0;
    int valueCopy = value;
    while (valueCopy != 0)
    {
        valueCopy >>= 8;
        length++;
    }
    for (int i = 0; i < length; i++)
    {
        buffer[index + length - i - 1] = (byte)value;
        value >>= 8;
    }
    return length;
}

7 Comments

The double-loop is somewhat ugly, but not nearly as ugly as what I've come up with so far. +1.
@dtb: If your input size is fixed, i.e. 32-bit or 64-bit, you can entirely do away with the loops and hard-code the conversion - it will even be a little faster.
@casablanca: Well, my input size is always 32-bit (a non-negative int), but I want to omit leading zeros when writing the value to the byte[]. Can you post an answer that shows how a solution without loops looks like? (Without omitting leading zeros it's simple; I'm struggling with the variable-length output.)
there's inherent need for the loop to find how long is the target number.
@digEmAll: You can only unroll the loop if you know the number of iterations, which isn't obvious in this case. @Pavel: You can break down the loop into if statements that check if each byte of the number (from left to right) is non-zero, and if so, append it to the buffer.
|
0

Note that you are saving the value to a variable Length byte array. If you save these byteArray, you need save also the length.

You can see the protected Functions Write7BitEncodedInt from BinaryWriter and Read7BitEncodedInt from BinaryReader.

These functions save storage on disk for positive numbers. A number 0-128 needs only one byte.

Microsoft uses these functions to store/retrieve the String length prefix when save to Stream.

To use these Functions, you can create own Class derived from BinaryReader / BinaryWriter.

1 Comment

Yes, I'm also saving the length so I can properly decode the byte[] back to an integer. Unfortunately I can't use Write7BitEncodedInt because I need to write the integer value exactly in the format described in my question.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.