Encode integer as variable-length big-endian byte-array

Question

I need to write an integer to a byte array such that leading zeros are omitted and the bytes are written in big endian order.

Example:

int    original = 0x00123456;

byte[] encoded  = Encode(original);  //  == new byte[] { 0x12, 0x34, 0x56 };

int    decoded  = Decode(encoded);   //  == 0x123456

My Decode method:

private static int Decode(byte[] buffer, int index, int length)
{
    int result = 0;
    while (length > 0)
    {
        result = (result << 8) | buffer[index];
        index++;
        length--;
    }
    return result;
}

I'm struggling to come up with an Encode method that doesn't require a temporary buffer or reverses the bytes after writing them in little endian order. Can anyone help?

private static int Encode(int value, byte[] buffer, int index)
{

}

Why does Encode() take a byte[] argument? It should return one. — Hans Passant
– Hans Passant, Commented Jul 4, 2010 at 20:38
@Hans Passant: It takes a byte[] to write to at the specified index. I want to avoid unnecessary byte[] allocations. — dtb
– dtb, Commented Jul 4, 2010 at 20:49
Okay, makes sense. But how is the code that eventually reads this byte[] supposed to know how many significant bytes it should read? — Hans Passant
– Hans Passant, Commented Jul 4, 2010 at 20:58
@Hans Passant: Encode returns the number of bytes written. This value is stored along the encoded integer, so I can pass the length to Decode on decoding. — dtb
– dtb, Commented Jul 4, 2010 at 21:07

dtb · Accepted Answer · 2010-07-04 21:51:10Z

3

As per OP's request, here is a version without loops for a 32-bit number:

private static int Encode(int value, byte[] buffer, int index)
{
    byte temp;
    bool leading = true;

    temp = (value >> 24) & 0xFF;
    if (temp > 0) {
      buffer[index++] = temp;
      leading = false;
    }

    temp = (value >> 16) & 0xFF;
    if (temp > 0 || leading == false) {
      buffer[index++] = temp;
      leading = false;
    }

    temp = (value >> 8) & 0xFF;
    if (temp > 0 || leading == false) {
      buffer[index++] = temp;
      leading = false;
    }

    temp = value & 0xFF;
    buffer[index++] = temp;

    return index;
}

Version using a loop for 32-bit numbers:

private static int Encode(int value, byte[] buffer, int index)
{
    int length = 0;

    for (int i = 3; i >= 0; i++) {
      byte temp = (byte)(value >> (8 * i));
      if (temp > 0 || length > 0) {
        buffer[index++] = temp;
        length++;
      }
    }

    return length;
}

Note that this version doesn't write anything if the input is just 0.

edited Jul 4, 2010 at 21:51

dtb

218k37 gold badges413 silver badges438 bronze badges

answered Jul 4, 2010 at 21:17

casablanca

70.9k8 gold badges139 silver badges155 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

Juan Pablo Califano Over a year ago

-1. This is wrong. It doesn't exclude leading zeros. It excludes every byte whose value is zero, instead. Try it with 0x00120056 and you'll see the result is not correct.

casablanca Over a year ago

Sorry, I overlooked that part. It's fixed now.

dtb Over a year ago

@casablanca: I like this. Please do me a small favour and turn this back into a loop (the shift is just 8*i). Can you also change it so it returns the number of bytes written?

Chris Taylor Over a year ago

@dtb, putting this in a loop will mean that you have a conditional branch in the loop. Once you have the first non-zero byte you can blindly pack the rest of the bytes without the conditional branch.

dtb Over a year ago

@Chris Taylor: I'm looking for a solution that is simple, readable, maintainable, not necessarily one that takes a few nanoseconds less than other solution. If you see two loops, you need to figure out what each loop's doing, while if there's only one loop, you know immediately that this loop simply writes some bytes to the byte[] in some way.

|

Pavel Radzivilovsky · Accepted Answer · 2010-07-05 07:25:29Z

3

private static int Encode(int value, byte[] buffer, int index)
{
    int length = 0;
    int valueCopy = value;
    while (valueCopy != 0)
    {
        valueCopy >>= 8;
        length++;
    }
    for (int i = 0; i < length; i++)
    {
        buffer[index + length - i - 1] = (byte)value;
        value >>= 8;
    }
    return length;
}

edited Jul 5, 2010 at 7:25

answered Jul 4, 2010 at 20:35

Pavel Radzivilovsky

19.2k5 gold badges60 silver badges70 bronze badges

7 Comments

dtb Over a year ago

The double-loop is somewhat ugly, but not nearly as ugly as what I've come up with so far. +1.

casablanca Over a year ago

@dtb: If your input size is fixed, i.e. 32-bit or 64-bit, you can entirely do away with the loops and hard-code the conversion - it will even be a little faster.

dtb Over a year ago

@casablanca: Well, my input size is always 32-bit (a non-negative int), but I want to omit leading zeros when writing the value to the byte[]. Can you post an answer that shows how a solution without loops looks like? (Without omitting leading zeros it's simple; I'm struggling with the variable-length output.)

Pavel Radzivilovsky Over a year ago

there's inherent need for the loop to find how long is the target number.

casablanca Over a year ago

@digEmAll: You can only unroll the loop if you know the number of iterations, which isn't obvious in this case. @Pavel: You can break down the loop into if statements that check if each byte of the number (from left to right) is non-zero, and if so, append it to the buffer.

|

x77 · Accepted Answer · 2010-07-04 21:00:17Z

0

Note that you are saving the value to a variable Length byte array. If you save these byteArray, you need save also the length.

You can see the protected Functions Write7BitEncodedInt from BinaryWriter and Read7BitEncodedInt from BinaryReader.

These functions save storage on disk for positive numbers. A number 0-128 needs only one byte.

Microsoft uses these functions to store/retrieve the String length prefix when save to Stream.

To use these Functions, you can create own Class derived from BinaryReader / BinaryWriter.

answered Jul 4, 2010 at 21:00

x77

7371 gold badge4 silver badges12 bronze badges

1 Comment

dtb Over a year ago

Yes, I'm also saving the length so I can properly decode the byte[] back to an integer. Unfortunately I can't use Write7BitEncodedInt because I need to write the integer value exactly in the format described in my question.

Collectives™ on Stack Overflow

Encode integer as variable-length big-endian byte-array

3 Answers 3

6 Comments

7 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

6 Comments

7 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related