0

I need to convert the following to a binary format (and later recoup) in the smallest amount of data possible.

my_arr = [
        [128,32 ,22,23],
        [104,53 ,21,25],
        [150,55 ,79,23],
        [104,101,23,8 ],
        [57 ,117,13,21],
        [37 ,135,21,20],
        [81 ,132,23,6 ],
        [81 ,138,7 ,8 ],
        [97 ,138,7 ,8 ]...

the numbers don't exceed 399

If I use a 0 for each digit (8 0's in a row = 8) and a 1 as separator, the first line looks like this: 010010000000011000100110010011001000 This is really long for numbers like 99

If I pad each number to three digits and convert each in turn to actual binary the first line looks like this: 000100101000000000110010000000100010000000100011 This works out as 12 chars per number.

As the first char won't ever be a 4 or above I can save two digts by treating 0 as 00, 1 as 01, 2 as 10 and 3 as 11. Hence 10 chars per number On the whole this reduces the size down to about 90% of the first option (on average) but is there a shorter way?

edit: yes as a string of 1's and 0's... and it doesn’t need to be shorter than the original integers... just the shortest possible way of writing it using only 2 symbols

2
  • In JavaScript, the only way to view an integer as binary would be to put it into a String of 1s and 0s. Each character of the string will take up at least 8 bits, meaning this string is huge in comparison to the memory required for the integer itself. Commented Jan 13, 2013 at 23:27
  • When you say binary do you mean literally and only the digits 0 and 1 as strings or do you mean that for example you could store 8 1's as the value 0XFF? Basically the same question/comment as @Paul but phrased differently. Commented Jan 13, 2013 at 23:27

3 Answers 3

5

If the values are evenly distributed between 0 and 399, then a pretty good encoding would be to take three values and encode them as a base 400 three-digit integer. I.e. val1 + 400*val2 + 400*400*val3. Then that integer will fit nicely in 26 bits. Four successive 26-bit values will fit in 13 bytes. Then you get an average of 13/12 bytes per value.

That's about as good as you're going to be able to do, unless the distribution of values is biased or if there is repetition or correlation, in which case you would be able to compress them more.

To deal with the details, you can use the number of bytes in the encoded sequence to determine the number of values, which may not be a multiple of three. If it is not a multiple of three, then there will be one or two values on the end, coded simply as nine bits each. Since it takes eight bits to go from 18 to 26 bits to add a value, there is no ambiguity in the count.

Sign up to request clarification or add additional context in comments.

1 Comment

You mean 18 bits for two values and 9 bits for 1 value, added to the end? I think your last paragraph could use a little more elaboration. Otherwise, +1 for an answer pretty much identical to what I was going to suggest. So few people seem to understand how to pack values into other number bases!
1

A good starting point would be to create constant-length blocks of ones and zeroes, which gives you easy to decode strings.

400 in binary is 110010000, which requires 9 characters to encode each number as its binary representation zero-padded to constant length.

encoding the first row:

var padTo9 = function( bin ){ 
    while( bin.length<9 ){ bin = "0" + bin; } 
    return bin; 
}
[128,32 ,22,23].map( function(i){ return padTo9( i.toString(2) ) }).join('');

/* result:
"010000000000100000000010110000010111"
*/

decoding

"010000000000100000000010110000010111".match(/[0-1]{9}/g).map( function(i){ return parseInt( i, 2 ) });
/* result:
[128, 32, 22, 23]
*/

I think the only way to get shorter string is using variable block length, which would require adding some control symbols to tell the decoder that following numbers are encoded in a specific number of characters. But these symbols have to be in >400 and still 9 characters long, so I think it wouldn't help given random distribution of data.

Comments

0

max 399: 2**9 is the smallest instance of (2**n)>=399, each number can be stored as 9 bits; convert each to binary, and concat

1 Comment

As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.