Efficient bitswapping in python3

Question

I have a very hard time handling bitswapping in python3.

So far I found fast bitswapping algorithms in C here and here, but I wasn't able to properly translate that into python3, because handling the data, using the correct data types and not getting confused by different encodings was impossible for me.

The only solution, that worked for me was using BitArray to do the swapping like this:

with open(file_swapped, 'wb') as out, open(file, 'rb') as inp:
    byte_in = inp.read(1)
    while byte_in:
        tmp = bitstring.BitArray(byte_in)
        tmp.reverse()
        byte_out = tmp._getbytes()
        byte_in = inp.read(1)

However this algorithm takes more than 2 Minutes to process the data that needs to be bitswapped. Profiling of this algorithm showed, that the creation of an BitArray takes most of the total time.

Every attempt to convert the binary input data to strings of '0' and '1' or integers, to do the "swapping"-part manually, failed, because the data has no specific encoding (utf-8 / utf-16 didn't work)

Here is an example of my Input Data

Does anyone know a fast way, to do the task described above?

Martin Evans · Accepted Answer · 2015-10-19 13:13:11Z

6

You could try something like the following approach. This uses Python's bytes.maketrans() and translate() functions.

def reverse_bits(x):
    return (int('{:08b}'.format(x)[::-1], 2))

reverse_table = bytes.maketrans(bytes(range(0,256)), bytes(reverse_bits(x) for x in range(0, 256)))

with open('input.txt', 'rb') as f_input, open('output.txt', 'wb') as f_output:
    data = f_input.read()
    f_output.write(data.translate(reverse_table))

This first creates a translate table and then applies it at once to the contents of the whole input file.

edited Oct 19, 2015 at 13:13

answered Oct 19, 2015 at 12:45

Martin Evans

46.9k17 gold badges88 silver badges104 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

jmm Over a year ago

Thank you very much. This perfectly solved the bitswapping problem!

Louis Ricci Over a year ago

This seems like the most pythonic way to do a lookup table. I'll add, if the file is especially large you may have to buffer the reads (~16KB at a time) and translate() the chunks, so you don't run out of memory trying to read in the whole file at once.

Kijewski · Accepted Answer · 2015-10-19 11:54:14Z

1

Do you really want do this in Python? Just use the efficient C solution and subprocess.check_call:

with open(file_swapped, 'wb') as out, open(file, 'rb') as inp:
    check_call(['path_to_your_c_solution'], stdin=inp, stdout=out)

answered Oct 19, 2015 at 11:54

Kijewski

26.1k14 gold badges108 silver badges149 bronze badges

Comments

Peter de Rivaz · Accepted Answer · 2015-10-19 12:18:10Z

1

I suggest you prepare an array with 256 entries specifying how to swap each byte, and then use string.translate to convert the input.

You can then use this to operate on the entire file in a single call rather than operating byte by byte so I suspect it will be a lot faster than your current method.

answered Oct 19, 2015 at 12:18

Peter de Rivaz

33.6k4 gold badges51 silver badges80 bronze badges

Comments

Serge Ballesta · Accepted Answer · 2015-10-19 12:51:15Z

As you only need to reverse single bytes in Python 2, you could just use the lookup table from the referenced answer:

BitReverseTable256 = [ chr(b) for b in
[ 0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0,
  0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8,
  0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4,
  0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC,
  0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2,
  0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
  0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6,
  0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
  0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
  0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9,
  0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
  0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
  0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3,
  0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
  0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7,
  0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF ]]

with open(file_swapped, 'wb') as out, open(file, 'rb') as inp:
    byte_in = inp.read(1)
    while byte_in:
        byte_out = BitReverseTable256[ord(byte_in)]
        # use byte_out ...
        byte_in = inp.read(1)

Louis Ricci · Accepted Answer · 2015-10-19 12:50:39Z

-1

Use a lookup table array() to map a byte value to it's bit reversed version.

from array import array
lut = array("B", [0, 128, 64, 192, 32, 160, 96, 224, 16, 144, 80, 208, 48, 176, 112, 240, 8, 136, 72, 200, 40, 168, 104, 232, 24, 152, 88, 216, 56, 184, 120, 248, 4, 132, 68, 196, 36, 164, 100, 228, 20, 148, 84, 212, 52, 180, 116, 244, 12, 140, 76, 204, 44, 172, 108, 236, 28, 156, 92, 220, 60, 188, 124, 252, 2, 130, 66, 194, 34, 162, 98, 226, 18, 146, 82, 210, 50, 178, 114, 242, 10, 138, 74, 202, 42, 170, 106, 234, 26, 154, 90, 218, 58, 186, 122, 250, 6, 134, 70, 198, 38, 166, 102, 230, 22, 150, 86, 214, 54, 182, 118, 246, 14, 142, 78, 206, 46, 174, 110, 238, 30, 158, 94, 222, 62, 190, 126, 254, 1, 129, 65, 193, 33, 161, 97, 225, 17, 145, 81, 209, 49, 177, 113, 241, 9, 137, 73, 201, 41, 169, 105, 233, 25, 153, 89, 217, 57, 185, 121, 249, 5, 133, 69, 197, 37, 165, 101, 229, 21, 149, 85, 213, 53, 181, 117, 245, 13, 141, 77, 205, 45, 173, 109, 237, 29, 157, 93, 221, 61, 189, 125, 253, 3, 131, 67, 195, 35, 163, 99, 227, 19, 147, 83, 211, 51, 179, 115, 243, 11, 139, 75, 203, 43, 171, 107, 235, 27, 155, 91, 219, 59, 187, 123, 251, 7, 135, 71, 199, 39, 167, 103, 231, 23, 151, 87, 215, 55, 183, 119, 247, 15, 143, 79, 207, 47, 175, 111, 239, 31, 159, 95, 223, 63, 191, 127, 255])
for i in range(256):
    print(i)
    print(lut[i])

This should be more than efficient enough, using C for a simple task like this is silly.

from array import array
lut = array("B", [0, 128, 64, 192, 32, 160, 96, 224, 16, 144, 80, 208, 48, 176, 112, 240, 8, 136, 72, 200, 40, 168, 104, 232, 24, 152, 88, 216, 56, 184, 120, 248, 4, 132, 68, 196, 36, 164, 100, 228, 20, 148, 84, 212, 52, 180, 116, 244, 12, 140, 76, 204, 44, 172, 108, 236, 28, 156, 92, 220, 60, 188, 124, 252, 2, 130, 66, 194, 34, 162, 98, 226, 18, 146, 82, 210, 50, 178, 114, 242, 10, 138, 74, 202, 42, 170, 106, 234, 26, 154, 90, 218, 58, 186, 122, 250, 6, 134, 70, 198, 38, 166, 102, 230, 22, 150, 86, 214, 54, 182, 118, 246, 14, 142, 78, 206, 46, 174, 110, 238, 30, 158, 94, 222, 62, 190, 126, 254, 1, 129, 65, 193, 33, 161, 97, 225, 17, 145, 81, 209, 49, 177, 113, 241, 9, 137, 73, 201, 41, 169, 105, 233, 25, 153, 89, 217, 57, 185, 121, 249, 5, 133, 69, 197, 37, 165, 101, 229, 21, 149, 85, 213, 53, 181, 117, 245, 13, 141, 77, 205, 45, 173, 109, 237, 29, 157, 93, 221, 61, 189, 125, 253, 3, 131, 67, 195, 35, 163, 99, 227, 19, 147, 83, 211, 51, 179, 115, 243, 11, 139, 75, 203, 43, 171, 107, 235, 27, 155, 91, 219, 59, 187, 123, 251, 7, 135, 71, 199, 39, 167, 103, 231, 23, 151, 87, 215, 55, 183, 119, 247, 15, 143, 79, 207, 47, 175, 111, 239, 31, 159, 95, 223, 63, 191, 127, 255])
with open(file_swapped, 'wb') as out, open(file, 'rb') as inp:
    byte_in = inp.read(1)
    while byte_in:
        byte_out = lut[byte_in]
        byte_in = inp.read(1)

answered Oct 19, 2015 at 12:50

Louis Ricci

21.1k5 gold badges50 silver badges64 bronze badges

4 Comments

Kijewski Over a year ago

"using C for a simple task like this is silly", so you do know how big the input though it is not stated the the question?

Louis Ricci Over a year ago

@Kay - since the accepted answer is a purely python solution I'll reaffirm that it's "silly" (read: unnecessary, if you'd like) to answer a question tagged with python and python-3.x with 'use C instead'. I'll add, voting down provably correct answers because you feel slighted is just laughably petty and against SO policy of only voting down answers that are 'not useful or incorrect'.

Kijewski Over a year ago

I down voted the word "silly". OP stating that their initial solution takes 2 minutes hints that the input must be quite big, doesn't it? So iterating over the single chars is not a good solution. Also I down voted your answer because IMO it shows the least effort (explaining "what" and "why") of the four pure Python answers.

Louis Ricci Over a year ago

@Kay - Since you read the part about the OPs code taking two minutes to run you should of also noticed the following sentence "Profiling of this algorithm showed, that the creation of an BitArray takes most of the total time." While reading a file one byte at a time is generally consideed inefficient (especially in lower level languages like C), the underlying IO in Python will buffer ~4KB at a time of the file,so this really isn't a concern and looking at your answer seems like an afterthought / rationalization. Finally, I'll refer you to wikipedia en.wikipedia.org/wiki/Lookup_table

Collectives™ on Stack Overflow

Efficient bitswapping in python3

5 Answers 5

2 Comments

Comments

Comments

Comments

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

2 Comments

Comments

Comments

Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related