
I have a file where the first byte contains encoded information. In Matlab I can read the byte bit by bit with var = fread(file, 8, 'ubit1') and then retrieve each bit with var(1), var(2), etc.

Is there any equivalent bit reader in Python?

11 Answers

36

Read the bits from a file, low bits first.

def bits(f):
    # bytearray() yields ints on both Python 2 and Python 3
    for b in bytearray(f.read()):
        for i in range(8):
            yield (b >> i) & 1

# open in binary mode ('rb'), not text mode
with open('binary-file.bin', 'rb') as f:
    for b in bits(f):
        print(b)
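A minimal Python 3 sketch closer to the MATLAB call fread(file, 8, 'ubit1'): it reads only the first byte and returns its eight bits as a list, low bit first by default (the order used above). The function name `first_byte_bits` and the `msb_first` flag are illustrative, not from any library. Python lists are 0-indexed, so MATLAB's var(1) corresponds to var[0].

```python
def first_byte_bits(path, msb_first=False):
    """Return the 8 bits of the first byte of the file at `path` as a list of ints."""
    with open(path, 'rb') as f:
        data = f.read(1)
    if not data:
        raise ValueError("file is empty")
    byte = data[0]  # indexing a bytes object gives an int in Python 3
    # low bit first by default; high bit first if msb_first is True
    order = range(7, -1, -1) if msb_first else range(8)
    return [(byte >> i) & 1 for i in order]


# Usage (hypothetical file name):
# var = first_byte_bits('binary-file.bin')
# var[0] is the lowest bit of the first byte
```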

3 Comments

I tested this (by the way, the byte is little-endian): ord('\x04') returns 4, whose bit string should be '00000100', but using your code I get '00100000'.
It gives low bits first (which is natural, since it also gives low bytes first). If you want the other order, iterate over the bit positions in reverse: change range(8) (xrange(8) in Python 2) to reversed(range(8)).
I tested it against the Matlab code that reads the file, and your code correctly returns the same bit string from the data file. The byte converted to a bit string is '00100000'. I'm not sure why the conversion in Daniel G's answer is off, since it makes sense.
34

The smallest unit you can read from a file is a byte. To work at the bit level, you need to use bitwise operators.

x = 3
# Check if the 1st bit is set:
(x & 1) != 0
# Returns True

# Check if the 2nd bit is set:
(x & 2) != 0
# Returns True

# Check if the 3rd bit is set:
(x & 4) != 0
# Returns False
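The masks 1, 2 and 4 above are 1 << 0, 1 << 1 and 1 << 2, so the same check works for any bit position. A small sketch; `is_bit_set` is an illustrative name, and `n` counts from 0 at the lowest bit:

```python
def is_bit_set(x, n):
    """Return True if bit n (0 = lowest bit) of the integer x is set."""
    return (x & (1 << n)) != 0


x = 3
print([is_bit_set(x, n) for n in range(3)])  # [True, True, False]
```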

3 Comments

Do you mind adding more info, since the OP clearly seems like a beginner?
Sure. I'm coming from a Matlab background and can't find a 'ubit1' typecode for Python. I've used the following: f = open('filename', 'rb'); var = f.read(1), which returns var as the byte string '\x04'. How do I get the binary representation of that string?
Thank you for this answer. I never thought about it like that; I'm still too used to thinking in base 10. This makes so much sense, though.
15

With numpy it is easy like this:

import numpy

byte_values = numpy.fromfile(filename, dtype="uint8")  # one uint8 per byte of the file
bits = numpy.unpackbits(byte_values)                   # 8 bits per byte, high bit first

More info here:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html
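If you want the same low-bit-first order as the generator answer, numpy.unpackbits accepts a `bitorder` argument (NumPy 1.17 and later); the default `'big'` puts the highest bit of each byte first. A sketch using an in-memory byte string via numpy.frombuffer in place of a real file:

```python
import numpy as np

raw = b'\x04\x88'  # stands in for the file contents
byte_values = np.frombuffer(raw, dtype=np.uint8)

big_first = np.unpackbits(byte_values)                        # default bitorder='big'
little_first = np.unpackbits(byte_values, bitorder='little')  # low bit of each byte first

print(big_first[:8])     # [0 0 0 0 0 1 0 0]
print(little_first[:8])  # [0 0 1 0 0 0 0 0]
```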


13

You won't be able to read each bit one by one; you have to read the file byte by byte. You can easily extract the bits from each byte, though:

with open("myfile", 'rb') as f:
    # read one byte
    byte = f.read(1)
# convert the byte to an integer representation
byte = ord(byte)
# now convert to a string of 1s and 0s, padded to 8 characters
byte = bin(byte)[2:].rjust(8, '0')
# now byte contains a string with 0s and 1s
for bit in byte:
    print(bit)
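bin() drops leading zeros (bin(4) is '0b100'), which is why the code above pads with rjust(). A Python 3 sketch that applies the same conversion to every byte of the file rather than only the first; `bit_strings` is an illustrative name:

```python
def bit_strings(path):
    """Yield an 8-character '0'/'1' string for each byte of the file at `path`."""
    with open(path, 'rb') as f:
        for byte in f.read():  # iterating over bytes yields ints in Python 3
            # strip the '0b' prefix, then pad to 8 characters with leading zeros
            yield bin(byte)[2:].rjust(8, '0')


# for s in bit_strings("myfile"):  # hypothetical file name
#     print(s)
```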

6 Comments

I tried it, and for the example where byte = '\x04' the code above returns '0b'.
Thanks, your code now gives byte = '100', which is the correct base-2 representation of ord('\x04') = 4, but shouldn't the byte read be '00000100'?
Sure, I'll add that really quickly (the problem is that it truncates leading zeros).
I realize I can pad the bits to get the full representation once I have the binary value, but it just seems odd that I can't read the bits directly.
This is a limitation of how files are read: Python (like the operating system underneath it) reads the entire byte at once. Then, when you call bin(), it gives you the smallest possible representation of that number in binary (that is, with the fewest possible bits, rather than using any standard width like 8 or 32 bits). If you want all eight bits of each byte, you need to pad the result after calling bin().
13

Combining some of the previous answers, I would use the following for each byte read from the file:

[int(i) for i in "{0:08b}".format(byte)]

The result for an example byte of 0x88 is:

>>> [int(i) for i in "{0:08b}".format(0x88)]
[1, 0, 0, 0, 1, 0, 0, 0]

You can assign it to a variable and work with it as in your original request. The "{0:08b}" format spec pads the result with leading zeros to the full 8-bit byte length.
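A sketch applying the comprehension to every byte of a file in Python 3, where iterating over a bytes object yields ints that can go straight into format(). It builds one flat list, so bits can be indexed much like the Matlab var; note the order is high bit first within each byte. The name `file_bits` is illustrative:

```python
def file_bits(path):
    """Return a flat list of all bits in the file, high bit of each byte first."""
    with open(path, 'rb') as f:
        data = f.read()  # bytes; iterating gives ints in Python 3
    return [int(i) for byte in data for i in "{0:08b}".format(byte)]


# var = file_bits('myfile')  # hypothetical file name
# var[0:8] are the bits of the first byte
```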


7

To read a byte from a file: bytestring = open(filename, 'rb').read(1). Note that the file is opened in binary mode.

To get bits, convert the bytestring into an integer: byte = bytestring[0] (Python 3) or byte = ord(bytestring[0]) (Python 2) and extract the desired bit: (byte >> i) & 1:

>>> for i in range(8): (b'a'[0] >> i) & 1
... 
1
0
0
0
0
1
1
0
>>> bin(b'a'[0])
'0b1100001'
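If the same code has to run on both Python 2 and Python 3, wrapping the data in bytearray makes indexing return an int in either version, so no ord() is needed. A sketch; `bit_of_first_byte` is an illustrative name:

```python
def bit_of_first_byte(bytestring, i):
    """Return bit i (0 = lowest) of the first byte; works on Python 2 and 3."""
    byte = bytearray(bytestring)[0]  # bytearray indexing yields an int in both versions
    return (byte >> i) & 1


print([bit_of_first_byte(b'a', i) for i in range(8)])  # [1, 0, 0, 0, 0, 1, 1, 0]
```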


2

There are two possible ways to return the i-th bit of a byte: the "first bit" could refer to the high-order bit or to the low-order bit.

Here is a function that takes a string and index as parameters and returns the value of the bit at that location. As written, it treats the low-order bit as the first bit. If you want the high order bit first, just uncomment the indicated line.

def bit_from_string(string, index):
    i, j = divmod(index, 8)

    # Uncomment this if you want the high-order bit first
    # (bit positions within a byte run from 0 to 7)
    # j = 7 - j

    if ord(string[i]) & (1 << j):
        return 1
    else:
        return 0

The indexing starts at 0. If you want the indexing to start at 1, subtract 1 from index in the function before calling divmod.

Example usage:

>>> for i in range(8):
...     print(i, bit_from_string('\x04', i))
...
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0

Now, for how it works:

Each character of the string is treated as one 8-bit byte, so first we use divmod() to break the index into two parts:

  • i: the index of the correct byte within the string
  • j: the index of the correct bit within that byte

We use the ord() function to convert the character at string[i] into an integer. Then, (1 << j) builds a mask with only the j-th bit set by left-shifting 1 by j places. Finally, we use bitwise AND to test whether that bit is set. If so, the function returns 1; otherwise it returns 0.
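In Python 3, binary file data is a bytes object whose elements are already ints, so ord() is not needed. A sketch of the same divmod approach for bytes, with the high-order option written as 7 - j; the name `bit_from_bytes` and the `msb_first` parameter are illustrative additions:

```python
def bit_from_bytes(data, index, msb_first=False):
    """Return bit `index` (0-based) of a bytes object, 8 bits per byte."""
    i, j = divmod(index, 8)  # i: byte index, j: bit index within that byte
    if msb_first:
        j = 7 - j  # count from the high-order bit instead
    return (data[i] >> j) & 1


print([bit_from_bytes(b'\x04', k) for k in range(8)])  # [0, 0, 1, 0, 0, 0, 0, 0]
```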

1 Comment

Got it! Thanks for the detail in your answer. I had looked at the bit-shift operators but couldn't see how they applied here. Your answer helps clarify the bitwise operators and the approach. Thanks!
1

Supposing you have a file called bloom_filter.bin which contains an array of bits and you want to read the entire file and use those bits in an array.

First create the array where the bits will be stored after reading,

import os
from bitarray import bitarray

size = os.path.getsize('bloom_filter.bin') * 8  # same as the number of bits in the file
a = bitarray(size)

Open the file using open() or a with statement; either is fine. I am sticking with open() here:

f=open('bloom_filter.bin','rb')

Now load all the bits into the array 'a' in one shot using:

f.readinto(a)

'a' is now a bitarray containing all the bits

2 Comments

You have to install the bitarray module first: pip install bitarray
One thing I'd like to point out about this approach: with an extremely large file, it could hit memory limits. Just something to think about.
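To address the memory concern, a sketch that streams bits in fixed-size chunks instead of loading the whole file. It uses only the standard library (no bitarray) and yields each byte's high-order bit first, which matches bitarray's default 'big' bit order as far as I know. `iter_bits` and `chunk_size` are illustrative names:

```python
import io


def iter_bits(f, chunk_size=64 * 1024):
    """Yield the bits of a binary file object, high-order bit of each byte first.

    Reads at most `chunk_size` bytes at a time, so memory use stays bounded.
    """
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        for byte in bytearray(chunk):
            for i in range(7, -1, -1):
                yield (byte >> i) & 1


# In-memory stand-in for open('bloom_filter.bin', 'rb')
print(list(iter_bits(io.BytesIO(b'\x88'))))  # [1, 0, 0, 0, 1, 0, 0, 0]
```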
0

This is pretty fast I would think:

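import itertools

data = range(10)  # sample byte values; for a file, use open(path, 'rb').read()
to_bits = "{:0>8b}".format  # zero-pad each value to 8 binary digits
newdata = (n == '1' for n in itertools.chain.from_iterable(map(to_bits, data)))
print(list(newdata))  # prints a list of True and False values, 8 per byte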
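The range(10) above is only sample data. A sketch applying the same chaining to bytes read from a binary file object (io.BytesIO stands in for a real file here); `bools_from_file` is an illustrative name:

```python
import io
import itertools


def bools_from_file(f):
    """Return a list of booleans, one per bit (high bit first), for a binary file object."""
    to_bits = "{:0>8b}".format  # zero-pad each byte to 8 binary digits
    digits = itertools.chain.from_iterable(map(to_bits, f.read()))
    return [d == '1' for d in digits]


print(bools_from_file(io.BytesIO(b'\x04')))
# [False, False, False, False, False, True, False, False]
```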


0

I think this is a more pythonic way:

a = 140
binary = format(a, 'b')

The result of this block is:

'10001100'

I wanted to get the bit planes of an image, and this function helped me write this block:

import numpy as np

def img2bitmap(img: np.ndarray) -> list:
    if img.dtype != np.uint8 or img.ndim > 2:
        raise ValueError("Image is not uint8 or gray")
    # one plane per bit; plane k holds 2**k where bit k of the pixel is set
    bit_mat = [np.zeros(img.shape, dtype=np.uint8) for _ in range(8)]
    for row_number in range(img.shape[0]):
        for column_number in range(img.shape[1]):
            binary = format(int(img[row_number, column_number]), 'b')
            # reversed() walks the digits from the lowest bit upwards
            for idx, bit in enumerate(reversed(binary)):
                bit_mat[idx][row_number, column_number] = 2 ** idx if bit == '1' else 0
    return bit_mat

With the following block, I was also able to reconstruct the original image from the extracted bit planes:

import cv2
import numpy as np

img = cv2.imread('test.jpg', cv2.IMREAD_GRAYSCALE)
out = img2bitmap(img)
original_image = np.zeros(img.shape, dtype=np.uint8)
for i in range(original_image.shape[0]):
    for j in range(original_image.shape[1]):
        for bit in range(8):
            # set this bit of the pixel if its bit plane is non-zero here
            if out[bit][i, j] != 0:
                original_image[i, j] |= np.uint8(1 << bit)
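For comparison, a vectorized sketch of the same idea using NumPy bitwise AND instead of per-pixel string formatting. This is a swapped-in technique, not the answer's code. It keeps each plane as 0 or 2**k like img2bitmap, so OR-ing the planes back together restores the image. A small synthetic array stands in for the cv2 image; `bit_planes` and `rebuild` are illustrative names:

```python
import numpy as np


def bit_planes(img):
    """Return 8 uint8 arrays; plane k holds 2**k where bit k of a pixel is set, else 0."""
    if img.dtype != np.uint8 or img.ndim > 2:
        raise ValueError("Image is not uint8 or gray")
    # AND with 2**k keeps only bit k of every pixel at once
    return [img & np.uint8(1 << k) for k in range(8)]


def rebuild(planes):
    """OR the planes back together; they hold disjoint bits, so this restores each pixel."""
    restored = np.zeros_like(planes[0])
    for p in planes:
        restored |= p
    return restored


img = np.array([[0, 1, 140], [255, 128, 77]], dtype=np.uint8)
planes = bit_planes(img)
print(np.array_equal(rebuild(planes), img))  # True
```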


0

You can use the ReverseBox library.
For example, if you want to get 3 bits at position 5 from the number 2273, you can write something like this:

from reversebox.io_files.bytes_helper_functions import get_bits
result = get_bits(2273, 3, 5)
print(result)

Result:

7
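Without the library, the same extraction is a shift followed by a mask. A plain-Python sketch, assuming (from the example above) that the arguments are the number, the bit count and the starting position counted from the lowest bit; `extract_bits` is an illustrative name and is not ReverseBox's implementation:

```python
def extract_bits(number, count, position):
    """Return `count` bits of `number`, starting at bit `position` (0 = lowest bit)."""
    mask = (1 << count) - 1  # e.g. count=3 -> 0b111
    return (number >> position) & mask


print(extract_bits(2273, 3, 5))  # 7
```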

