2

I have a large Pandas dataframe (a subclass of Numpy ndarray for most purposes) containing binary strings (0s and 1s). I need to find the positions of all the zeros in these strings and then label them. Also, I expect the positions of the zeros to be relatively sparse (~1% of all bit positions).

Basically, I want to run something like this:

import pandas as pd
x = pd.Series([ '11101110', '11111101' ], ) # start with strings
x = pd.Series([ 0b11101110, 0b11111101 ], ) # ... or integers of a known bit length

zero_positions = find_zero_positions( x )

Yielding zero_positions =...

         value
row bit
0   4        0
    0        0
1   1        0

I've tried a few different ways to do this, but haven't come up with anything better than looping through one row at a time. (EDIT: The actual strings I want to look at are much longer than the 8-bit examples here, so a lookup table won't work.)

I'm not sure whether it will be more efficient to approach this as a string problem (Pandas's Vectorized string methods don't offer a substring-position-finding method) or a numeric problem (using something like numpy.unpackbits, maybe?).

2
  • If you're just looking for zero bits in bytes, why not use a lookup table? Commented Jun 10, 2013 at 23:15
  • Good point, @gnibbler. Actually, the input strings I really want to use are much longer (128 bits) making a lookup table impractical. Commented Jun 10, 2013 at 23:23

4 Answers 4

2

You could use numpy.unpackbits as follows, starting with an ndarray of this form:

In [1]: x = np.array([[0b11101110], [0b11111101]], dtype=np.uint8)

In [2]: x
Out[2]:
array([[238],
       [253]], dtype=uint8)

In [3]: df = pd.DataFrame(np.unpackbits(x, axis=1))

In [4]: df.columns = df.columns[::-1]

In [5]: df
Out[5]:
   7  6  5  4  3  2  1  0
0  1  1  1  0  1  1  1  0
1  1  1  1  1  1  1  0  1

Then from the DataFrame, just stack and find the zeros:

In [6]: s = df.stack()

In [7]: s.index.names = ['row', 'bit']

In [8]: s[s == 0]
Out[8]:
row  bit
0    4      0
     0      0
1    1      0
dtype: uint8

I think this would be a reasonably efficient method.

Sign up to request clarification or add additional context in comments.

2 Comments

This works well, although numpy.unpackbits only seems to work on 8-bit integers. I'm not sure about the best way to store 128-bit integers in ndarray, much less to convert them to 8-bit chunks (using np.frombuffer introduces some issues with endianness). Now I'm thinking that my real problem may be more with how I get these large bit-strings into Pandas, rather than how I parse them down afterwards...
@Dan That's annoying, it made for an elegant solution (for uint8s)... :s
1

One good solution would be to split the input into smallish chunks and use that in a memoized lookup table (where you compute the first time through).

E.g., if each number/array is 128 bits; break it into eight 16-bits parts that are looked up in a table. At worst, the lookup table needs 216 ~ 65536 entries - but if zeros are very sparse (e.g., at most two zeros in any group of 8 bits only need about ~64). Depending on how sparse you can beef up the size of the chunk.

Comments

1

In the "yuck" department, I would like to enter the following contestant:

def numpyToBinString(numpyValue):
    return "".join( [str((numpyValue[0] >> shiftLength) & 1 ) for shiftLength in range(numpyValue.dtype.itemsize * 8)] )

Works for shape (,) ndArrays, but could be extended with @vectorize decorator.

1 Comment

As a followup, it looks like you want to be playing with the bitArray package.
0

You can use a lookup table.

Create a table that has the 0 positions for each number from 0-255 and a function to access it, call it zeroBitPositions, this returns a list.

Then, assuming that you are storing your numbers as a python long type (which, I believe has unlimited precision). You can do the following:

allZeroPositions = []
shift = 0
while (num >> shift) > 0:
    zeroPositions += [x + shift for x in zeroBitPositions ((num >> shift) & 0xFF)]
    shift += 8

Hopefully this is a good start.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.