Suppose I have the DataFrame below:

>>> dfrm = pandas.DataFrame({
                             "A":[1,2,3], 
                             "id1":[True, True, False], 
                             "id2":[False, True, False]
                            })

>>> dfrm
   A    id1    id2
0  1   True  False
1  2   True   True
2  3  False  False

How can I flatten the two Boolean columns into a single new column that holds, for each row, the name of every column that is True? A row with more than one True value would need to be repeated, once per True column, and a row with no True values should get NaN.

Specifically, in the example above, I would want the output to look like this:

index A   id1    id2   all_ids
    0 1  True  False       id1
    1 2  True   True       id1
    1 2  True   True       id2
    2 3 False  False       NaN

(Preferably not multi-indexed on all_ids, but I would accept that if it were the only way to do it.)

I've commonly seen this operation called "wide to long", and the inverse (going from one column to a bunch of Booleans) called "long to wide".

Is there any built-in support for this in Pandas?

1 Answer

Off-hand I can't recall a function that does this in pandas as a one-liner, but you can stack the Boolean columns, keep the column names wherever the stacked value is True, and join those names back onto the original frame:

In [35]: st = dfrm.loc[:, ['id1', 'id2']].stack()

In [36]: all_ids = pandas.Series(st.index.get_level_values(1),
                                 index=st.index.get_level_values(0),
                                 name='all_ids')[st.values]

In [37]: dfrm.join(all_ids, how='left')
Out[37]: 
   A    id1    id2 all_ids
0  1   True  False     id1
1  2   True   True     id1
1  2   True   True     id2
2  3  False  False     NaN
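
For comparison, here is a minimal sketch of a different route to the same result. It is not the stack/join approach shown above: it collects the True column names into a list per row and then expands those lists with `DataFrame.explode`, which assumes pandas 0.25 or later. The helper name `flatten_flags` and its parameters are hypothetical, chosen only for illustration.

```python
import pandas as pd

dfrm = pd.DataFrame({
    "A": [1, 2, 3],
    "id1": [True, True, False],
    "id2": [False, True, False],
})


def flatten_flags(frame, flag_cols, name="all_ids"):
    """Hypothetical helper: one output row per True flag, NaN if a row has none."""
    # Boolean matrix with one row per record and one column per flag.
    flags = frame[flag_cols].to_numpy(dtype=bool)
    # For each record, list the names of the flag columns that are True.
    labels = [[col for col, on in zip(flag_cols, row) if on] for row in flags]
    label_series = pd.Series(labels, index=frame.index, name=name)
    # explode() repeats a row once per list element; an empty list becomes NaN.
    return frame.assign(**{name: label_series}).explode(name)


result = flatten_flags(dfrm, ["id1", "id2"])
print(result)
```

As with the join above, the original index labels are kept, so label 1 appears twice and the result is not multi-indexed. If a fresh 0..n-1 index is wanted, calling `reset_index(drop=True)` on the result should do it.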

1 Comment

I like this approach, let me test it a bit and get back to you.
