Pandas DataFrame select rows based on values of multiple columns whose names are specified in a list

Question

I have the following dataframe:

import pandas as pd
import numpy as np
ds = pd.DataFrame({'z':np.random.binomial(n=1,p=0.5,size=10), 
                   'x':np.random.binomial(n=1,p=0.5,size=10), 
                   'u':np.random.binomial(n=1,p=0.5,size=10), 
                   'y':np.random.binomial(n=1,p=0.5,size=10)})
ds

    z   x   u   y
0   0   1   0   0
1   0   1   1   1
2   1   1   1   1
3   0   0   1   1
4   0   0   1   1
5   0   0   0   0
6   1   0   1   1
7   0   1   1   1
8   1   1   0   0
9   0   1   1   1

How do I select the rows that have the values (0, 1) in the columns whose names are given in a list?

This is what I have thus far:

zs = ['z','x']
tf = ds[ds[zs].values == (0,1)]
tf

Now that prints:

    z   x   u   y
0   0   1   0   0
0   0   1   0   0
1   0   1   1   1
1   0   1   1   1
2   1   1   1   1
3   0   0   1   1
4   0   0   1   1
5   0   0   0   0
7   0   1   1   1
7   0   1   1   1
8   1   1   0   0
9   0   1   1   1
9   0   1   1   1

This output contains duplicates and also includes an incorrect row (row #2: 1,1,1,1). Any thoughts or ideas? I am assuming there is a Pythonic way of doing this without nested loops or brute force.

cs95 · Accepted Answer · 2019-01-21 23:34:27Z

5

You can use a broadcasted NumPy comparison (df here is the question's ds):

df[(df[['z','x']].values == [0, 1]).all(1)]

   z  x  u  y
0  0  1  0  0
1  0  1  1  1
7  0  1  1  1
9  0  1  1  1

You can also use np.logical_and.reduce:

cols = ['z', 'x']
vals = [0, 1]

df[np.logical_and.reduce([df[c] == v for c, v in zip(cols, vals)])]

   z  x  u  y
0  0  1  0  0
1  0  1  1  1
7  0  1  1  1
9  0  1  1  1
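The np.logical_and.reduce approach can be wrapped in a small helper for reuse. The following is a minimal sketch rather than part of the original answer: the function name select_rows is hypothetical, and the frame is a fixed copy of the table printed in the question (the original used random draws) so that the result is reproducible.

```python
import numpy as np
import pandas as pd

# Fixed copy of the table shown in the question (assumption: the original
# frame came from random draws, so it is hard-coded here).
ds = pd.DataFrame({
    'z': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    'x': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
    'u': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
    'y': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
})

def select_rows(frame, cols, vals):
    """Hypothetical helper: keep rows where frame[c] == v for every (c, v) pair."""
    if len(cols) != len(vals):
        raise ValueError("cols and vals must have the same length")
    # One boolean Series per column, AND-ed together into a single 1D mask.
    mask = np.logical_and.reduce([frame[c] == v for c, v in zip(cols, vals)])
    return frame[mask]

result = select_rows(ds, ['z', 'x'], [0, 1])
print(result.index.tolist())  # [0, 1, 7, 9]
```

Because each column is compared separately, this form should also cope with columns of different dtypes, where comparing a mixed-dtype .values array against a list is less predictable.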

Lastly, assuming your column names are valid Python identifiers (so they can appear unquoted in an expression), you can dynamically generate a query expression string for use with query:

querystr = ' and '.join([f'{c} == {v!r}' for c,  v in zip(cols, vals)])
df.query(querystr)

   z  x  u  y
0  0  1  0  0
1  0  1  1  1
7  0  1  1  1
9  0  1  1  1

Where {v!r} is the same as {repr(v)}.
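The !r conversion matters most when a value is a string, because repr adds the quotes that the query expression needs. A minimal sketch, using a hypothetical frame named people with made-up data that is not from the question:

```python
import pandas as pd

# Hypothetical frame with a string column, to show the effect of !r.
people = pd.DataFrame({
    'city': ['Oslo', 'Rome', 'Oslo', 'Rome'],
    'flag': [1, 1, 0, 1],
})

cols = ['city', 'flag']
vals = ['Rome', 1]

# repr('Rome') is "'Rome'", so the string is quoted inside the expression;
# repr(1) is just "1".
querystr = ' and '.join(f'{c} == {v!r}' for c, v in zip(cols, vals))
print(querystr)  # city == 'Rome' and flag == 1

matches = people.query(querystr)
print(matches.index.tolist())  # [1, 3]
```

Without !r the expression would read city == Rome, and query would try to resolve Rome as a column or variable name instead of comparing against the string.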


2 Comments

vsm Over a year ago

Thank you @coldspeed. It's very helpful. Do you know why my simplistic query gives the wrong answer (duplicates and an erroneous row)?

cs95 Over a year ago

@vsm You were close. You need a 1D mask to index df. So, with your solution, you should have done (df[['z', 'x']].values == (0, 1)).all(axis=1) to see which row satisfied this condition for all columns. That's why this was my first option—to show you your fixed code.
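The difference between the 2D and the 1D mask can be inspected directly with NumPy. This is a sketch on a fixed copy of the question's table. The row positions of the True cells in the 2D mask match the index labels printed in the question, which suggests (an inference from that output, not something stated in the thread) that each True cell selected its row once.

```python
import numpy as np
import pandas as pd

# Fixed copy of the table shown in the question.
ds = pd.DataFrame({
    'z': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    'x': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
    'u': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
    'y': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
})

# Element-wise comparison: one boolean per cell, not one per row.
mask_2d = ds[['z', 'x']].to_numpy() == (0, 1)
print(mask_2d.shape)  # (10, 2)

# Row position of every True cell; a row appears twice when both cells match
# and once when only one of them matches.
rows_hit = np.nonzero(mask_2d)[0]
print(rows_hit.tolist())  # [0, 0, 1, 1, 2, 3, 4, 5, 7, 7, 8, 9, 9]

# Collapsing along axis=1 keeps a row only if every compared column matches.
mask_1d = mask_2d.all(axis=1)
print(ds.index[mask_1d].tolist())  # [0, 1, 7, 9]
```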

YOLO · Answer · 2019-01-21 23:32:43Z

1

You can do:

cols = ['u','x']
bools = ds[cols].apply(lambda x: all(x == (0,1)), axis=1)
ds[bools]

   u  x  y  z
0  0  1  1  1
7  0  1  0  1
8  0  1  1  0


BENY · Answer · 2019-01-21 23:38:53Z

1

Using eq, which is very similar to cold's NumPy method:

df[df[zs].eq(pd.Series([0, 1], index=zs), axis=1).all(axis=1)]

   z  x  u  y
0  0  1  0  0
1  0  1  1  1
7  0  1  1  1
9  0  1  1  1
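One property of the eq form is that the target Series is aligned with the frame by column label rather than by position. A minimal sketch on a fixed copy of the question's table, where the name target is an assumption for illustration:

```python
import pandas as pd

# Fixed copy of the table shown in the question.
ds = pd.DataFrame({
    'z': [0, 0, 1, 0, 0, 0, 1, 0, 1, 0],
    'x': [1, 1, 1, 0, 0, 0, 0, 1, 1, 1],
    'u': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
    'y': [0, 1, 1, 1, 1, 0, 1, 1, 0, 1],
})

# Target values keyed by column name, deliberately listed in a different
# order than the selected columns.
target = pd.Series({'x': 1, 'z': 0})

# eq with axis=1 matches target's index against the column labels,
# so 'z' is compared with 0 and 'x' with 1 regardless of ordering.
mask = ds[['z', 'x']].eq(target, axis=1).all(axis=1)
print(ds[mask].index.tolist())  # [0, 1, 7, 9]
```

With the plain NumPy comparison, by contrast, the order of the values has to match the order of the selected columns.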


LonelyDaoist · Answer · 2019-01-21 23:50:12Z

0

A simpler way is to use boolean indexing:

f = ds['z'] == 0
g = ds['x'] == 1
ds[f & g]

edited Jan 21, 2019 at 23:50

answered Jan 21, 2019 at 23:43


1 Comment

cs95 Over a year ago

"simple" but does not scale for multiple columns and values. See np.logical_and.reduce on how to generalise it (also, see the second option of my answer).
