Select rows that match values in multiple columns in pandas

Question

How do I select rows that match values in multiple columns?

For example, we have the following df:

k1 | k2 | v1 | v2
1  | 2  | 3  | 4
1  | 5  | 5  | 6
1  | 8  | 8  | 9

I am trying to select the middle row:

key_names = ["k1", "k2"]
keys = [1, 5]
selected_rows = df.loc[df[key_names].isin(keys)]

I get the following error:

ValueError: Cannot index with multidimensional key

The expected output is:

1  | 5  | 5  | 6

Thanks

df[(df[key_names] == keys).all(1)]. If you don't want exact ordering: df[df[key_names].isin(keys).all(1)]
– user3483203, Commented Jul 25, 2019 at 19:30

user3483203 · Accepted Answer · 2019-07-25 19:35:27Z

13

TLDR

Use one of the following, based on your requirements:

df[(df[key_names] == keys).all(1)]

df[df[key_names].isin(keys).all(1)]

You're quite close: you have already created your mask, you just need to reduce it to a single dimension before indexing with it.

>>> df[key_names].isin(keys)
     k1     k2
0  True  False
1  True   True
2  True  False

You are only interested in rows where all values are True, so you can reduce the mask to one dimension by calling all along axis 1 (that is, across the columns of each row).

>>> df[key_names].isin(keys).all(1)
0    False
1     True
2    False
dtype: bool
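
As a minimal sketch of the reduction step, the snippet below rebuilds the question's frame (the construction itself is my assumption about how df was created) and prints the shape of the mask before and after all. The names mask_2d and mask_1d are just illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame(
    {"k1": [1, 1, 1], "k2": [2, 5, 8], "v1": [3, 5, 8], "v2": [4, 6, 9]}
)
key_names = ["k1", "k2"]
keys = [1, 5]

# isin on a DataFrame returns a DataFrame of booleans: one column per key column.
mask_2d = df[key_names].isin(keys)

# all(axis=1) collapses each row to a single boolean, giving a 1D Series.
mask_1d = mask_2d.all(axis=1)

print(mask_2d.shape)  # two-dimensional: (rows, key columns)
print(mask_1d.shape)  # one-dimensional: (rows,)
```

Writing axis=1 as a keyword does the same thing as all(1) and is a bit easier to read.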

The one caveat here is that isin is not order dependent: it only checks whether each value appears anywhere in keys, so you would get the same results using another ordering of your values.

>>> df[key_names].isin([5, 1]).all(1)
0    False
1     True
2    False
dtype: bool

If you only want an exact ordering match, use == for a broadcast comparison instead of isin. Each element of keys is then compared against the column in the same position.

>>> (df[key_names] == keys).all(1)
0    False
1     True
2    False
dtype: bool

>>> (df[key_names] == [5, 1]).all(1)
0    False
1    False
2    False
dtype: bool
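
To make the difference between the two masks more visible, here is a small sketch on a hypothetical frame (df_extra is not from the question; it adds rows where both keys hold the same value or the values are swapped). With isin, any row whose key columns contain only values from keys passes, while == requires each column to hold the value in the matching position.

```python
import pandas as pd

# Hypothetical key columns: the question's three rows plus (5, 5) and (5, 1).
df_extra = pd.DataFrame({"k1": [1, 1, 1, 5, 5], "k2": [2, 5, 8, 5, 1]})
keys = [1, 5]

# Membership test: each cell only has to be one of the values in keys.
isin_mask = df_extra.isin(keys).all(axis=1)

# Positional test: k1 must equal keys[0] and k2 must equal keys[1].
eq_mask = (df_extra == keys).all(axis=1)

print(pd.DataFrame({"isin": isin_mask, "eq": eq_mask}))
```

In this sketch the rows (5, 5) and (5, 1) are selected by the isin mask but not by the == mask.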

The last step here is using the 1D mask you've created to index the original DataFrame:

>>> df[(df[key_names] == keys).all(1)]
   k1  k2  v1  v2
1   1   5   5   6
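
If you need this in more than one place, the steps above could be wrapped in a small function. This is only a sketch: select_rows and its ordered parameter are hypothetical names of mine, not part of pandas.

```python
import pandas as pd


def select_rows(df, key_names, keys, ordered=True):
    """Return rows of df whose key_names columns match keys.

    Hypothetical helper: ordered=True compares position by position (==),
    ordered=False only checks membership (isin).
    """
    subset = df[key_names]
    if ordered:
        # == broadcasts keys across the columns, so the lengths must agree.
        if len(key_names) != len(keys):
            raise ValueError("key_names and keys must have the same length")
        mask_2d = subset == keys
    else:
        mask_2d = subset.isin(keys)
    # Reduce the 2D mask to one boolean per row before indexing.
    return df[mask_2d.all(axis=1)]


df = pd.DataFrame(
    {"k1": [1, 1, 1], "k2": [2, 5, 8], "v1": [3, 5, 8], "v2": [4, 6, 9]}
)
print(select_rows(df, ["k1", "k2"], [1, 5]))
print(select_rows(df, ["k1", "k2"], [5, 1], ordered=False))
```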

answered Jul 25, 2019 at 19:35

user3483203


3 Comments

YohanRoth Over a year ago

Thank you for such a detailed answer, it definitely seems like the correct way of doing it, e.g. ordering and all. But I still get the same error, "ValueError: Cannot index with multidimensional key". Do you know why that might happen?

YohanRoth Over a year ago

So I guess the error is caused by what is passed to loc. In particular, the output of (df[key_names] == keys).all(1) has shape (3,), so maybe the problem is that it is treated as 2D.

YohanRoth Over a year ago

Yeah, I needed to convert it to a list first.
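
The comments do not say exactly what was converted, so the following is only a debugging sketch, not a diagnosis of that case. It checks the dimensionality of what is about to be passed to loc, and shows that on the question's frame both a 1D boolean Series and a plain list of booleans can be used as the indexer.

```python
import pandas as pd

df = pd.DataFrame(
    {"k1": [1, 1, 1], "k2": [2, 5, 8], "v1": [3, 5, 8], "v2": [4, 6, 9]}
)
key_names = ["k1", "k2"]
keys = [1, 5]

mask_2d = df[key_names] == keys   # still a DataFrame (ndim == 2)
mask_1d = mask_2d.all(axis=1)     # reduced to a Series (ndim == 1)
print(mask_2d.ndim, mask_1d.ndim)

# Both forms of the 1D mask select the same rows with loc.
via_series = df.loc[mask_1d]
via_list = df.loc[mask_1d.tolist()]
print(via_series.equals(via_list))
```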

Valdi_Bo · Answer · 2019-07-25 20:05:39Z

0

Maybe df.query('k1 == 1 and k2 == 5') will be enough?

Or df[df.apply(lambda row: {1,5} == set((row.k1, row.k2)), axis=1)]

The second solution works for any order of the keys.
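
Another way to avoid depending on the order of the keys, sketched here as an assumption of mine rather than something from either answer, is to label the wanted values by column name and let pandas compare them by label. The Series name wanted is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"k1": [1, 1, 1], "k2": [2, 5, 8], "v1": [3, 5, 8], "v2": [4, 6, 9]}
)

# Wanted values keyed by column name; the order of the entries does not matter.
wanted = pd.Series({"k2": 5, "k1": 1})

# Select the key columns in the Series' own order, compare column by column,
# then reduce to one boolean per row.
mask = df[list(wanted.index)].eq(wanted).all(axis=1)
print(df[mask])
```

Unlike the set comparison, this still requires k1 to be 1 and k2 to be 5, so a row with k1=5 and k2=1 would not match.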

edited Jul 25, 2019 at 20:05

answered Jul 25, 2019 at 19:57

Valdi_Bo
