7

How do I select rows that match values in multiple columns?

For example, we have the following df

k1 | k2 | v1 | v2
1  | 2  | 3  | 4
1  | 5  | 5  | 6
1  | 8  | 8  | 9

I am trying to select the middle row:

key_names = ["k1", "k2"]
keys = [1, 5]
selected_rows = df.loc[df[key_names].isin(keys)]

I get the following error:

ValueError: Cannot index with multidimensional key

The expected output is :

1  | 5  | 5  | 6

Thanks

1
  • df[(df[key_names] == keys).all(1)]. If you don't want exact ordering: df[df[key_names].isin(keys).all(1)] Commented Jul 25, 2019 at 19:30

2 Answers 2

13

TLDR

Use one of the following, based on your requirements:

df[(df[key_names] == keys).all(1)]

df[df[key_names].isin(keys).all(1)]

You're quite close, you have successfully created your mask, you just need to reduce it to a single dimension for indexing.

>>> df[key_names].isin(keys)
     k1     k2
0  True  False
1  True   True
2  True  False

You are only interested in rows where all values, are True, and so you can reduce the dimension using all across the first axis.

>>> df[key_names].isin(keys).all(1)
0    False
1     True
2    False
dtype: bool

The one caveat here is that isin is not order dependent, so you would get the same results using another ordering of your values.

>>> df[key_names].isin([5, 1]).all(1)
0    False
1     True
2    False
dtype: bool

If you only want an exact ordering match, use == for broadcasted comparison, instead of isin

>>> (df[key_names] == keys).all(1)
0    False
1     True
2    False
dtype: bool

>>> (df[key_names] == [5, 1]).all(1)
0    False
1    False
2    False
dtype: bool

The last step here is using the 1D mask you've created to index the original DataFrame:

>>> df[(df[key_names] == keys).all(1)]
   k1  k2  v1  v2
1   1   5   5   6
Sign up to request clarification or add additional context in comments.

3 Comments

thank you for so detailed answer, it definitely seems like a correct way of doing it - e.g ordering and all. But I still have a problem of the same error "ValueError: Cannot index with multidimensional key" ... do you know why it may happen?
so I guess the error is caused by what is passed to loc, in particular the output of (df[key_names] == keys).all(1) has shape (3,) so maybe this is a problem coz it is treated as 2d..
yeah, I needed to first convert it to a list
0

Maybe df.query('k1 == 1 and k2 == 5') will be enough?

Or df[df.apply(lambda row: {1,5} == set((row.k1, row.k2)), axis=1)]

The second solution will work at any keys order.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.