I want to check a pandas DataFrame to see whether two columns match specific values. I know how to check one column at a time, but not two at once.

Basically, I want to see if the person's last name is 'Smith' and their first name is either 'John' or 'Tom' all at the same time.

My code:

import pandas as pd

# create dataframe
name = {'last_name': ['smith','smith','jones','parker'], 'first_name': ['john','tom','mary','peter']}
df = pd.DataFrame(name,columns=['last_name', 'first_name'])

# this is what I want to do
# df.loc[df['last_name'] == 'smith' and df['first_name'].isin(['john', 'tom']), 'match'] = 'yes'

# this works by itself
df.loc[df['last_name'] == 'smith', 'match'] = 'yes'

# this works by itself
df.loc[df['first_name'].isin(['john', 'tom']), 'match'] = 'yes'


print(df)
3 Comments

  • df.loc[(df['last_name'] == 'smith') & (df['first_name'].isin(['john', 'tom'])), 'match'] = 'yes'
    You need to use the bitwise AND operator, i.e. &. Commented Jan 25, 2022 at 19:24
  • Another way: df['match'] = np.where((df['last_name'] == 'smith') & (df['first_name'].isin(['john', 'tom'])), 'yes', 'no') Commented Jan 25, 2022 at 19:26
  • Okay, yeah, it looks like using and instead of & and not having enough parentheses is where I went wrong. Thank you! Commented Jan 25, 2022 at 21:14

1 Answer

You want to filter rows where the last name is "smith" AND the first name is either "john" OR "tom". In other words, the row is either "john smith" OR "tom smith". This is equivalent to:

(last_name=='smith' AND first_name=='john') OR (last_name=='smith' AND first_name=='tom')

which is equivalent to:

(last_name=='smith') AND (first_name=='john' OR first_name=='tom')

The inner OR can be handled using isin, and the AND becomes the element-wise & operator, with each condition wrapped in parentheses:

out = df[(df['last_name']=='smith') & (df['first_name'].isin(['john','tom']))]

Output:

  last_name first_name match
0     smith       john   yes
1     smith        tom   yes
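
As a minimal sketch of why the attempt in the question failed, assuming the same sample data: Python's `and` asks each Series for a single truth value, which pandas refuses with a ValueError, whereas `&` combines the two boolean Series element by element. The parentheses matter because `&` binds more tightly than `==`. The names `and_failed` and `mask` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'last_name': ['smith', 'smith', 'jones', 'parker'],
    'first_name': ['john', 'tom', 'mary', 'peter'],
})

# `and` needs one True/False per operand, but a Series holds many values,
# so pandas raises ValueError ("The truth value of a Series is ambiguous").
try:
    df['last_name'] == 'smith' and df['first_name'].isin(['john', 'tom'])
    and_failed = False
except ValueError:
    and_failed = True

# `&` combines the two boolean Series row by row.
mask = (df['last_name'] == 'smith') & (df['first_name'].isin(['john', 'tom']))

# Rows where the mask is True get 'yes'; the others are left as NaN.
df.loc[mask, 'match'] = 'yes'
print(df)
```

Storing the mask in a variable also makes it easy to reuse, for example `df[mask]` to filter or `mask.sum()` to count the matching rows.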

7 Comments

This did the trick! I think I just didn't have enough parentheses in mine, and I was using and instead of &. Thanks!
In addition to this, is it possible to do this as an actual if statement or is this the best way to handle this? if df.loc[(df['last_name'] == 'smith') & (df['first_name'].isin(['john', 'tom']))]: vs if df["match"].isin(["yes"]).any():
@Cole if you're filtering rows, using boolean indexing is the best way; but you can write an if condition like: (x['last_name']=='smith') & (x['first_name'] in ['john','tom']). In this case you're evaluating row by row
Basically, I want to check whether any rows exist where first_name is john OR tom AND last_name is smith. If there are such rows, I execute some other code; if not, I continue as normal. I will have multiple checks like that, looking for specific last_name and first_name combinations. I am not sure of the most efficient way to do this, though.
@Cole sounds like you could write all of these conditions in a list and use np.select
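
A rough sketch of the two ideas discussed in these comments, assuming each check is a last name plus a list of allowed first names. The `checks` list, its labels, and the `found` dictionary are hypothetical and made up for illustration. `np.select` labels each row by the first condition it satisfies, while `.any()` reduces a boolean mask to a single True/False that is safe to use in a plain `if`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'last_name': ['smith', 'smith', 'jones', 'parker'],
    'first_name': ['john', 'tom', 'mary', 'peter'],
})

# Hypothetical checks: (label, last name, allowed first names).
checks = [
    ('smith_match', 'smith', ['john', 'tom']),
    ('parker_match', 'parker', ['peter']),
]

# One boolean mask per check, built with element-wise &.
conditions = [
    (df['last_name'] == last) & (df['first_name'].isin(firsts))
    for _, last, firsts in checks
]
labels = [label for label, _, _ in checks]

# Each row gets the label of the first condition it meets, else 'no'.
df['match'] = np.select(conditions, labels, default='no')

# .any() answers "does at least one matching row exist?" for each check.
found = {label: bool(cond.any()) for label, cond in zip(labels, conditions)}

if found['smith_match']:
    print('at least one matching smith row exists')
```

Writing `if df[mask]:` directly would raise the same ambiguity error as `and`, so reducing the mask with `.any()` (or testing `df[mask].empty`) is the usual way to branch on it.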