0

I have two data frames; df1 has Id and sendDate and df2 has Id and actDate. The two df's are not the same shape - df2 is a lookup table. There may be multiple instances of Id.

ex.

df1 = pd.DataFrame({"Id": [1, 1, 2, 3, 2],
                     "sendDate": ["2019-09-24", "2020-09-11", "2018-01-06", "2018-01-06", "2019-09-24"]})

df2 = pd.DataFrame({"Id": [1, 2, 2],
                     "actDate": ["2019-09-24", "2019-09-24", "2020-09-11"]})

I want to add a boolean True/False in df1 to find when df1.Id == df2.Id and df1.sendDate == df2.actDate.

Expected output would add a column to df1:

df1 = pd.DataFrame({"Id": [1, 1, 2, 3, 2],
                         "sendDate": ["2019-09-24", "2020-09-11", "2018-01-06", "2018-01-06", "2019-09-24"],
"Match?": [True, False, False, False, True]})

I'm new to python from R, so please let me know what other info you may need.

1
  • 2
    Can you include what your expected output would be? Commented Apr 29, 2021 at 1:08

4 Answers 4

1

Use isin and boolean indexing

import pandas as pd

df1 = pd.DataFrame({"Id": [1, 1, 2, 3, 2],
                    "sendDate": ["2019-09-24", "2020-09-11",
                                 "2018-01-06", "2018-01-06",
                                 "2019-09-24"]})

df2 = pd.DataFrame({"Id": [1, 2, 2],
                    "actDate": ["2019-09-24", "2019-09-24", "2020-09-11"]})

df1['Match'] = (df1['Id'].isin(df2['Id'])) & (df1['sendDate'].isin(df2['actDate']))
print(df1)

Output:

   Id    sendDate  Match
0   1  2019-09-24   True
1   1  2020-09-11   True
2   2  2018-01-06  False
3   3  2018-01-06  False
4   2  2019-09-24   True
Sign up to request clarification or add additional context in comments.

Comments

1

The .isin() approaches will find values where the ID and date entries don't necessarily appear together (e.g. Id=1 and date=2020-09-11 in your example). You can check for both by doing a .merge() and checking when df2's date field is not null:

df1['match'] = df1.merge(df2, how='left', left_on=['Id', 'sendDate'], right_on=['Id', 'actDate'])['actDate'].notnull()

Comments

1

A vectorized approach via numpy -

import numpy as np
df1['Match'] = np.where((df1['Id'].isin(df2['Id'])) & (df1['sendDate'].isin(df2['actDate'])),True,False)

Comments

0

You can use .isin():

df1['id_bool'] = df1.Id.isin(df2.Id)
df1['date_bool'] = df1.sendDate.isin(df2.actDate)

Check out the documentation here.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.