I have two DataFrames, df1 and df2:

import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3, 5], 
                    'Name': ['client', 'detail_client', 'operations', 'audit'],
                    'Type': ['str', 'var', 'str', 'nvar']})
df2 = pd.DataFrame({'ID': [5, 3, 7, 2], 
                    'Name': ['audit', 'operations', 'C', 'detail_client'],
                    'Type': ['nan', 'nan', 'nan', 'nan']})

I would like to create a function that takes df1, df2, df1['ID'], df2['ID'], df1['Name'], df2['Name'], df1['Type'] and df2['Type'] as arguments, because the column labels may not always be identical.
For each row of df1, the function should iterate over df2 and compare the df1['ID'] value with df2['ID'] and the df1['Name'] value with df2['Name']. When both match, it should set df2['Type'] to df1['Type']. The function should return df2 with df2['Type'] equal to df1['Type'] wherever the condition is true. I expect df2 to look like the following:

df2 = pd.DataFrame({'ID': [5, 3, 7, 2], 
                    'Name': ['audit', 'operations', 'C', 'detail_client'],
                    'Type': ['nvar', 'str', 'nan', 'var']})

Any help is welcomed. Thanks in advance.

2 Answers

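You could use a function like the one below, which borrows its method from this answer. It indexes both frames by the matching columns and uses DataFrame.update to copy over the values of the columns to merge: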

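def update_columns(target, source, match_cols, merge_cols):
    # Index both frames by the key columns so update() can align rows on them.
    res = target.set_index(match_cols)
    # Overwrite merge_cols in res wherever source has a matching key (NaN values in source are skipped).
    res.update(source.set_index(match_cols)[merge_cols])
    # Restore the key columns as ordinary columns.
    return res.reset_index()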

df1 = pd.DataFrame({'ID': [1, 2, 3, 5], 
                    'Name': ['client', 'detail_client', 'operations', 'audit'],
                    'Type': ['str', 'var', 'str', 'nvar']})
df2 = pd.DataFrame({'ID': [5, 3, 7, 2], 
                    'Name': ['audit', 'operations', 'C', 'detail_client'],
                    'Type': ['nan', 'nan', 'nan', 'nan']})

out = update_columns(df2, df1, ['ID', 'Name'], ['Type'])

Output:

   ID           Name  Type
0   5          audit  nvar
1   3     operations   str
2   7              C   nan
3   2  detail_client   var
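
The question also mentions that the column labels may differ between the two frames. One way to extend the update-based approach, sketched here under assumptions, is to rename the source columns to the target's labels before indexing. The helper name `update_columns_renamed`, its parameters, and the labels `Key`, `Label` and `Kind` are hypothetical, and the sketch assumes each key combination is unique within each frame:

```python
import pandas as pd


def update_columns_renamed(target, source, target_keys, source_keys,
                           target_col, source_col):
    """Hypothetical helper: copy source_col into target_col where keys match."""
    # Map the source labels onto the target labels so the indexes line up.
    rename_map = dict(zip(source_keys, target_keys))
    rename_map[source_col] = target_col
    src = source.rename(columns=rename_map).set_index(target_keys)[[target_col]]

    res = target.set_index(target_keys)
    # update() aligns on the index and skips NaN values in src.
    res.update(src)
    return res.reset_index()


df1 = pd.DataFrame({'ID': [1, 2, 3, 5],
                    'Name': ['client', 'detail_client', 'operations', 'audit'],
                    'Type': ['str', 'var', 'str', 'nvar']})
# Same data as df2 in the question, but with different (made-up) labels.
df2 = pd.DataFrame({'Key': [5, 3, 7, 2],
                    'Label': ['audit', 'operations', 'C', 'detail_client'],
                    'Kind': ['nan', 'nan', 'nan', 'nan']})

out = update_columns_renamed(df2, df1, ['Key', 'Label'], ['ID', 'Name'],
                             'Kind', 'Type')
print(out)
```

The result keeps the target's labels and row order; rows without a match in the source keep their original value.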

Alternatively, you can merge the two dataframes on ID and Name and copy Type over for the rows that found a match:

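# Left-merge keeps every row of df2; the overlapping Type columns become Type_x (df2) and Type_y (df1).
merged = df2.merge(df1, on=["ID", "Name"], how="left")
# Type_y is NaN where df1 has no matching (ID, Name) pair.
mask = merged["Type_y"].notna()
# Relies on df2 having a default RangeIndex so that merged and df2 align row for row.
df2.loc[mask, "Type"] = merged.loc[mask, "Type_y"]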

print(df2)

Prints:

   ID           Name  Type
0   5          audit  nvar
1   3     operations   str
2   7              C   nan
3   2  detail_client   var
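
To cover the case where the column labels differ, the merge can be wrapped in a function that uses `left_on` and `right_on`. This is only a sketch: `fill_from`, its parameters, the temporary column name `_src_value`, and the labels `Key`, `Label` and `Kind` are hypothetical, and it assumes the key combinations in the source frame are unique (otherwise the merge would produce extra rows):

```python
import pandas as pd


def fill_from(target, source, target_keys, source_keys,
              target_col, source_col):
    """Hypothetical helper: return a copy of target with target_col taken
    from source_col wherever the key columns match."""
    # Keep only the needed source columns; the temporary name is chosen to be
    # unlikely to clash with an existing column.
    tmp = "_src_value"
    src = source[source_keys + [source_col]].rename(columns={source_col: tmp})

    # The left merge keeps every row of target in its original order.
    merged = target.merge(src, left_on=target_keys, right_on=source_keys,
                          how="left")

    # Rows with a match have a non-NaN temporary value.
    found = merged[tmp].notna().to_numpy()

    out = target.copy()
    # Positional boolean mask and raw values avoid depending on target's index.
    out.loc[found, target_col] = merged.loc[found, tmp].to_numpy()
    return out


df1 = pd.DataFrame({'ID': [1, 2, 3, 5],
                    'Name': ['client', 'detail_client', 'operations', 'audit'],
                    'Type': ['str', 'var', 'str', 'nvar']})
# Same data as df2 in the question, but with different (made-up) labels.
df2 = pd.DataFrame({'Key': [5, 3, 7, 2],
                    'Label': ['audit', 'operations', 'C', 'detail_client'],
                    'Kind': ['nan', 'nan', 'nan', 'nan']})

result = fill_from(df2, df1, ['Key', 'Label'], ['ID', 'Name'], 'Kind', 'Type')
print(result)
```

Unlike the snippet above, this version returns a new frame and leaves the original df2 untouched.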
