Updating a dataframe based on another dataframe in Python

Question

I have a DataFrame, say df1, in which every column is correct except the 'Employee' column. Another DataFrame, say df2, has the correct employee names, but stored in the column 'Staff'. I am trying to update df1 based on 'Key_df1' and 'Key_df2' from the respective DataFrames and need some help on how to approach this. (Please see the expected output in the image below.)

import pandas as pd

data1=[['NYC-URBAN','JON','$5000','yes','BANKING','AC32456'],['WDC-RURAL','XING','$4500','Yes','FINANCE','AD45678'],['LONDON-URBAN','EDWARDS','$3500','No','IT','DE43216'],
     ['SINGAPORE-URBAN','WOLF','$5000','No','SPORTS','RT45327'],['MUMBAI-RURAL','NEMBIAR','$2500','No','IT','Rs454457']]

data2=[['NYC','MIKE','BANKING','BIKING','AH56245'],['WDC','ALPHA','FINANCE','TREKKING','AD45678'],
     ['LONDON-URBAN','BETA','FINANCE','SLEEPING','DE43216'],['SINGAPORE','WOLF','SPORTS','DANCING','RT45307'],
     ['MUMBAI','NEMBIAR','IT','ZUDO','RS454453']]

List1=['City','Employee', 'Income','Travelling','Industry', 'Key_df1']
List2=['City','Staff','Industry','Hobby', 'Key_df1']

df1=pd.DataFrame(data1,columns=List1)
df2=pd.DataFrame(data2,columns=List2)

Expected Output:

Edit (Additional Query):

Thanks for the response. In addition to the above, I want to concatenate the value of the 'Employee' column with the 'Travelling' column from df1, but only for the rows in which Key_df1 and Key_df2 match across the two DataFrames. Please see the second expected output below.

Valdi_Bo · Accepted Answer · 2021-03-13 19:19:44Z

4

First set the index in df1 to Key_df1 and save it as a temporary DataFrame:

wrk = df1.set_index('Key_df1')

Then update (in place) its Employee column using df2 with the index set to Key_df2, taking only the Staff column:

wrk.Employee.update(df2.set_index('Key_df2').Staff)

And the last operation is to change the index to a "regular" column and move it to the previous location:

result = wrk.reset_index().reindex(columns=List1)

The result is:

              City Employee Income Travelling Industry   Key_df1
0        NYC-URBAN      JON  $5000        yes  BANKING   AC32456
1        WDC-RURAL    ALPHA  $4500        Yes  FINANCE   AD45678
2     LONDON-URBAN     BETA  $3500         No       IT   DE43216
3  SINGAPORE-URBAN     WOLF  $5000         No   SPORTS   RT45327
4     MUMBAI-RURAL  NEMBIAR  $2500         No       IT  Rs454457
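
The three steps above can be sketched end to end as follows. This is a variant rather than the answer's exact code: it calls DataFrame.update on wrk itself (with the Staff Series renamed to Employee) instead of wrk.Employee.update, because updating through an attribute-accessed column may leave wrk unchanged in pandas versions where Copy-on-Write is enabled. It also assumes df2's key column is named 'Key_df2', as in this answer; the List2 posted in the question names it 'Key_df1', so adjust the name to your data.

```python
import pandas as pd

# Data copied from the question.
data1 = [['NYC-URBAN', 'JON', '$5000', 'yes', 'BANKING', 'AC32456'],
         ['WDC-RURAL', 'XING', '$4500', 'Yes', 'FINANCE', 'AD45678'],
         ['LONDON-URBAN', 'EDWARDS', '$3500', 'No', 'IT', 'DE43216'],
         ['SINGAPORE-URBAN', 'WOLF', '$5000', 'No', 'SPORTS', 'RT45327'],
         ['MUMBAI-RURAL', 'NEMBIAR', '$2500', 'No', 'IT', 'Rs454457']]
data2 = [['NYC', 'MIKE', 'BANKING', 'BIKING', 'AH56245'],
         ['WDC', 'ALPHA', 'FINANCE', 'TREKKING', 'AD45678'],
         ['LONDON-URBAN', 'BETA', 'FINANCE', 'SLEEPING', 'DE43216'],
         ['SINGAPORE', 'WOLF', 'SPORTS', 'DANCING', 'RT45307'],
         ['MUMBAI', 'NEMBIAR', 'IT', 'ZUDO', 'RS454453']]

List1 = ['City', 'Employee', 'Income', 'Travelling', 'Industry', 'Key_df1']
# Assumption: the key column of df2 is called 'Key_df2', as in the answer.
List2 = ['City', 'Staff', 'Industry', 'Hobby', 'Key_df2']

df1 = pd.DataFrame(data1, columns=List1)
df2 = pd.DataFrame(data2, columns=List2)

# Step 1: index df1 by its key.
wrk = df1.set_index('Key_df1')

# Step 2: take Staff indexed by df2's key and rename it, so that
# DataFrame.update aligns it with the Employee column of wrk.
staff = df2.set_index('Key_df2')['Staff'].rename('Employee')
wrk.update(staff)  # only keys present in both frames are overwritten

# Step 3: turn the key back into a column and restore the column order.
result = wrk.reset_index().reindex(columns=List1)
print(result)
```

Note that the keys in df2 must be unique for this alignment to work.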

Edit following the comment about Travelling column

Now update alone is not enough, so the task must be solved another way.

Start from joining df1 with df2.Staff (with set_index to join properly):

result = df1.join(df2.set_index('Key_df2').Staff, on='Key_df1')

The second step (the actual update) is:

result['Employee'] = result.Employee.where(result.Staff.isna(),
    result.Staff + '_' + result.Travelling)

And the last step is to drop the Staff column (no longer needed):

result.drop(columns=['Staff'], inplace=True)

The final result is:

              City   Employee Income Travelling Industry   Key_df1
0        NYC-URBAN        JON  $5000        yes  BANKING   AC32456
1        WDC-RURAL  ALPHA_Yes  $4500        Yes  FINANCE   AD45678
2     LONDON-URBAN    BETA_No  $3500         No       IT   DE43216
3  SINGAPORE-URBAN       WOLF  $5000         No   SPORTS   RT45327
4     MUMBAI-RURAL    NEMBIAR  $2500         No       IT  Rs454457
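
For reference, the join / where / drop steps can be put together into one runnable sketch. It uses plain assignment for the where step and, as in this answer, assumes df2's key column is named 'Key_df2' (the List2 posted in the question names it 'Key_df1'). Only the columns needed for the example are built here.

```python
import pandas as pd

# Reduced version of the question's data: only the columns used below.
df1 = pd.DataFrame({
    'Employee': ['JON', 'XING', 'EDWARDS', 'WOLF', 'NEMBIAR'],
    'Travelling': ['yes', 'Yes', 'No', 'No', 'No'],
    'Key_df1': ['AC32456', 'AD45678', 'DE43216', 'RT45327', 'Rs454457'],
})
# Assumption: the key column of df2 is called 'Key_df2'.
df2 = pd.DataFrame({
    'Staff': ['MIKE', 'ALPHA', 'BETA', 'WOLF', 'NEMBIAR'],
    'Key_df2': ['AH56245', 'AD45678', 'DE43216', 'RT45307', 'RS454453'],
})

# Join Staff onto df1 by key; rows without a matching key get NaN in Staff.
result = df1.join(df2.set_index('Key_df2')['Staff'], on='Key_df1')

# Keep Employee where no match was found, otherwise use Staff + '_' + Travelling.
combined = result['Staff'] + '_' + result['Travelling']
result['Employee'] = result['Employee'].where(result['Staff'].isna(), combined)

# Staff was only a helper column.
result = result.drop(columns=['Staff'])
print(result)
```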

edited Mar 13, 2021 at 19:19

answered Mar 13, 2021 at 12:17

Valdi_Bo


2 Comments

Ussu20 Over a year ago

Hi @Valdi_Bo, thanks for the response. I am also trying to concatenate the updated 'Employee' column with the 'Travelling' column. Could you please help with this?

Ussu20 Over a year ago

I have added expected output in the question as well.

fsl · Accepted Answer · 2021-03-13 13:44:08Z

2

You can use Boolean Indexing, e.g.:

mask = df1.Key_df1 == df2.Key_df1.reindex(df1.index)
df1.loc[mask, 'Employee'] = df2.Staff

Output:

              City Employee Income Travelling Industry   Key_df1
0        NYC-URBAN      JON  $5000        yes  BANKING   AC32456
1        WDC-RURAL    ALPHA  $4500        Yes  FINANCE   AD45678
2     LONDON-URBAN     BETA  $3500         No       IT   DE43216
3  SINGAPORE-URBAN     WOLF  $5000         No   SPORTS   RT45327
4     MUMBAI-RURAL  NEMBIAR  $2500         No       IT  Rs454457
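
The mask above compares the keys row by row, so it relies on matching rows sitting at the same index in both frames. If df2 has a different number of rows or a different order, a key-based lookup may be more robust. The following is a minimal sketch of that idea using Series.map, not the method of this answer; the small frames and the extra 'GAMMA' row are made up for illustration, and the keys in df2 are assumed to be unique.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Employee': ['JON', 'XING', 'EDWARDS'],
    'Key_df1': ['AC32456', 'AD45678', 'DE43216'],
})
# Hypothetical df2: different row order and one extra row.
df2 = pd.DataFrame({
    'Staff': ['BETA', 'GAMMA', 'ALPHA', 'MIKE'],
    'Key_df1': ['DE43216', 'ZZ00000', 'AD45678', 'AH56245'],
})

# Build a key -> Staff lookup and translate each df1 key through it.
lookup = df2.set_index('Key_df1')['Staff']
staff = df1['Key_df1'].map(lookup)  # NaN where the key is absent from df2

# Use the looked-up name where there is one, otherwise keep the old value.
df1['Employee'] = staff.fillna(df1['Employee'])
print(df1)
```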

edited Mar 13, 2021 at 13:44

answered Mar 13, 2021 at 12:02

fsl


6 Comments

Abhi_J Over a year ago

Hi, your original answer with df1.Employee[mask] = df2.Staff seemed to work. Is there any reason you changed it to df1.loc[mask, 'Employee'] = df2.Staff?

Abhi_J Over a year ago

ok thanks for the reply, first one was shorter, I liked it.

fsl Over a year ago

FWIW, you can do it in one line if you wish: df1.loc[df1.Key_df1 == df2.Key_df1, 'Employee'] = df2.Staff.

Ussu20 Over a year ago

@FelipeLanza If df2 has a different number of rows, df1.Employee[mask] = df2.Staff throws an error. Is there a more generic approach? I made this up as a simple example; in reality df2 has a different number of rows than df1.

fsl Over a year ago

Just re-index the smaller one with the index from the other. I've edited it.


Arkadiusz · Accepted Answer · 2021-03-13 12:07:45Z

1

You can also use numpy where:

import numpy as np

df1['Employee'] = np.where(df1['Key_df1'] == df2['Key_df1'], df2['Staff'], df1['Employee'])

edited Mar 13, 2021 at 12:07

answered Mar 13, 2021 at 12:04

Arkadiusz


2 Comments

fsl Over a year ago

True, just bear in mind that isin is not the same as an equality check.

Arkadiusz Over a year ago

@FelipeLanza You are right. It can work in this case, but it would be risky to use it for a big dataframe. I've edited my answer.
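
The difference between isin and an equality check mentioned in these comments can be seen in a small sketch. The key values below are made up for illustration.

```python
import pandas as pd

keys1 = pd.Series(['A1', 'B2', 'C3'])
keys2 = pd.Series(['B2', 'A1', 'C3'])  # same values, different order

# Equality compares row by row: only the last row holds the same key.
elementwise = keys1 == keys2
print(elementwise.tolist())  # [False, False, True]

# isin asks whether each key of keys1 appears anywhere in keys2.
membership = keys1.isin(keys2)
print(membership.tolist())  # [True, True, True]
```

With isin, a row can be flagged as matching even though the value that would be copied from the same row of the other frame belongs to a different key.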
