Elegant way to replace values in pandas.DataFrame from another DataFrame

Question

I have a DataFrame in which I want to replace the values in one column with values from another DataFrame.

import pandas as pd

df = pd.DataFrame({'id1': [1001,1002,1001,1003,1004,1005,1002,1006],
                   'value1': ["a","b","c","d","e","f","g","h"],
                   'value3': ["yes","no","yes","no","no","no","yes","no"]})

dfReplace = pd.DataFrame({'id2': [1001,1002],
                   'value2': ["rep1","rep2"]})

I need to match rows on a common key, and my current solution uses groupby with a loop. Is there a more elegant (and faster) way to do this, e.g. with .map() or .apply()? I initially wanted to use DataFrame.update(), but that doesn't seem to be the correct way.

groups = dfReplace.groupby(['id2'])

for key, group in groups:
    df.loc[df['id1']==key,'value1']=group['value2'].values

Output

df
    id1   value1 value3
0   1001  rep1   yes
1   1002  rep2   no
2   1001  rep1   yes
3   1003  d      no
4   1004  e      no
5   1005  f      no
6   1002  rep2   yes
7   1006  h      no
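
For comparison, the .map() route mentioned in the question could look like the sketch below. It assumes the ids in dfReplace are unique; the names mapping and result are introduced here only for illustration and are not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})
dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Lookup Series: the index holds the ids, the values hold the replacements.
# Assumption: id2 is unique, otherwise map() cannot resolve the lookup.
mapping = dfReplace.set_index('id2')['value2']

# map() yields NaN for ids that are missing from the lookup;
# fillna() puts the original value1 back in those rows.
result = df.copy()
result['value1'] = result['id1'].map(mapping).fillna(result['value1'])

print(result)
```

This leaves df untouched and keeps its original row order and index.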

I would recommend using @JohnE's solution, as it is much more elegant than mine.
– MaxU - stand with Ukraine, Commented Mar 12, 2016 at 20:00

MaxU - stand with Ukraine · Accepted Answer · 2016-03-12 17:19:55Z

6

Try merge():

merge = df.merge(dfReplace, left_on='id1', right_on='id2', how='left')
print(merge)

# .ix was removed in pandas 1.0; .loc performs the same masked assignment
merge.loc[merge.id1 == merge.id2, 'value1'] = merge.value2
print(merge)

del merge['id2']
del merge['value2']
print(merge)

Output:

    id1 value1 value3   id2 value2
0  1001      a    yes  1001   rep1
1  1002      b     no  1002   rep2
2  1001      c    yes  1001   rep1
3  1003      d     no   NaN    NaN
4  1004      e     no   NaN    NaN
5  1005      f     no   NaN    NaN
6  1002      g    yes  1002   rep2
7  1006      h     no   NaN    NaN

    id1 value1 value3   id2 value2
0  1001   rep1    yes  1001   rep1
1  1002   rep2     no  1002   rep2
2  1001   rep1    yes  1001   rep1
3  1003      d     no   NaN    NaN
4  1004      e     no   NaN    NaN
5  1005      f     no   NaN    NaN
6  1002   rep2    yes  1002   rep2
7  1006      h     no   NaN    NaN

    id1 value1 value3
0  1001   rep1    yes
1  1002   rep2     no
2  1001   rep1    yes
3  1003      d     no
4  1004      e     no
5  1005      f     no
6  1002   rep2    yes
7  1006      h     no
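
A more compact variant of the same merge idea might look like the sketch below. It is not the answer's original code: it swaps the masked assignment for fillna() and drops the helper columns in one call, and the names merged and result are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})
dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# A left merge keeps every row of df in its original order;
# rows without a match get NaN in id2 and value2.
merged = df.merge(dfReplace, left_on='id1', right_on='id2', how='left')

# Prefer the replacement where one exists, otherwise keep the original value.
merged['value1'] = merged['value2'].fillna(merged['value1'])

# Remove the helper columns brought in by the merge.
result = merged.drop(columns=['id2', 'value2'])

print(result)
```

The fillna() step avoids comparing id1 with id2 directly, since a missing value2 already marks the rows that had no match.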

answered Mar 12, 2016 at 17:19


3 Comments

JohnE Over a year ago

OK, thanks. I will go ahead and post as an answer then, never hurts to give a couple alternatives.

MaxU - stand with Ukraine Over a year ago

Sorry, I've tested another version, and it seems to work properly. ++

Rijul Over a year ago

This answer preserves the index order, so +1. Thanks.

JohnE · Accepted Answer · 2016-03-12 20:08:53Z

4

This is a little cleaner if you already have the indexes set to id, but if not, you can still do it in one line:

>>> (dfReplace.set_index('id2').rename( columns = {'value2':'value1'} )
                               .combine_first(df.set_index('id1')))

     value1 value3
1001   rep1    yes
1001   rep1    yes
1002   rep2     no
1002   rep2    yes
1003      d     no
1004      e     no
1005      f     no
1006      h     no

If you separate into three lines and do the renaming and re-indexing separately, you can see that the combine_first() by itself is actually very simple:

>>> df = df.set_index('id1')
>>> dfReplace = dfReplace.set_index('id2').rename( columns={'value2':'value1'} )

>>> dfReplace.combine_first(df)

edited Mar 12, 2016 at 20:08

answered Mar 12, 2016 at 19:53


6 Comments

MaxU - stand with Ukraine Over a year ago

I would also change value3 --> value1, so that it updates the column the OP was asking about. But the solution is very nice!

MaxU - stand with Ukraine Over a year ago

I meant: dfReplace.set_index('id2').rename( columns = {'value2':'value1'} ).combine_first(df.set_index('id1')), so the changes will be applied on 'value1' column

JohnE Over a year ago

@MaxU Yes, I mis-understood initially. Thanks!

MaxU - stand with Ukraine Over a year ago

Could you please explain why it doesn't work the other way around: df.combine_first(dfReplace)? I thought it should work the same in both directions.

JohnE Over a year ago

The data conflicts here, so the order determines whether df overwrites dfReplace or vice versa. Is that what you are asking?
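
The point about ordering can be seen in a small sketch. combine_first() keeps the caller's non-null values and only fills the caller's gaps from the argument, so the frame you call it on wins any conflict. The frames original and replacement below are simplified, hypothetical stand-ins with unique indexes, not the data from the question.

```python
import pandas as pd

# Hypothetical, simplified frames already indexed by id.
original = pd.DataFrame({'value1': ['a', 'b', 'd']}, index=[1001, 1002, 1003])
replacement = pd.DataFrame({'value1': ['rep1', 'rep2']}, index=[1001, 1002])

# replacement is the caller: its values win where both frames have data,
# and original only fills the row that replacement lacks (1003).
replaced = replacement.combine_first(original)

# original is the caller: it has no missing values,
# so nothing from replacement is ever used.
unchanged = original.combine_first(replacement)

print(replaced)
print(unchanged)
```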
