I have to replace values from one dataframe with values from another dataframe.

The example below works, but it takes extra steps: I have to replace the values in the "first" column with the values from the "new" column and then drop the "new" column.

In [1]: import pandas as pd                                                                                                  

In [2]: df = pd.DataFrame([['A', 'X'], 
   ...:                    ['B', 'X'], 
   ...:                    ['C', 'X'], 
   ...:                    ['A', 'Y'], 
   ...:                    ['B', 'Y'], 
   ...:                    ['C', 'Y'], 
   ...:                    ], columns=['first', 'second'])                                                                   

In [3]: df                                                                                                                   
Out[3]: 
  first second
0     A      X
1     B      X
2     C      X
3     A      Y
4     B      Y
5     C      Y

In [4]: df_tt = pd.DataFrame([['A', 'E'], 
   ...:                       ['B', 'F'], 
   ...:                      ], columns=['orig', 'new'])                                                                     

In [5]: df_tt                                                                                                                
Out[5]: 
  orig new
0    A   E
1    B   F

In [6]: df = df.merge(df_tt, left_on='first', right_on='orig')                                                               

In [7]: df                                                                                                                   
Out[7]: 
  first second orig new
0     A      X    A   E
1     A      Y    A   E
2     B      X    B   F
3     B      Y    B   F

In [8]: df['first'] = df['new']                                                                                              

In [9]: df                                                                                                                   
Out[9]: 
  first second orig new
0     E      X    A   E
1     E      Y    A   E
2     F      X    B   F
3     F      Y    B   F

In [10]: df.drop(columns=['orig', 'new'])                                                                                    
Out[10]: 
  first second
0     E      X
1     E      Y
2     F      X
3     F      Y

I would like to replace values with no extra steps.

2 Answers

Another solution is using replace:

# Restrict to common entries (copy so the assignment below does not
# trigger a SettingWithCopyWarning)
df = df[df['first'].isin(df_tt['orig'])].copy()

# Use df_tt as a mapping to replace values in df
df['first'] = df['first'].replace(df_tt.set_index('orig').to_dict()['new'])

This solution is very similar to @jezrael's, but I like the idea of explicitly using replace, because that is actually what you are doing: replacing values in one dataframe based on another dataframe.
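As a minimal sketch of why the isin filter matters with this approach (the frames are rebuilt from the question, and `mapping` is a name introduced only for this illustration): replace leaves values that have no entry in the mapping untouched, whereas map turns them into NaN.

```python
import pandas as pd

# Frames rebuilt from the question
df = pd.DataFrame([['A', 'X'], ['B', 'X'], ['C', 'X'],
                   ['A', 'Y'], ['B', 'Y'], ['C', 'Y']],
                  columns=['first', 'second'])
df_tt = pd.DataFrame([['A', 'E'], ['B', 'F']], columns=['orig', 'new'])

# Plain dict {'A': 'E', 'B': 'F'}; `mapping` is an illustrative name
mapping = df_tt.set_index('orig')['new'].to_dict()

replaced = df['first'].replace(mapping)  # 'C' has no entry and stays 'C'
mapped = df['first'].map(mapping)        # 'C' has no entry and becomes NaN

print(replaced.tolist())
print(mapped.tolist())
```

So without the filter, replace would keep the 'C' rows unchanged, which differs from the inner merge in the question, where those rows are dropped.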

1 Comment

Your solution, as well as jezrael's, is good. Thanks, guys. Actually, your solution works without "to_dict" too: df['first'] = df['first'].replace(df_tt.set_index('orig')['new'])

Use isin for filtering with boolean indexing and then map:

df = (df[df['first'].isin(df_tt['orig'])]
      .assign(first=lambda x: x['first'].map(df_tt.set_index('orig')['new'])))
print(df)
  first second
0     E      X
1     F      X
3     E      Y
4     F      Y

Alternative:

# .copy() avoids a SettingWithCopyWarning on the assignment below
df = df[df['first'].isin(df_tt['orig'])].copy()
df['first'] = df['first'].map(df_tt.set_index('orig')['new'])
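A possible variation, sketched here under assumptions rather than taken from the answer: map first and then drop the rows that found no match, which removes the separate isin step. This assumes that "first" has no missing values to begin with and that the "new" column contains no NaN, since such rows would be dropped as well. The name `result` is only illustrative.

```python
import pandas as pd

# Frames rebuilt from the question
df = pd.DataFrame([['A', 'X'], ['B', 'X'], ['C', 'X'],
                   ['A', 'Y'], ['B', 'Y'], ['C', 'Y']],
                  columns=['first', 'second'])
df_tt = pd.DataFrame([['A', 'E'], ['B', 'F']], columns=['orig', 'new'])

# Map every value; unmatched values ('C') become NaN and are then dropped
result = (df.assign(first=df['first'].map(df_tt.set_index('orig')['new']))
            .dropna(subset=['first']))

print(result)
```

The original row labels of the kept rows are preserved; append .reset_index(drop=True) if a fresh 0..n-1 index is wanted.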

2 Comments

Is it as fast as merge? Merge is quite fast.
@user3225309 - It is faster, but it is best to test it with real data.
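One way to check this on your own data is a small timeit comparison. The sketch below is only a starting point: the frame size, the random test data, and the function names `with_merge` and `with_map` are assumptions made for illustration, and the timings will vary with the data and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # arbitrary size; replace the synthetic frame with real data

df = pd.DataFrame({'first': rng.choice(list('ABC'), size=n),
                   'second': rng.choice(list('XY'), size=n)})
df_tt = pd.DataFrame({'orig': ['A', 'B'], 'new': ['E', 'F']})


def with_merge(df, df_tt):
    # Approach from the question: merge, overwrite, drop helper columns
    out = df.merge(df_tt, left_on='first', right_on='orig')
    out['first'] = out['new']
    return out.drop(columns=['orig', 'new'])


def with_map(df, df_tt):
    # Approach from the answer: filter with isin, then map
    out = df[df['first'].isin(df_tt['orig'])]
    return out.assign(first=out['first'].map(df_tt.set_index('orig')['new']))


for func in (with_merge, with_map):
    seconds = timeit.timeit(lambda: func(df, df_tt), number=5)
    print(f'{func.__name__}: {seconds / 5:.4f} s per call')
```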
