2

I have two dataframes, one with more columns (but two of them are important) and the second with two columns. df1:

col_a  col_b
101     104
102     201
103     301
505     601

df2:

col_a_a col_b_b
420      637
425      643
201      701
601      702

I need the next thing. Lookup at df1['col_b'] and if exist in df2['col_a_a'] then replace it with df2['col_b_b'].

I have tried two ways:

df1['col_b'] = np.where(df1['col_b'] == df2['col_a_a'], df2['col_b_b'], df1['col_b'])

df1.col_b[df1.col_b == df2.col_a_a] = df2.col_b_b

But both of them return me the same error: ValueError: Can only compare identically-labeled Series objects

Desired output:

col_a  col_b
    101     104
    102     701
    103     301
    505     702

2 Answers 2

2

Use replace by Series created by set_index:

df1['col_b'] = df1['col_b'].replace(df2.set_index('col_a_a')['col_b_b'])
print (df1)
   col_a  col_b
0    101    104
1    102    701
2    103    301
3    505    702
Sign up to request clarification or add additional context in comments.

Comments

1

Another way, using pd.Series.map:

df1['col_b'] = df1['col_b'].map(df2.set_index('col_a_a')['col_b_b']).fillna(df1['col_b'])

Depending on your data, this may be more efficient.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.