replace dataframe values using another dataframe, without index matching

Question

I would like to selectively overwrite values in one dataframe with values from another dataframe, matching rows on a column that is not the index of either dataframe. I can solve this by temporarily making that column the index of both dataframes, but I feel there has to be a better/more efficient way. Searching here on SE and elsewhere was not fruitful.

example data

Note a couple of key points:

df2 has more rows than are required, and those extra rows should not be used
the values of 'B' are not in the same order in the two dfs
The existing indices don't match. The whole point of my question is that matching on existing indices should not be used.

Code:

import pandas as pd

df1 = pd.DataFrame({
    'A':['lorem','ipsum','dolor','sit'],
    'B':[1,2,3,4],
    'C':[30,40,5000,6000]})

df2 = pd.DataFrame({
    'B':[4,3,5,6],
    'C':[60,50,70,80]})


df1:
       A  B     C
0  lorem  1    30
1  ipsum  2    40
2  dolor  3  5000
3    sit  4  6000

df2:
   B   C
0  4  60
1  3  50
2  5  70
3  6  80

my desired output

       A  B   C
0  lorem  1  30
1  ipsum  2  40
2  dolor  3  50
3    sit  4  60
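For comparison, this is why plain index-aligned assignment does not give that output: pandas pairs rows by their existing labels (0 to 3), not by the values in B. This is only an illustration of the problem, not a proposed fix.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['lorem', 'ipsum', 'dolor', 'sit'],
    'B': [1, 2, 3, 4],
    'C': [30, 40, 5000, 6000]})

df2 = pd.DataFrame({
    'B': [4, 3, 5, 6],
    'C': [60, 50, 70, 80]})

# Assigning a Series aligns on the row index, not on column B: row 0 (B=1)
# receives df2's row 0 (B=4), and the unwanted rows B=5 and B=6 leak in.
naive = df1.copy()
naive['C'] = df2['C']
print(naive['C'].tolist())  # [60, 50, 70, 80]
```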

my non-ideal solution

# save indices and columns for both dfs, then re-index both
col_order1 = df1.columns
old_index1 = df1.index # not needed in my example, but needed in generalized case
df1.set_index('B', inplace=True)

col_order2 = df2.columns
old_index2 = df2.index 
df2.set_index('B', inplace=True)

# value substitution based on the new indices
df1.loc[df1.index.isin(df2.index), 'C'] = df2['C']

# undo the index changes to df1 and df2
df1.reset_index(inplace=True)
df1 = df1[col_order1]
df1.index = old_index1

df2.reset_index(inplace=True)
df2 = df2[col_order2]
df2.index = old_index2

This works, but I am new to pandas and I suspect I am missing some built-in method that does what I describe.

How can I achieve the desired result without having to shuffle those indices around?
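One way to keep both indices untouched, offered as a sketch rather than as code from the question or the answer: turn df2 into a lookup Series keyed by B, then use Series.map on only those rows of df1 whose B appears in df2. It assumes the B values in df2 are unique; with duplicates the lookup would be ambiguous.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['lorem', 'ipsum', 'dolor', 'sit'],
    'B': [1, 2, 3, 4],
    'C': [30, 40, 5000, 6000]})

df2 = pd.DataFrame({
    'B': [4, 3, 5, 6],
    'C': [60, 50, 70, 80]})

# Series mapping each B value in df2 to its C value; df2 itself is not modified.
lookup = df2.set_index('B')['C']

# Rows of df1 whose B value has a counterpart in df2.
mask = df1['B'].isin(lookup.index)

# map() translates those B values into df2's C values. The result keeps
# df1's own row labels, so the assignment lands on the correct rows.
df1.loc[mask, 'C'] = df1.loc[mask, 'B'].map(lookup)
print(df1)
```

Because only matched rows are written, no NaN is introduced and C keeps its integer dtype; the extra rows of df2 (B=5 and B=6) are never looked up.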

@QuangHoang yes, I have looked these up in the docs. I wouldn't be asking if it was as easy for me as "RTFM". If the solution is that trivial to you, why not answer the question? As it stands, your response is not terribly useful. – DocBuckets, Nov 23, 2020 at 19:30

James_SO · Accepted Answer · 2020-11-23 19:56:43Z


I would use merge() and then combine_first():

newDF = df1.merge(df2,
                  on="B",
                  how='left',
                  suffixes=("", "_df2"))

newDF["C"] = newDF["C_df2"].combine_first(newDF["C"]).apply(int)
print(newDF[["A","B","C"]])

       A  B   C
0  lorem  1  30
1  ipsum  2  40
2  dolor  3  50
3    sit  4  60

Notes:

Specifying suffixes is useful when both sides of the join have a column with the same name (here C), just to keep things easy to read. I use an empty suffix for the left side so that df1's columns keep their original names.
I used .apply(int) because the left merge produces NaN in C_df2 wherever a B value from df1 has no match in df2, and a column of integers that contains NaN is upcast to float. Once combine_first() has filled those gaps from the original C, every value is a whole number again, so it can be converted back to int.
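To check the dtype claim in the notes above, the sketch below repeats the merge on the question's data and prints the intermediate values. It uses `on='B'` (equivalent here, since the key has the same name in both frames) and `.astype(int)` in place of `.apply(int)`; both conversions fail if any NaN is left.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['lorem', 'ipsum', 'dolor', 'sit'],
    'B': [1, 2, 3, 4],
    'C': [30, 40, 5000, 6000]})

df2 = pd.DataFrame({
    'B': [4, 3, 5, 6],
    'C': [60, 50, 70, 80]})

merged = df1.merge(df2, on='B', how='left', suffixes=('', '_df2'))

# B=1 and B=2 have no match in df2, so C_df2 holds NaN there and is float64.
print(merged['C_df2'].tolist())  # [nan, nan, 50.0, 60.0]
print(merged['C_df2'].dtype)     # float64

# combine_first takes C_df2 where it has a value and falls back to the original C.
filled = merged['C_df2'].combine_first(merged['C'])

# No NaN remains, so converting back to integers is safe.
merged['C'] = filled.astype(int)
print(merged[['A', 'B', 'C']])
```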

answered Nov 23, 2020 at 19:49 by James_SO (edited Nov 23, 2020 at 19:56)

DocBuckets Over a year ago

Although I wish there was a cleaner way to do this, your method works exactly as intended. A simple function definition could turn this into a single, short line of code for easy implementation in my projects.
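The function mentioned in that comment might look like the sketch below. `replace_by_key` and its parameters are hypothetical names; it wraps the merge/combine_first approach from the answer, returns a new DataFrame instead of modifying its input, and assumes the key values in `source` are unique (duplicate keys would make the merge add rows).

```python
import pandas as pd


def replace_by_key(target, source, key, col):
    """Hypothetical helper: return a copy of `target` whose `col` values are
    taken from `source` wherever `source[key]` matches `target[key]`.

    Rows of `source` with no match in `target` are ignored. Assumes the
    `key` values in `source` are unique.
    """
    merged = target.merge(source[[key, col]], on=key, how='left',
                          suffixes=('', '_src'))
    filled = merged[col + '_src'].combine_first(merged[col])
    result = target.copy()
    # The merge result has a fresh 0..n-1 index, so assign positionally
    # (via to_numpy) to keep target's original index intact.
    result[col] = filled.astype(target[col].dtype).to_numpy()
    return result


df1 = pd.DataFrame({
    'A': ['lorem', 'ipsum', 'dolor', 'sit'],
    'B': [1, 2, 3, 4],
    'C': [30, 40, 5000, 6000]})

df2 = pd.DataFrame({
    'B': [4, 3, 5, 6],
    'C': [60, 50, 70, 80]})

print(replace_by_key(df1, df2, 'B', 'C'))
```

Returning a copy keeps the call to a single line (`df1 = replace_by_key(df1, df2, 'B', 'C')`) while leaving both inputs untouched.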
