0

I got two dataframe: df1 has around 20000 rows and df2 has 7226 rows as shown in picture:

df1: enter image description here

df2:

enter image description here

I want to copy the 'Final Coding' value of df2 to 'Final Coding' value of df1 if both dataframes have same Review_Text values. I used following command:

df1.loc[df1['Review_Text'].isin(df2['Review_Text'],'Final Coding'] = df2['Final Coding']

The problem with above statement is that it does not replace all values of df1. It only replaces values until the index number of 7226 of df1 and after that it does not replace any value. I tried another command as follow:

mask = df1['Review_Text'].isin(df2['Review_Text'])==True
df1['Final Coding'] = df1['Final Coding'].where(~mask,df2['Final Coding'],axis=0)

But ended up with same problem. I searched the exisiting threads of stackoverflow and found another way. So I changed the command as follow:

df1.loc[df1['Review_Text'].isin(df2.set_index('Review_Text').index),'Final Coding'] = df2.set_index('Review_Text')['Final Coding']

However, above command threw following error:

ValueError: cannot reindex from a duplicate axis

There are some text in 'Review_Text' column which is duplicated that's why it threw this error. I am not sure how to overcome this issue. Could anyone help me in fixing the issue?

3
  • unique up on it? Commented Dec 23, 2021 at 0:42
  • 1
    @Abel Sorry, I do not understand??? Commented Dec 23, 2021 at 0:49
  • sneaky joke. sounded like you wanted to turn df2 into a lookup but it had multiple occurrences of the key, value pairs you wanted to use. could make a df of your keys (and values) followed by running unique and pull pertinent ones to be used for lookup. or could do something like stackoverflow.com/a/67343461/14237276 to reduce it by grouping. Commented Dec 23, 2021 at 1:56

1 Answer 1

1

You can do

Update before below :

df1.reset_index(inplace=True, drop=True)
df2.reset_index(inplace=True, drop=True)

#==================================
m = df1['Review_Text'].isin(df2['Review_Text'])
v = df1['Review_Text'].map(df2.set_index('Review_Text')['Final Coding'])
df1.loc[m,'Final Coding'] = v

Or we can do

df1.update(df1['Review_Text'].map(df2.set_index('Review_Text')['Final Coding']).to_frame('Review_Text'))
Sign up to request clarification or add additional context in comments.

2 Comments

thanks for posting. When I ran the df1.update command or second line of first solution, I got the error message "InvalidIndexError: Reindexing only valid with uniquely valued Index objects"
@user2293224 you have duplicated index , try reset_index(), I will update

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.