I have two dataframes that I concatenate into one. The problem is that while troubleshooting the code, I run the same concat code multiple times. Each run appends the rows again, so the dataframe ends up with repeated rows, once for every time I run the concat. I want to prevent this.

My code:

import pandas as pd

rdf = pd.DataFrame({'A':[10,20]},index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
df2 = pd.DataFrame({'A':[30,40]},index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))

# Run it first time
rdf= pd.concat([rdf,df2])
# First time result
rdf
                      A
2020-05-04 08:00:00  10
2020-05-04 09:00:00  20
2020-05-04 10:00:00  30
2020-05-04 11:00:00  40

# Run it second time
rdf= pd.concat([rdf,df2])
# second time result produces duplicates
rdf
                      A
2020-05-04 08:00:00  10
2020-05-04 09:00:00  20
2020-05-04 10:00:00  30
2020-05-04 11:00:00  40
2020-05-04 10:00:00  30
2020-05-04 11:00:00  40

My solution: my approach is to write an extra line of code that drops the duplicates, keeping the first occurrence.

rdf= pd.concat([rdf,df2])
rdf.drop_duplicates(keep='first',inplace=True)
rdf
                      A
2020-05-04 08:00:00  10
2020-05-04 09:00:00  20
2020-05-04 10:00:00  30
2020-05-04 11:00:00  40
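
One caveat with the approach above: `drop_duplicates` compares column values and ignores the index, so two rows with the same `A` at different timestamps would also be collapsed. A minimal sketch of an index-based alternative, assuming the timestamp index is what identifies a row (the `same_values` frame is made up purely for illustration):

```python
import pandas as pd

rdf = pd.DataFrame({'A': [10, 20]},
                   index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
df2 = pd.DataFrame({'A': [30, 40]},
                   index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))

# Hypothetical frame: equal values at two different timestamps.
same_values = pd.DataFrame({'A': [10, 10]},
                           index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
# drop_duplicates looks only at the columns, so one of the two rows is lost.
deduped_by_value = same_values.drop_duplicates(keep='first')

# Re-running concat is harmless if rows are de-duplicated on the index instead.
for _ in range(3):
    rdf = pd.concat([rdf, df2])
    rdf = rdf[~rdf.index.duplicated(keep='first')]

print(rdf)
```

This still needs the extra line, but it keeps rows that share a value and differ only in timestamp.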

Is there a better approach? I mean, can we prevent this while concatenating, so there is no need to write an extra line of code to drop the duplicates?

  • Is there a reason you're concatenating df2 to the end of rdf twice? Won't this always end up with all the rows of df2 duplicated at the end of rdf? Commented May 9, 2021 at 22:54
  • @HenryEcker I have shown it here as a demo. In practice, I run the code multiple times while troubleshooting, as explained in my question. Commented May 9, 2021 at 22:57

1 Answer

Then let us try combine_first

rdf = rdf.combine_first(df2)
rdf = rdf.combine_first(df2)
rdf
Out[115]: 
                        A
2020-05-04 08:00:00  10.0
2020-05-04 09:00:00  20.0
2020-05-04 10:00:00  30.0
2020-05-04 11:00:00  40.0
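
Note that the output above shows `A` as floats (`10.0`), whereas the original frames held integers. A minimal sketch, assuming the combined column contains no missing values, that repeats the call several times and then casts back to integers (whether the upcast to float happens may depend on the pandas version):

```python
import pandas as pd

rdf = pd.DataFrame({'A': [10, 20]},
                   index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=2))
df2 = pd.DataFrame({'A': [30, 40]},
                   index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))

# Repeating the call does not add rows: the result index is the union
# of both indexes, so timestamps already present are not appended again.
for _ in range(3):
    rdf = rdf.combine_first(df2)

# Cast back to integers; this would raise if the column contained NaN.
rdf['A'] = rdf['A'].astype('int64')

print(rdf)
```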

4 Comments

I really appreciate you introducing this. This is the first time I have heard of combine_first. Thanks a ton. How is it different from concat? Can we use it all the time?
@Mainland concat appends; combine_first looks up the index and adds the rows from df2 whose index is missing in df1.
One more question: can I use combine_first all the time and just discard the concat?
@Mainland If you need to update the existing index and add the new index, you can use combine_first. If you want to append whatever the new df has, you need concat. For your case, the answer is yes.
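
The difference described in these comments can be sketched on two frames that overlap at one timestamp. The frames `left` and `right` and the value `99` are made up for illustration; the point is that concat keeps both rows for the shared timestamp, while combine_first keeps the caller's value and only fills in what is missing:

```python
import pandas as pd

# Hypothetical frames that share the 10:00 timestamp with different values.
left = pd.DataFrame({'A': [10, 20, 99]},
                    index=pd.date_range(start='2020-05-04 08:00:00', freq='1h', periods=3))
right = pd.DataFrame({'A': [30, 40]},
                     index=pd.date_range(start='2020-05-04 10:00:00', freq='1h', periods=2))

# concat stacks the frames, so 10:00 appears twice (99 and 30).
stacked = pd.concat([left, right])

# combine_first aligns on the index: left's 99 wins at 10:00,
# and only the 11:00 row is taken from right.
merged = left.combine_first(right)

print(stacked)
print(merged)
```

So combine_first is a reasonable replacement for concat only when one row per index label is wanted and the first frame's values should take priority.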
