2

I have two pandas dataframes:

DF1

index = np.arange('2020-01-01 00:00', '2020-01-01 00:04', dtype='datetime64[m]')
df = np.random.randint(100,500, size=(4,4))
columns =['Open','High','Low','Close']
df = pd.DataFrame(df, index=index, columns = columns)
df.index.name = 'Time'

                     Open  High  Low  Close
Time                                       
2020-01-01 00:00:00   266   397  177    475
2020-01-01 00:01:00   362   135  456    235
2020-01-01 00:02:00   315   298  296    493
2020-01-01 00:03:00   324   411  198    101

DF2

index = np.arange('2020-01-01 00:02', '2020-01-01 00:05', dtype='datetime64[m]')
df2 = np.random.randint(100,500, size=(3,4))
columns =['Open','High','Low','Close']
df2 = pd.DataFrame(df2, index=index, columns = columns)
df2.index.name = 'Time'

                     Open  High  Low  Close
Time                                       
2020-01-01 00:02:00   430   394  131    490
2020-01-01 00:03:00   190   211  394    359
2020-01-01 00:04:00   192   291  143    350

I need to merge both dataframes by the index (Time) and replace the column values of DF1 by the column values of DF2.

This is my expected output:

                     Open  High  Low  Close
Time                                       
2020-01-01 00:00:00   266   397  177    475 ->>>> Correspond to DF1
2020-01-01 00:01:00   362   135  456    235 ->>>> Correspond to DF1
2020-01-01 00:02:00   430   394  131    490 ->>>> Correspond to DF2
2020-01-01 00:03:00   190   211  394    359 ->>>> Correspond to DF2
2020-01-01 00:04:00   192   291  143    350 ->>>> Correspond to DF2

I have try several functions including merge or concat (concat([df1, df2], join="inner")) but with no success. Any help would be very appreciated. Thanks!

1 Answer 1

3

Try this:

df2.combine_first(df)
                     Open  High  Low  Close
Time                                       
2020-01-01 00:00:00   266   397  177    475
2020-01-01 00:01:00   362   135  456    235
2020-01-01 00:02:00   430   394  131    490
2020-01-01 00:03:00   190   211  394    359
2020-01-01 00:04:00   192   291  143    350

Because you mentioned pd.concat, here is how you could do it with that.

out = pd.concat([df, df2])
out = out[~out.index.duplicated(keep='last')]
print(out)
                     Open  High  Low  Close
Time                                       
2020-01-01 00:00:00   266   397  177    475
2020-01-01 00:01:00   362   135  456    235
2020-01-01 00:02:00   430   394  131    490
2020-01-01 00:03:00   190   211  394    359
2020-01-01 00:04:00   192   291  143    350

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.