I'm using the Pandas Python library to compare two dataframes, each consisting of a column of dates and two columns of values. One of the dataframes, call it LongDF, covers more dates than the other, call it ShortDF. Both dataframes are indexed by date using pandas.tseries.index.DatetimeIndex. See below (I've shortened both just to demonstrate).
LongDF
╔════════════╦════════╦════════╗
║ Date ║ Value1 ║ Value2 ║
╠════════════╬════════╬════════╣
║ 1990-03-17 ║ 6.84 ║ 1.77 ║
║ 1990-03-18 ║ 0.99 ║ 7.00 ║
║ 1990-03-19 ║ 4.90 ║ 8.48 ║
║ 1990-03-20 ║ 2.57 ║ 2.41 ║
║ 1990-03-21 ║ 4.10 ║ 8.33 ║
║ 1990-03-22 ║ 8.86 ║ 1.31 ║
║ 1990-03-23 ║ 6.01 ║ 6.22 ║
║ 1990-03-24 ║ 0.74 ║ 1.69 ║
║ 1990-03-25 ║ 5.56 ║ 7.30 ║
║ 1990-03-26 ║ 8.05 ║ 1.67 ║
║ 1990-03-27 ║ 8.87 ║ 8.22 ║
║ 1990-03-28 ║ 9.00 ║ 6.83 ║
║ 1990-03-29 ║ 1.34 ║ 6.00 ║
║ 1990-03-30 ║ 1.69 ║ 0.40 ║
║ 1990-03-31 ║ 8.71 ║ 3.26 ║
║ 1990-04-01 ║ 4.05 ║ 4.53 ║
║ 1990-04-02 ║ 9.75 ║ 4.79 ║
║ 1990-04-03 ║ 7.74 ║ 0.44 ║
╚════════════╩════════╩════════╝
ShortDF
╔════════════╦════════╦════════╗
║ Date ║ Value1 ║ Value2 ║
╠════════════╬════════╬════════╣
║ 1990-03-25 ║ 1.98 ║ 3.92 ║
║ 1990-03-26 ║ 3.37 ║ 3.40 ║
║ 1990-03-27 ║ 2.93 ║ 7.93 ║
║ 1990-03-28 ║ 2.35 ║ 5.34 ║
║ 1990-03-29 ║ 1.41 ║ 7.62 ║
║ 1990-03-30 ║ 9.85 ║ 3.17 ║
║ 1990-03-31 ║ 9.95 ║ 0.35 ║
║ 1990-04-01 ║ 4.42 ║ 7.11 ║
║ 1990-04-02 ║ 1.33 ║ 6.47 ║
║ 1990-04-03 ║ 6.63 ║ 1.78 ║
╚════════════╩════════╩════════╝
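If you want to reproduce this, here's a minimal sketch that rebuilds the two frames from the tables above. It assumes the index is named "Date" and holds one row per calendar day. It also shows the basic way to pull a single day out of a DatetimeIndex: .loc with a Timestamp.

```python
import pandas as pd

# LongDF: 1990-03-17 .. 1990-04-03, one row per day, index named "Date".
long_dates = pd.date_range("1990-03-17", "1990-04-03", freq="D", name="Date")
LongDF = pd.DataFrame(
    {
        "Value1": [6.84, 0.99, 4.90, 2.57, 4.10, 8.86, 6.01, 0.74, 5.56,
                   8.05, 8.87, 9.00, 1.34, 1.69, 8.71, 4.05, 9.75, 7.74],
        "Value2": [1.77, 7.00, 8.48, 2.41, 8.33, 1.31, 6.22, 1.69, 7.30,
                   1.67, 8.22, 6.83, 6.00, 0.40, 3.26, 4.53, 4.79, 0.44],
    },
    index=long_dates,
)

# ShortDF: 1990-03-25 .. 1990-04-03, same columns and index name.
short_dates = pd.date_range("1990-03-25", "1990-04-03", freq="D", name="Date")
ShortDF = pd.DataFrame(
    {
        "Value1": [1.98, 3.37, 2.93, 2.35, 1.41, 9.85, 9.95, 4.42, 1.33, 6.63],
        "Value2": [3.92, 3.40, 7.93, 5.34, 7.62, 3.17, 0.35, 7.11, 6.47, 1.78],
    },
    index=short_dates,
)

# Select one day's row from the DatetimeIndex.
print(LongDF.loc[pd.Timestamp("1990-03-25")])
# Check that every ShortDF date also exists in LongDF.
print(ShortDF.index.isin(LongDF.index).all())
```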
What I'd like to do is take the data recorded on the same day in each dataset, put values from both sets into one formula and, if the result is greater than some threshold, copy that date and its values into another dataframe.
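A loop isn't strictly needed for this: arithmetic between two pandas Series lines them up on their index, so each ShortDF day is automatically paired with the same LongDF day. Below is a minimal sketch on a subset of the data. The formula (ShortDF Value1 + LongDF Value1) and THRESHOLD = 10 are hypothetical placeholders, since the real formula isn't given.

```python
import pandas as pd

# Subset of the question's data; LongDF has two extra leading days.
LongDF = pd.DataFrame(
    {"Value1": [6.01, 0.74, 5.56, 8.05, 8.87, 9.00, 1.34],
     "Value2": [6.22, 1.69, 7.30, 1.67, 8.22, 6.83, 6.00]},
    index=pd.date_range("1990-03-23", periods=7, freq="D", name="Date"),
)
ShortDF = pd.DataFrame(
    {"Value1": [1.98, 3.37, 2.93, 2.35, 1.41],
     "Value2": [3.92, 3.40, 7.93, 5.34, 7.62]},
    index=pd.date_range("1990-03-25", periods=5, freq="D", name="Date"),
)

THRESHOLD = 10  # hypothetical cut-off

# Series arithmetic aligns on the DatetimeIndex; dates present in only one
# frame produce NaN and are dropped.
score = (ShortDF["Value1"] + LongDF["Value1"]).dropna()

# Dates where the hypothetical formula exceeds the threshold.
hits = score[score > THRESHOLD].index

# Both frames' values for those dates, side by side.
result = ShortDF.loc[hits].join(LongDF.loc[hits], lsuffix="_short", rsuffix="_long")
print(result)
```

Swapping in the real formula only means changing the line that computes score, as long as it is written in terms of whole columns.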
I assume I should use something like for row in ShortDF.iterrows(): to iterate through each date in ShortDF, but I can't figure out how to select the corresponding row in LongDF using the DatetimeIndex.
Any help would be appreciated.
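If you do want the loop, note that iterrows() yields (index label, row) pairs, so the date comes out first and can be passed straight to LongDF.loc. Here's a minimal sketch; formula() and THRESHOLD are hypothetical stand-ins for your own calculation.

```python
import pandas as pd

# Subset of the question's data; LongDF has two extra leading days.
LongDF = pd.DataFrame(
    {"Value1": [6.01, 0.74, 5.56, 8.05, 8.87, 9.00, 1.34],
     "Value2": [6.22, 1.69, 7.30, 1.67, 8.22, 6.83, 6.00]},
    index=pd.date_range("1990-03-23", periods=7, freq="D", name="Date"),
)
ShortDF = pd.DataFrame(
    {"Value1": [1.98, 3.37, 2.93, 2.35, 1.41],
     "Value2": [3.92, 3.40, 7.93, 5.34, 7.62]},
    index=pd.date_range("1990-03-25", periods=5, freq="D", name="Date"),
)

THRESHOLD = 10  # hypothetical cut-off


def formula(short_row, long_row):
    # Hypothetical formula: sum of Value1 from both frames for the same day.
    return short_row["Value1"] + long_row["Value1"]


rows = []
# iterrows() yields (index label, row Series), so unpack the date first.
for date, short_row in ShortDF.iterrows():
    if date not in LongDF.index:
        continue  # no matching day in LongDF
    long_row = LongDF.loc[date]  # same day, looked up via the DatetimeIndex
    if formula(short_row, long_row) > THRESHOLD:
        rows.append({
            "Date": date,
            "Value1_short": short_row["Value1"],
            "Value2_short": short_row["Value2"],
            "Value1_long": long_row["Value1"],
            "Value2_long": long_row["Value2"],
        })

# Passing columns explicitly keeps this working even if no row qualifies.
columns = ["Date", "Value1_short", "Value2_short", "Value1_long", "Value2_long"]
result = pd.DataFrame(rows, columns=columns).set_index("Date")
print(result)
```

Collecting dicts in a list and building the dataframe once at the end is much cheaper than appending to a dataframe inside the loop.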
merged = df.merge(df1, on='Date') then merged.apply(myfunc, axis=1) or merged.apply(lambda row: myfunc(row), axis=1). I'd need to see your function before deciding on the best approach, though; also, it's getting late here in Blighty, so I may not answer.

merged[merged[['Value1','Value2']].max(axis=1) > my_val] will return the rows whose highest value is above your threshold. When performing the merge, columns that exist in both dfs (Value1, Value2) are duplicated and by default get the suffixes _x and _y. You can rename them, or not care seeing as you just want the highest value, in which case select the suffixed columns (Value1_x, Value1_y, etc.) instead of Value1 and Value2.

merged = ShortDF.merge(LongDF, on='Date'). Am I understanding that properly?
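Putting the merge suggestion together, here's a minimal sketch. Two assumptions: myfunc, THRESHOLD and my_val are hypothetical placeholders, and the index is reset first so "Date" is an ordinary column (pandas 0.23 and later also accept an index level name in on=, but resetting works in any version). Because Value1 and Value2 exist in both frames, the max filter has to use the suffixed names.

```python
import pandas as pd

# Subset of the question's data; LongDF has two extra leading days.
LongDF = pd.DataFrame(
    {"Value1": [6.01, 0.74, 5.56, 8.05, 8.87, 9.00, 1.34],
     "Value2": [6.22, 1.69, 7.30, 1.67, 8.22, 6.83, 6.00]},
    index=pd.date_range("1990-03-23", periods=7, freq="D", name="Date"),
)
ShortDF = pd.DataFrame(
    {"Value1": [1.98, 3.37, 2.93, 2.35, 1.41],
     "Value2": [3.92, 3.40, 7.93, 5.34, 7.62]},
    index=pd.date_range("1990-03-25", periods=5, freq="D", name="Date"),
)

# Inner merge on the date: only days present in both frames survive.
# Overlapping columns get the default suffixes: _x (ShortDF), _y (LongDF).
merged = ShortDF.reset_index().merge(LongDF.reset_index(), on="Date")


def myfunc(row):
    # Hypothetical formula: sum of Value1 from both frames.
    return row["Value1_x"] + row["Value1_y"]


THRESHOLD = 10  # hypothetical cut-off for myfunc
by_formula = merged[merged.apply(myfunc, axis=1) > THRESHOLD]

my_val = 8.5  # hypothetical cut-off for the row-maximum filter
value_cols = ["Value1_x", "Value2_x", "Value1_y", "Value2_y"]
by_max = merged[merged[value_cols].max(axis=1) > my_val]

print(by_formula)
print(by_max)
```

If myfunc only combines whole columns, computing it directly (merged["Value1_x"] + merged["Value1_y"]) avoids the row-by-row apply.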