Here is an example:
import numpy as np
import pandas as pd

# Generate a random time series DataFrame with 'price' and 'vol' columns
x = pd.date_range('2017-01-01', periods=100, freq='1min')
df_x = pd.DataFrame({'price': np.random.randint(50, 100, size=x.shape), 'vol': np.random.randint(1000, 2000, size=x.shape)}, index=x)
df_x.head(10)
price vol
2017-01-01 00:00:00 56 1544
2017-01-01 00:01:00 70 1680
2017-01-01 00:02:00 92 1853
2017-01-01 00:03:00 94 1039
2017-01-01 00:04:00 81 1180
2017-01-01 00:05:00 70 1443
2017-01-01 00:06:00 56 1621
2017-01-01 00:07:00 68 1093
2017-01-01 00:08:00 59 1684
2017-01-01 00:09:00 86 1591
# Here is an example aggregation:
df_x.resample('5Min').agg({'price': 'mean', 'vol': 'sum'}).head()
price vol
2017-01-01 00:00:00 78.6 7296
2017-01-01 00:05:00 67.8 7432
2017-01-01 00:10:00 76.0 9017
2017-01-01 00:15:00 74.0 6989
2017-01-01 00:20:00 64.4 8078
However, what can I do if I want to extract other aggregated information that depends on more than one column?
For example, I want to append 2 more columns here, called all_up and all_down.
These 2 columns' calculations are defined as follows:
Within every 5-minute window, all_down counts how many times the 1-minute sampled price went down and vol also went down, and all_up counts how many times both went up.
Here is what I expect the 2 columns to look like:
price vol all_up all_down
2017-01-01 00:00:00 78.6 7296 2 0
2017-01-01 00:05:00 67.8 7432 0 0
2017-01-01 00:10:00 76.0 9017 1 0
2017-01-01 00:15:00 74.0 6989 1 1
2017-01-01 00:20:00 64.4 8078 0 2
This calculation depends on 2 columns. But the agg function of the Resampler object seems to accept only 3 kinds of arguments:
- a str or a function that is applied to each of the columns separately.
- a list of functions that are applied to each of the columns separately.
- a dict whose keys match the column names. Each value is still a function that is applied to a single column at a time.
None of these seems to meet my needs.
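To illustrate the three forms listed above, here is a small sketch. It assumes the ten rows printed by df_x.head(10) as fixed input (the original data is random), and the names r, by_name, by_func, by_list and by_dict are made up for this illustration. In every form, the function only ever sees one column at a time.

```python
import pandas as pd

# Fixed input: the ten rows shown in the df_x.head(10) output above.
x = pd.date_range('2017-01-01', periods=10, freq='1min')
df_x = pd.DataFrame({
    'price': [56, 70, 92, 94, 81, 70, 56, 68, 59, 86],
    'vol': [1544, 1680, 1853, 1039, 1180, 1443, 1621, 1093, 1684, 1591],
}, index=x)

r = df_x.resample('5min')

# 1. A str or a function: applied to each column separately.
by_name = r.agg('mean')
by_func = r.agg(lambda s: s.max() - s.min())  # s is a single column (Series)

# 2. A list of functions: every function is applied to every column,
#    giving one (column, function) pair per output column.
by_list = r.agg(['min', 'max'])

# 3. A dict keyed by column name: each value is applied to that column only.
by_dict = r.agg({'price': 'mean', 'vol': 'sum'})

print(by_dict)
```

The by_dict result reproduces the first two rows of the aggregated output shown earlier (78.6 / 7296 and 67.8 / 7432).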
The all_up column counts, in every 5 minutes, how many times the 1-minute price goes up and vol also goes up; all_down is the opposite. The price and vol data are random integers, which is probably why we get different values. Actually, I only manually calculated all_up and all_down for the first 10 minutes.
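One possible way to get such counts without a multi-column agg function is to compute a per-minute flag first and then resample the flags. This is only a sketch under stated assumptions: "went up" is taken to mean strictly greater than the previous minute's value (also across a 5-minute boundary), the very first row has no previous value and counts as neither, and the names diffs, flags and result are made up for this illustration. It uses the ten rows from df_x.head(10) so the numbers can be checked by hand.

```python
import pandas as pd

# Fixed input: the ten rows shown in the df_x.head(10) output above.
x = pd.date_range('2017-01-01', periods=10, freq='1min')
df_x = pd.DataFrame({
    'price': [56, 70, 92, 94, 81, 70, 56, 68, 59, 86],
    'vol': [1544, 1680, 1853, 1039, 1180, 1443, 1621, 1093, 1684, 1591],
}, index=x)

# Assumption: "up"/"down" compares each minute with the previous minute.
diffs = df_x[['price', 'vol']].diff()

# Row-wise flags that depend on both columns at once.
flags = pd.DataFrame({
    'all_up': (diffs > 0).all(axis=1),    # price and vol both rose
    'all_down': (diffs < 0).all(axis=1),  # price and vol both fell
})

# Aggregate the original columns and the flags on the same 5-minute grid.
result = pd.concat([
    df_x.resample('5min').agg({'price': 'mean', 'vol': 'sum'}),
    flags.resample('5min').sum(),  # summing booleans counts the True rows
], axis=1)

print(result)
```

For these ten rows the sketch gives all_up = 2, 0 and all_down = 0, 0, which agrees with the first two rows of the expected table.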