With pandas.DataFrame.resample I can downsample a DataFrame:

df.resample("3s", how="mean")

This resamples a DataFrame with a datetime-like index so that all rows falling within each 3-second window are aggregated into one row. The column values are averaged.

Question: I have a data frame with multiple columns. Is it possible to specify a different aggregation function for different columns, e.g. I want to "sum" column x, "mean" column y and pick the "last" for column z? How can I achieve that effect?

I know I could create a new empty data frame, and then call resample three times, but I would prefer a faster in-place solution.

2 Answers

You can call .agg after resample. Pass it a dictionary that maps each column name to the aggregation function you want for that column.

Try this:

df.resample("3s").agg({'x':'sum','y':'mean','z':'last'})

Also, the how argument of resample is deprecated. Using it produces this warning:

C:\Program Files\Anaconda3\lib\site-packages\ipykernel\__main__.py:1: FutureWarning: how in .resample() is deprecated the new syntax is .resample(...).mean()
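As a minimal sketch of the replacement syntax the warning describes: resample returns a Resampler object, and the aggregation is called on that object instead of being passed through how. The sample data and the column name x are hypothetical, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data: six one-second readings in a single column.
idx = pd.date_range("2017-01-01", periods=6, freq="s")
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

# Deprecated form from the question: df.resample("3s", how="mean")
# Current form: call the aggregation method on the Resampler.
means = df.resample("3s").mean()
print(means)
```

The same pattern applies to the other built-in reducers such as .sum() and .last(); .agg is only needed when different columns need different functions.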


3 Comments

Here each column is used only once. What if I want to apply two functions to the same column x and a different function to another column z?
@KathirmaniSukumar You can pass a list holding all the functions you want to apply to a single column: df.resample('3s').agg({'X':['sum','mean'],'Y':'max','Z':['min','std']})
When I try this I get the following warning: "FutureWarning: using a dict with renaming is deprecated and will be removed in a future version" Why does it think I'm renaming columns when I'm just trying to tell it how to aggregate the given columns?
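
To illustrate the list-per-column form from the comments, here is a small sketch on hypothetical data. When any dictionary value is a list, the result has two-level (MultiIndex) columns of the form (column, function). Joining the two levels with an underscore is just one possible way to flatten them, not something the answers prescribe.

```python
import pandas as pd

# Hypothetical sample data: six one-second rows, two columns.
idx = pd.date_range("2017-01-01", periods=6, freq="s")
df = pd.DataFrame(
    {"X": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "Z": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]},
    index=idx,
)

# Two functions for X, one for Z. The result columns are a MultiIndex:
# ("X", "sum"), ("X", "mean"), ("Z", "last").
out = df.resample("3s").agg({"X": ["sum", "mean"], "Z": "last"})

# Optional: flatten the two column levels into single names like "X_sum".
flat = out.copy()
flat.columns = ["_".join(col) for col in out.columns]
print(flat)
```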

Consider the dataframe df

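import numpy as np
import pandas as pd
np.random.seed([3,1415])  # fixed seed so the sample values are reproducible
tidx = pd.date_range('2017-01-01', periods=18, freq='s')  # 18 one-second timestamps
df = pd.DataFrame(np.random.rand(len(tidx), 3), tidx, list('XYZ'))
print(df)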

                            X         Y         Z
2017-01-01 00:00:00  0.444939  0.407554  0.460148
2017-01-01 00:00:01  0.465239  0.462691  0.016545
2017-01-01 00:00:02  0.850445  0.817744  0.777962
2017-01-01 00:00:03  0.757983  0.934829  0.831104
2017-01-01 00:00:04  0.879891  0.926879  0.721535
2017-01-01 00:00:05  0.117642  0.145906  0.199844
2017-01-01 00:00:06  0.437564  0.100702  0.278735
2017-01-01 00:00:07  0.609862  0.085823  0.836997
2017-01-01 00:00:08  0.739635  0.866059  0.691271
2017-01-01 00:00:09  0.377185  0.225146  0.435280
2017-01-01 00:00:10  0.700900  0.700946  0.796487
2017-01-01 00:00:11  0.018688  0.700566  0.900749
2017-01-01 00:00:12  0.764869  0.253200  0.548054
2017-01-01 00:00:13  0.778883  0.651676  0.136097
2017-01-01 00:00:14  0.544838  0.035073  0.275079
2017-01-01 00:00:15  0.706685  0.713614  0.776050
2017-01-01 00:00:16  0.542329  0.836541  0.538186
2017-01-01 00:00:17  0.185523  0.652151  0.746060

Use agg with a dict that maps each column to its aggregation function:

df.resample('3s').agg(dict(X='sum', Y='mean', Z='last'))

                            X         Y         Z
2017-01-01 00:00:00  1.760624  0.562663  0.777962
2017-01-01 00:00:03  1.755516  0.669204  0.199844
2017-01-01 00:00:06  1.787061  0.350861  0.691271
2017-01-01 00:00:09  1.096773  0.542220  0.900749
2017-01-01 00:00:12  2.088590  0.313316  0.275079
2017-01-01 00:00:15  1.434538  0.734102  0.746060
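
As a rough sanity check, the single agg call can be compared against the approach the question wanted to avoid: aggregating each column separately and stitching the pieces together. This sketch uses its own randomly generated data (hypothetical, and not the same values as the table above), so only the equality of the two results is meaningful.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same shape as the example: 18 seconds, columns X, Y, Z.
rng = np.random.default_rng(0)
tidx = pd.date_range("2017-01-01", periods=18, freq="s")
df = pd.DataFrame(rng.random((len(tidx), 3)), index=tidx, columns=list("XYZ"))

# One pass: a different aggregation per column.
combined = df.resample("3s").agg({"X": "sum", "Y": "mean", "Z": "last"})

# Column-by-column version, concatenated side by side.
r = df.resample("3s")
separate = pd.concat([r["X"].sum(), r["Y"].mean(), r["Z"].last()], axis=1)

# Both approaches should give the same 6x3 result.
print(np.allclose(combined.to_numpy(), separate.to_numpy()))
```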
