
I have a DataFrame dfa in Pandas containing about 12103 rows and about 10 columns. I would like to build a new DataFrame dfb from dfa, where each row of dfb is computed from 300 consecutive rows of dfa, for example:

 value1 = dfa['one'].std()
 value2 = dfa['one'].max()

obtaining dfb with about 40 (12103/300) entries. Basically, the first row of dfb has two columns (e.g., value1, value2) containing the values computed above from rows 1 to 300 of dfa; the second row contains the values computed from rows 301 to 600 of dfa, and so on.
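For reference, a minimal sketch of the chunked aggregation being asked for (the column name 'one' and the 300-row chunk size come from the question; the random data is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the question's shape: 12103 rows.
dfa = pd.DataFrame({'one': np.random.default_rng(0).integers(1, 100, 12103)})

# Group rows into consecutive blocks of 300 by integer-dividing the index,
# then compute one std and one max per block using named aggregation.
dfb = dfa.groupby(dfa.index // 300)['one'].agg(value1='std', value2='max')

# 12103 rows yield 41 groups: 40 full blocks of 300 plus a final block of 103.
print(len(dfb))
```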

thanks

1 Answer
IIUC, let's try this using groupby and stack:

 dfa.groupby(dfa.index // 300).apply(lambda x: pd.Series({'max': x.stack().max(), 'std': x.stack().std()}))

MVCE:

import numpy as np
import pandas as pd

dfa = pd.DataFrame(np.random.randint(1, 100, (10, 10)), columns=list('ABCDEFGHIJ'))
print(dfa)

Output:

    A   B   C   D   E   F   G   H   I   J
0  81  15  57  42  90  25  72  98   6   8
1  44  63  39  29  11   3  80  15  43  47
2  68  97  42  93  19  73  28  25   2  83
3  38  52  65  61  79  82  98  60  76  93
4  68  39  62  48  44  19  44  47  54  26
5  52  93  14  37  48  81   6  20  91  30
6  39  15  22  48  22   8  35  60  72  43
7  13  26  24  74  41  36  92  93  13  85
8   2  46  35  21  92  15  66  19  87  66
9  77  13  15  69   3  81  75  30  64  63

Create dfb, grouping 2 rows at a time in this example instead of 300:

dfb = dfa.groupby(dfa.index // 2).apply(lambda x: pd.Series({'max':x.stack().max(),'std':x.stack().std()}))
print(dfb)

Output:

    max        std
0  98.0  29.754080
1  98.0  28.086521
2  93.0  24.203686
3  93.0  27.390884
4  92.0  30.153072
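Note that stack() pools all 10 columns into each statistic. If each statistic should instead come from a single column, as in the question's dfa['one'], a per-column sketch (the choice of column 'A' here is just illustrative):

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame(np.random.randint(1, 100, (10, 10)), columns=list('ABCDEFGHIJ'))

# Aggregate only column 'A' within each 2-row block: one max and one std per block.
dfb = dfa.groupby(dfa.index // 2)['A'].agg(['max', 'std'])
print(dfb)
```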
