
I have a DataFrame dfa in Pandas containing about 12103 rows and about 10 columns. I would like to build a new DataFrame dfb from dfa, where each row of dfb is computed from 300 consecutive rows of dfa, for example:

 value1 = dfa['one'].std()
 value2 = dfa['one'].max()

obtaining dfb with about 40 (12103/300) entries. Basically, the first row of dfb has two columns (e.g., value1, value2) containing the values computed above from rows 1 to 300 of dfa; the second row contains the values computed from rows 301 to 600 of dfa, and so on.
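For reference, a minimal sketch of the chunked aggregation being asked for (the column name 'one' and the 300-row chunk size come from the question; the random data is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data matching the question's shape: 12103 rows.
dfa = pd.DataFrame({'one': np.random.default_rng(0).integers(1, 100, 12103)})

# Group rows into consecutive blocks of 300 by integer-dividing the index,
# then compute one std and one max per block using named aggregation.
dfb = dfa.groupby(dfa.index // 300)['one'].agg(value1='std', value2='max')

# 12103 rows yield 41 groups: 40 full blocks of 300 plus a final block of 103.
print(len(dfb))
```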

thanks

1 Answer
IIUC, let's try this using groupby and stack:

 dfa.groupby(dfa.index // 300).apply(lambda x: pd.Series({'max': x.stack().max(), 'std': x.stack().std()}))

MVCE:

import numpy as np
import pandas as pd

dfa = pd.DataFrame(np.random.randint(1, 100, (10, 10)), columns=list('ABCDEFGHIJ'))
print(dfa)

Output:

    A   B   C   D   E   F   G   H   I   J
0  81  15  57  42  90  25  72  98   6   8
1  44  63  39  29  11   3  80  15  43  47
2  68  97  42  93  19  73  28  25   2  83
3  38  52  65  61  79  82  98  60  76  93
4  68  39  62  48  44  19  44  47  54  26
5  52  93  14  37  48  81   6  20  91  30
6  39  15  22  48  22   8  35  60  72  43
7  13  26  24  74  41  36  92  93  13  85
8   2  46  35  21  92  15  66  19  87  66
9  77  13  15  69   3  81  75  30  64  63

Create dfb, grouping 2 rows at a time in this example instead of 300:

dfb = dfa.groupby(dfa.index // 2).apply(lambda x: pd.Series({'max':x.stack().max(),'std':x.stack().std()}))
print(dfb)

Output:

    max        std
0  98.0  29.754080
1  98.0  28.086521
2  93.0  24.203686
3  93.0  27.390884
4  92.0  30.153072
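Note that stack() pools all 10 columns into each statistic. If each statistic should instead come from a single column, as in the question's dfa['one'], a per-column sketch (the choice of column 'A' here is just illustrative):

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame(np.random.randint(1, 100, (10, 10)), columns=list('ABCDEFGHIJ'))

# Aggregate only column 'A' within each 2-row block: one max and one std per block.
dfb = dfa.groupby(dfa.index // 2)['A'].agg(['max', 'std'])
print(dfb)
```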
