Memory-efficient DataFrame.stack()

When I profile the following code with memray, df.stack() allocates 22MB, even though the DataFrame itself is only about 5MB.

import numpy as np
import pandas as pd

columns = list('abcdefghijklmnopqrstuvwxyz')
df = pd.DataFrame(np.random.randint(0,100,size=(1000, 26*26)), columns=pd.MultiIndex.from_product([columns, columns]))
print(df.memory_usage().sum()) # 5408128, ~5MB
df.stack() # reshape: (1000,26*26) -> (1000*26,26)

Why does DataFrame.stack() consume so much memory? According to the profile, about 30% of the allocations come from dropna, and the remaining 70% come from re-allocating the underlying array three times during the reshape. Should I switch to a native numpy.reshape, or is there anything I can do to make stack() slimmer?
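For reference, the numpy.reshape alternative mentioned above could look roughly like the sketch below. It is an unmeasured illustration, not a profiled solution: `stack_last_level` is a hypothetical helper name, and it assumes a homogeneous dtype and a two-level column MultiIndex that is a complete, sorted Cartesian product (as produced by `MultiIndex.from_product` here). It skips the dropna step entirely, so it is only equivalent to stack() when no all-NaN rows would be dropped.

```python
import numpy as np
import pandas as pd


def stack_last_level(df):
    """Hypothetical helper: move the inner column level into the row index
    using NumPy reshapes instead of DataFrame.stack().

    Assumes the columns are a complete, sorted product of two levels.
    """
    outer, inner = df.columns.levels
    if not df.columns.equals(pd.MultiIndex.from_product([outer, inner])):
        raise ValueError("columns must be a complete, sorted product of two levels")

    n_rows, n_outer, n_inner = len(df), len(outer), len(inner)
    # cube[t, i, j] holds the value of row t, column (outer[i], inner[j])
    cube = df.to_numpy().reshape(n_rows, n_outer, n_inner)
    # Swap the two column axes so the inner level varies along the rows,
    # then flatten (row, inner) into a single row axis.
    values = cube.transpose(0, 2, 1).reshape(n_rows * n_inner, n_outer)
    index = pd.MultiIndex.from_product([df.index, inner])
    return pd.DataFrame(values, index=index, columns=outer)


columns = list("abcdefghijklmnopqrstuvwxyz")
df = pd.DataFrame(
    np.random.randint(0, 100, size=(1000, 26 * 26)),
    columns=pd.MultiIndex.from_product([columns, columns]),
)
stacked = stack_last_level(df)
print(stacked.shape)  # (26000, 26)
```

Whether this actually allocates less than df.stack() would need to be confirmed with memray on the real data; the transpose followed by the final reshape still has to copy the values at least once.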

asked Jan 3, 2023 at 17:42

C. Claudio


Why do you perform a stack? How are you going to use the stacked DataFrame afterwards?

– Ben.T

Commented Jan 3, 2023 at 18:20
It is meaningful from a data perspective: it is a time series of square matrices, and in this case the API requires shape=(MultiIndex(time, columns), columns). I agree that the unstacked, flattened version with 26*26 columns is computationally handier.

– C. Claudio

Commented Jan 3, 2023 at 22:23
