If I profile the following code with memray, df.stack() allocates 22 MB, even though the DataFrame itself is only about 5 MB.
```python
import numpy as np
import pandas as pd
columns = list('abcdefghijklmnopqrstuvwxyz')
df = pd.DataFrame(np.random.randint(0,100,size=(1000, 26*26)), columns=pd.MultiIndex.from_product([columns, columns]))
print(df.memory_usage().sum()) # 5408128, ~5MB
df.stack() # reshape: (1000,26*26) -> (1000*26,26)
```
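For reference, the ~5 MB is just the raw integer payload: 1000 rows × 676 columns × 8 bytes = 5,408,000 bytes, with the remaining 128 bytes of the printed 5,408,128 belonging to the row index. The stacked result holds exactly the same number of values, so a 22 MB peak is roughly four times the data size. A small check, assuming int64 values (the original relies on the platform default integer type, which is 8 bytes on typical 64-bit Linux/macOS builds):

```python
import numpy as np
import pandas as pd

rows, n_outer, n_inner = 1000, 26, 26
labels = list('abcdefghijklmnopqrstuvwxyz')
# dtype pinned to int64 here (assumption): mirrors the 8-byte default integer
# that produces the 5408128 figure in the question
data = np.random.randint(0, 100, size=(rows, n_outer * n_inner), dtype=np.int64)
df = pd.DataFrame(data, columns=pd.MultiIndex.from_product([labels, labels]))

# bytes held by the 676 value columns, excluding the row index
payload = int(df.memory_usage(index=False).sum())
print(payload, rows * n_outer * n_inner * 8)  # 5408000 5408000

stacked = df.stack()
# same 676,000 values, just arranged as 26,000 rows x 26 columns
print(stacked.shape, stacked.size)  # (26000, 26) 676000
```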
Why does DataFrame.stack() consume so much memory? According to the memray profile, about 30% of the allocations happen in dropna and the remaining 70% come from reallocating the underlying array three times during the reshape. Should I switch to plain numpy.reshape, or is there anything I can do to make it leaner?
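A minimal sketch of the numpy.reshape route, for the special case in the question only. The helper name stack_inner_level is hypothetical, and it is not how pandas implements stack(). It assumes the columns are a complete two-level product (every outer label paired with every inner label, outer level varying slowest), a single homogeneous dtype, and no NaNs, so that stack()'s dropna step would have nothing to remove.

```python
import numpy as np
import pandas as pd


def stack_inner_level(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical NumPy-based stand-in for df.stack() on a full 2-level product.

    Assumes no NaNs and one dtype; it does not replicate stack()'s dropna.
    """
    outer = df.columns.get_level_values(0).unique()
    inner = df.columns.get_level_values(1).unique()
    # refuse anything that is not exactly outer x inner in product order
    if not df.columns.equals(pd.MultiIndex.from_product([outer, inner])):
        raise ValueError("columns are not a full outer x inner product")

    n_rows = len(df)
    # (rows, outer*inner) -> (rows, outer, inner): column k*len(inner)+j maps to [k, j];
    # this reshape may already copy if the array is not C-contiguous
    cube = df.to_numpy().reshape(n_rows, len(outer), len(inner))
    # swap to (rows, inner, outer), then merge rows x inner into the new row axis;
    # reshaping the transposed view forces NumPy to materialize a copy
    values = cube.transpose(0, 2, 1).reshape(n_rows * len(inner), len(outer))
    index = pd.MultiIndex.from_product([df.index, inner])
    # copy=False asks pandas to wrap the array instead of copying it again
    return pd.DataFrame(values, index=index, columns=outer, copy=False)
```

Usage: stacked = stack_inner_level(df) on the df from the question. This route skips stack()'s NaN-dropping pass entirely; how many temporary copies NumPy makes depends on the memory layout of the block behind df.to_numpy(), so it is worth re-running memray on it rather than assuming the peak drops to a single copy.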
Why stack at all? How are you going to use the stacked DataFrame afterwards?