Given a dict whose keys are column labels and whose values are Series, I can easily build a DataFrame like this:
import pandas as pd

dat = {'col_1': pd.Series([3, 2, 1, 0], index=range(10, 50, 10)),
       'col_2': pd.Series([4, 5, 6, 7], index=range(10, 50, 10))}
pd.DataFrame(dat)
    col_1  col_2
10      3      4
20      2      5
30      1      6
40      0      7
However, I have a generator that yields (key, value) tuples:
gen = ((col, srs) for col, srs in dat.items()) # generator object
Now I can trivially consume the generator into a dict and build the same DataFrame:
pd.DataFrame(dict(gen))
However, this exhausts the generator and builds the entire dict first, and only then hands it to Pandas, so at peak I presume it holds the data twice in memory. I'd like Pandas to iterate over the generator itself as it builds the DataFrame, if that's possible.
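For reference, here's a rough sketch of how I've been estimating that overhead. The sizes and the gen_columns generator are made up for illustration, and I'm assuming tracemalloc sees the underlying NumPy allocations:

import tracemalloc

import numpy as np
import pandas as pd

# Made-up "large" data: 100 columns of 100,000 random floats each.
def gen_columns():
    for i in range(100):
        yield f'col_{i}', pd.Series(np.random.rand(100_000))

tracemalloc.start()
df = pd.DataFrame(dict(gen_columns()))  # the dict is fully built before Pandas sees it
current, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
print(f'current: {current / 1e6:.0f} MB, peak: {peak / 1e6:.0f} MB')
tracemalloc.stop()

The gap between peak and current is what I'd hope to shave off by streaming.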
I can pass the generator into the DataFrame constructor, but get an odd result:
gen = ((col, srs) for col, srs in dat.items())  # re-create; dict() above exhausted the first one
pd.DataFrame(gen)
0 1
0 col_1 10 3
20 2
30 1
40 0
dtype: int64
1 col_2 10 4
20 5
30 6
40 7
dtype: int64
And I get the same result using pd.DataFrame.from_dict(gen) or pd.DataFrame.from_records(gen).
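From that output it looks like the constructor treats the generator as a sequence of rows, so each (col, srs) tuple becomes one two-element record. The closest I've come to the frame I want is pd.concat over renamed Series. This is just a sketch, not something I've profiled; I suspect concat builds a list internally anyway, so it may not save any memory:

gen = ((col, srs) for col, srs in dat.items())

# Name each Series after its key, then concatenate column-wise.
df = pd.concat((srs.rename(col) for col, srs in gen), axis=1)

This reproduces the original DataFrame, but I don't know whether it avoids the intermediate copy.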
So my questions are: can I produce my original DataFrame by passing the generator gen to Pandas? And if so, would that reduce my memory usage (assuming a large data set, not the trivial toy example shown here)?
Thanks!
