I have an operation that outputs a series of tuples like
[('a',1.0), ('c', 2.5)]. It does this for a lot of inputs, so the outputs would look like
[('a',1.0), ('c', 2.5)]
[('b',1.5), ('c', 2.5)]
[('a', 5.0), ('b',1.5), ('c', 2.75)]
which should output a dataframe that looks like
>>> df
     a    b     c
0  1.0  NaN  2.50
1  NaN  1.5  2.50
2  5.0  1.5  2.75
However, the column names are not known beforehand, so at some point during data generation I could end up with some ('z', 12.0).
I think the simplest way would be to create a dataframe for each row and concatenate the dataframes:
df_list = []
for row in rows:
    tuple_result = f(row)  # e.g. [('a', 1.0), ('c', 2.5)]
    df_list.append(pd.DataFrame([dict(tuple_result)]))  # single-row dataframe
df = pd.concat(df_list, axis=0, ignore_index=True)
and this will take care of all the NaNs and the column names. However, I will be doing this for a large number of rows, and I think this approach will be unnecessarily memory-intensive, since every row gets its own DataFrame object.
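For reference, here is a minimal runnable version of that sketch; since `f` and `rows` are not shown, they are stood in by a hardcoded list of example outputs:

```python
import pandas as pd

# stand-in for calling f(row) over all rows
outputs = [
    [('a', 1.0), ('c', 2.5)],
    [('b', 1.5), ('c', 2.5)],
    [('a', 5.0), ('b', 1.5), ('c', 2.75)],
]

# one single-row DataFrame per output, then concatenate
df_list = [pd.DataFrame([dict(tuples)]) for tuples in outputs]
df = pd.concat(df_list, axis=0, ignore_index=True)

# concat orders columns by first appearance (a, c, b here),
# so sort them if alphabetical order matters
df = df[sorted(df.columns)]
print(df)
```

This reproduces the target frame above, missing entries filled with NaN.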
Is there a better way to do this?