I have an operation that outputs a list of tuples like [('a',1.0), ('c', 2.5)]. It does this for many inputs, so the outputs look like

[('a',1.0), ('c', 2.5)]
[('b',1.5), ('c', 2.5)]
[('a', 5.0), ('b',1.5), ('c', 2.75)]

From these I want to build a dataframe that looks like

>>> df
     a     b     c
0    1.0   NaN   2.5
1    NaN   1.5   2.5
2    5.0   1.5   2.75

However, the column names are not known beforehand, so at some point during data generation I could end up with some ('z',12.0).

I think the simplest way would be to create a dataframe for each row and concatenate the dataframes:

df_list = []
for row in rows:
    tuple_result = f(row)
    df_list.append(pd.DataFrame(...))  # generate a single-row dataframe
df = pd.concat(df_list, axis=0, ignore_index=True)

This takes care of all the NaNs and column names. However, I will be doing this for many rows, and I think this approach will be unnecessarily memory-intensive.

Is there a better way to do this?

1 Answer

You can use a list comprehension, converting each row of tuples into a dictionary.

my_data = [
    [('a',1.0), ('c', 2.5)],
    [('b',1.5), ('c', 2.5)],
    [('a', 5.0), ('b',1.5), ('c', 2.75)]
]

>>> pd.DataFrame([dict(row) for row in my_data])
     a    b     c
0  1.0  NaN  2.50
1  NaN  1.5  2.50
2  5.0  1.5  2.75
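
If the rows are produced one at a time, as in the question's loop, the same idea can be sketched by collecting plain dictionaries and building the dataframe once at the end. Here f is a hypothetical stand-in for the question's operation and the sample rows are illustrative, including a late ('z', 12.0) that introduces a new column. Column order for a list of dictionaries can differ between pandas versions, so this sketch sorts the columns explicitly.

```python
import pandas as pd


def f(row):
    # Hypothetical stand-in for the question's operation:
    # it simply returns the list of (name, value) tuples unchanged.
    return list(row)


# Illustrative inputs; the last one introduces a column not seen before.
rows = [
    [('a', 1.0), ('c', 2.5)],
    [('b', 1.5), ('c', 2.5)],
    [('a', 5.0), ('b', 1.5), ('c', 2.75)],
    [('z', 12.0)],
]

# Accumulate lightweight dicts instead of single-row dataframes.
records = []
for row in rows:
    records.append(dict(f(row)))

# Build the dataframe once; keys missing from a record become NaN.
df = pd.DataFrame(records)

# Sort the columns by name so the layout does not depend on the pandas version.
df = df.sort_index(axis=1)
print(df)
```

Only the small dictionaries are held in memory until the single pd.DataFrame call, which avoids creating and concatenating one dataframe per row.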

Timings

%timeit pd.DataFrame([dict(row) for row in my_data * 100000])
# 559 ms ± 92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit pd.DataFrame(map(dict, my_data * 100000))
# 438 ms ± 25.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
df_list = []
for row in my_data * 100000:
    df_list.append(pd.DataFrame(dict(row), index=[0]))
df = pd.concat(df_list, axis=0, ignore_index=True, sort=False)
# 6min 11s ± 1min 54s per loop (mean ± std. dev. of 7 runs, 1 loop each)

2 Comments

In recent versions of pandas, you can also pass a map object directly: pd.DataFrame(map(dict, my_data)). I think it's cleaner ;p
Yes, that appears to be more efficient and potentially more readable.
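
As a quick check of the map variant from the comments, the sketch below builds the dataframe both ways and compares the results. It assumes a pandas version that accepts an iterator in the DataFrame constructor; on a version that does not, wrapping the map in list(...) should give the same input as the list comprehension.

```python
import pandas as pd

my_data = [
    [('a', 1.0), ('c', 2.5)],
    [('b', 1.5), ('c', 2.5)],
    [('a', 5.0), ('b', 1.5), ('c', 2.75)],
]

# List-comprehension form from the answer.
df_listcomp = pd.DataFrame([dict(row) for row in my_data])

# map form from the comments; assumes the constructor accepts an iterator.
df_map = pd.DataFrame(map(dict, my_data))

# DataFrame.equals treats NaNs in the same positions as equal.
same = df_listcomp.equals(df_map)
print(same)
```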
