How can I create a DataFrame from a list of dictionaries, where each dictionary maps column names to lists of row values? Please check the example below:

>>> import pandas as pd
>>> rec_set1 = {'col1': [1,2,3], 'col2': [5,3,4], 'col3': ['x','y','z']}
>>> rec_set2 = {'col1': [5,6,7], 'col2': [-4,6,2], 'col3': ['p','q','r']}
>>> rec_set_all = [rec_set1, rec_set2]
>>> df = pd.DataFrame.from_records(rec_set1)
>>> df
   col1  col2 col3
0     1     5    x
1     2     3    y
2     3     4    z

All good so far.
Now I try to append rec_set2 and this is what happens:

>>> df = df.append(rec_set2, ignore_index=True)
>>> df
        col1        col2       col3
0          1           5          x
1          2           3          y
2          3           4          z
3  [5, 6, 7]  [-4, 6, 2]  [p, q, r]

  1. This is not what I was expecting. Which append function should I use?

  2. Rather than doing it in a loop, is there a simple one-line way to create the entire DataFrame from rec_set_all?

  • pd.concat([pd.DataFrame(rec_set1), pd.DataFrame(rec_set2)])? Commented Jan 8, 2020 at 21:08
  • Wouldn't df = df.append(pd.DataFrame(rec_set2), ignore_index=True) work? It looks like you just forgot to turn the other dictionary into a DataFrame. Commented Jan 8, 2020 at 21:09
  • Not what I was expecting. Really? Have you looked at the docs for .append()? Commented Jan 8, 2020 at 21:22
  • I forgot to add: Where is this data coming from? Odds are we can avoid this issue entirely. Commented Jan 8, 2020 at 21:29

2 Answers

Assuming you are starting out with a list of dictionaries of lists, you can start by using a list comprehension to turn it into a list of DataFrames:

import pandas as pd

rec_set1 = {'col1': [1,2,3], 'col2': [5,3,4], 'col3': ['x','y','z']}
rec_set2 = {'col1': [5,6,7], 'col2': [-4,6,2], 'col3': ['p','q','r']}
# ... (etc.)
# rec_setn = {...}
rec_set_all = [rec_set1, rec_set2]  # ..., rec_setn

# Build one DataFrame per dictionary
df_list = [pd.DataFrame(r) for r in rec_set_all]

Next, you can use the pd.concat function to combine them all into one DataFrame:

df_all = pd.concat(df_list)

If you want to reset the index so that it is continuous rather than 0,1,2,0,1,2,etc., you can use this to renumber the rows from 0:

df_all.reset_index(inplace=True, drop=True)

The result from your example would be:

   col1  col2 col3
0     1     5    x
1     2     3    y
2     3     4    z
3     5    -4    p
4     6     6    q
5     7     2    r
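
To see what the index looks like before and after the reset, here is a minimal self-contained sketch of the two steps above. The variable names index_before and index_after are only for illustration, and it uses the non-inplace form of reset_index, which returns a new DataFrame.

```python
import pandas as pd

rec_set1 = {'col1': [1, 2, 3], 'col2': [5, 3, 4], 'col3': ['x', 'y', 'z']}
rec_set2 = {'col1': [5, 6, 7], 'col2': [-4, 6, 2], 'col3': ['p', 'q', 'r']}
rec_set_all = [rec_set1, rec_set2]

# One DataFrame per dictionary, then a single concat
df_all = pd.concat([pd.DataFrame(r) for r in rec_set_all])
index_before = df_all.index.tolist()   # each piece keeps its own 0,1,2 labels

# Renumber the rows from 0; drop=True discards the old labels
df_all = df_all.reset_index(drop=True)
index_after = df_all.index.tolist()

print(index_before)  # [0, 1, 2, 0, 1, 2]
print(index_after)   # [0, 1, 2, 3, 4, 5]
```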

Edit

Incorporating the suggestion from AMC's comment, it can be written as a one-liner:

df = pd.concat([pd.DataFrame(r) for r in rec_set_all], ignore_index=True)
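
If you would rather call the DataFrame constructor only once, one alternative (a different technique from the concat approach, not part of the original answer) is to merge the column lists first with itertools.chain. This is a rough sketch that assumes every dictionary has exactly the same keys as the first one; the names merged and df_merged are made up for the example.

```python
from itertools import chain

import pandas as pd

rec_set1 = {'col1': [1, 2, 3], 'col2': [5, 3, 4], 'col3': ['x', 'y', 'z']}
rec_set2 = {'col1': [5, 6, 7], 'col2': [-4, 6, 2], 'col3': ['p', 'q', 'r']}
rec_set_all = [rec_set1, rec_set2]

# Assumption: all dictionaries share the keys of the first one.
# For each column, chain the per-dictionary lists into one long list.
merged = {
    col: list(chain.from_iterable(r[col] for r in rec_set_all))
    for col in rec_set_all[0]
}

# A single constructor call; the index is already 0..n-1
df_merged = pd.DataFrame(merged)
print(df_merged)
```

A dictionary with a missing key would raise a KeyError here, whereas pd.concat would fill the gap with NaN, so the concat version is the more forgiving of the two.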

5 Comments

pandas.concat() has an ignore_index parameter, so you can probably avoid having to do the .reset_index().
It might be more efficient to use a generator expression instead: df = pd.concat((pd.DataFrame(r) for r in rec_set_all), ignore_index = True)
df = pd.concat([pd.DataFrame(r) for r in rec_set_all], ignore_index=True) works perfectly! Although it looks like there is no simple pd.DataFrame(rec_set_all) type of call that does the iteration internally for you.
@deeSo How did you end up in this situation? Surely there has to be a better way of doing things, no?
@AMC I used a simple example to provide clarity. I am actually working on a larger data set: multiple input files, each containing JSON in the format described in rec_set1.
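
For the multi-file case described in the last comment, the same idea might look roughly like the sketch below. It assumes each file holds one JSON object in the rec_set1 layout (column name mapped to a list of values). The helper name load_record_sets and the file names are hypothetical, and the temporary files exist only to make the example runnable.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_record_sets(paths):
    """Read one column-oriented JSON file per path and stack them into one DataFrame."""
    frames = []
    for path in paths:
        with open(path, encoding='utf-8') as fh:
            # json.load returns a dict of lists, which pd.DataFrame accepts directly
            frames.append(pd.DataFrame(json.load(fh)))
    # Concatenate once at the end and renumber the rows from 0
    return pd.concat(frames, ignore_index=True)


# Demo: write two throwaway files, then load them back
samples = [
    {'col1': [1, 2, 3], 'col2': [5, 3, 4], 'col3': ['x', 'y', 'z']},
    {'col1': [5, 6, 7], 'col2': [-4, 6, 2], 'col3': ['p', 'q', 'r']},
]
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, rec in enumerate(samples):
        p = Path(tmp) / f'rec_set{i + 1}.json'
        p.write_text(json.dumps(rec), encoding='utf-8')
        paths.append(p)
    df_files = load_record_sets(paths)

print(df_files)
```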

This will also work. Just append the new dict as a DataFrame.

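import pandas as pd

rec_set1 = {'col1': [1,2,3], 'col2': [5,3,4], 'col3': ['x','y','z']}
rec_set2 = {'col1': [5,6,7], 'col2': [-4,6,2], 'col3': ['p','q','r']}
rec_set_all = [rec_set1, rec_set2]
df = pd.DataFrame(rec_set1)

# Append rec_set2 as a DataFrame; append returns a new object, so assign the result.
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0;
# on those versions use pd.concat([df, pd.DataFrame(rec_set2)], ignore_index=True).
df = df.append(pd.DataFrame(rec_set2), ignore_index=True)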

1 Comment

It's better to concatenate than to append repeatedly.
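
To illustrate that comment: each call that grows a DataFrame builds a new object and copies the existing rows, so growing inside a loop recopies the data on every pass. A minimal sketch of the two patterns, with df_slow and df_once as made-up names, could look like this:

```python
import pandas as pd

rec_set_all = [
    {'col1': [1, 2, 3], 'col2': [5, 3, 4], 'col3': ['x', 'y', 'z']},
    {'col1': [5, 6, 7], 'col2': [-4, 6, 2], 'col3': ['p', 'q', 'r']},
]

# Pattern to avoid: grow the frame step by step, recopying all rows each time
df_slow = pd.DataFrame(rec_set_all[0])
for rec in rec_set_all[1:]:
    df_slow = pd.concat([df_slow, pd.DataFrame(rec)], ignore_index=True)

# Preferred: collect the pieces in a plain list, then concatenate once
frames = [pd.DataFrame(rec) for rec in rec_set_all]
df_once = pd.concat(frames, ignore_index=True)

# Both give the same result; only the amount of copying differs
print(df_once.equals(df_slow))
```

With only two record sets the difference is negligible; it starts to matter when there are many pieces.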
