I have a Pandas dataframe where one column contains a non-nested json object in each row.
js
0 {"k1":"1","k2":"A","k3":"X"}
1 {"k1":"2","k2":"B","k3":"X"}
2 {"k1":"3","k2":"A","k3":"Y"}
3 {"k1":"4","k2":"D","k4":"M"}
Created like this:
import pandas as pd
L0 = ['{"k1":"1","k2":"A","k3":"X"}',
'{"k1":"2","k2":"B","k3":"X"}',
'{"k1":"3","k2":"A","k3":"Y"}',
'{"k1":"4","k2":"D","k4":"M"}']
df = pd.DataFrame({'js':L0})
I want to make the json-objects into their own dataframe:
k1 k2 k3 k4
0 1 A X NaN
1 2 B X NaN
2 3 A Y NaN
3 4 D NaN M
Right now the only way I know is by using the json module and df.iterrows():
import json
all_json = []
for _,row in df.iterrows():
all_json.append(json.loads(row["js"]))
df2 = pd.DataFrame.from_dict(all_json)
Is there a better way to do this, ideally without iterating?
EDIT 1:
Thanks for the answers.
I have timed the three suggested approaches using ast.literal_eval on my real world data where my own approach takes 158 ms ± 4.01 ms:
df = df.apply(lambda x: ast.literal_eval(x[0]), 1).apply(pd.Series) takes 640 ms ± 7.8 ms
df['js'].apply(ast.literal_eval).apply(pd.Series) takes 636 ms ± 19 ms
pd.DataFrame(df.js.apply(ast.literal_eval).tolist()) takes 180 ms ± 5.11
As suggested the third approach is the fastest, but sadly they are all slower than the iterrows-approach while my intention was to get rid of iterrows to make it faster.
EDIT 2:
pd.DataFrame(df["js"].apply(json.loads).tolist()) takes 25.2 ms ± 512 µs so we have a winner I guess.