I have the following DataFrame that I get "as-is" from an API:
df = pd.DataFrame({'keys': {0: "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]",
1: "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '585'}, {'strip': '10/1/2022'}]",
2: "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '580'}, {'strip': '10/1/2022'}]",
3: "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '545'}, {'strip': '10/1/2022'}]",
4: "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '555'}, {'strip': '10/1/2022'}]"},
'value': {0: 353.3, 1: 25.8, 2: 336.65, 3: 366.05, 4: 20.8}})
>>> df
keys value
0 [{'contract': 'G'}, {'contract_type': 'C'}, {'... 353.30
1 [{'contract': 'G'}, {'contract_type': 'P'}, {'... 25.80
2 [{'contract': 'G'}, {'contract_type': 'C'}, {'... 336.65
3 [{'contract': 'G'}, {'contract_type': 'C'}, {'... 366.05
4 [{'contract': 'G'}, {'contract_type': 'P'}, {'... 20.80
Each row of the "keys" column is a string (not JSON, as the values are enclosed in single quotes instead of double quotes). For example:
>>> df.at[0, "keys"]
"[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]"
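To make the quoting issue concrete, here is a minimal sketch using one cell copied from the example above. It relies on the assumption that no key or value contains an apostrophe of its own; otherwise the blanket quote replacement would corrupt the string.

```python
import json

# One raw cell from the "keys" column, copied from the example above.
raw = "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]"

# json.loads rejects the string as-is because JSON requires double quotes.
try:
    json.loads(raw)
    parsed_directly = True
except json.JSONDecodeError:
    parsed_directly = False

# After swapping the quote characters, the same string is valid JSON.
# Assumption: no key or value contains an apostrophe of its own.
records = json.loads(raw.replace("'", '"'))

print(parsed_directly)
print(records)
```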
I would like to convert the "keys" column to a DataFrame and append it to df as new columns.
I am currently doing:
- Replacing single quotes with double quotes and passing the result to json.loads to read it into a list of dictionaries with the structure below:
[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]
- Combining the dictionaries into a single dictionary with a dictionary comprehension:
{'contract': 'G', 'contract_type': 'C', 'strike': '560', 'strip': '10/1/2022'}
- apply-ing this to every row and calling the pd.DataFrame constructor on the result.
- join-ing the result back to the original df.
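Spelled out step by step, the approach above might look like the sketch below. It uses a two-row subset of the data, and parse_keys is a hypothetical helper name introduced only for illustration; the logic is the same as the one-liner that follows.

```python
import json

import pandas as pd

# Two-row subset of the example data, for brevity.
df = pd.DataFrame({
    "keys": [
        "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]",
        "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '585'}, {'strip': '10/1/2022'}]",
    ],
    "value": [353.3, 25.8],
})


def parse_keys(cell):
    """Hypothetical helper: turn one "keys" string into a single flat dict."""
    # Steps 1 and 2: swap the quotes, then parse into a list of one-item dicts.
    records = json.loads(cell.replace("'", '"'))
    # Step 3: merge the one-item dicts into one dict.
    return {k: v for d in records for k, v in d.items()}


# Step 4: apply to every row and build a DataFrame that shares df's index.
parsed = pd.DataFrame(df["keys"].apply(parse_keys).tolist(), index=df.index)

# Step 5: drop the raw column and join the parsed columns back on.
result = df.drop(columns="keys").join(parsed)
print(result)
```

Passing index=df.index keeps the join aligned even if df does not have a default RangeIndex, which the one-liner implicitly assumes.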
In a single line, my code is:
>>> df.drop("keys", axis=1).join(pd.DataFrame(df["keys"].apply(lambda x: {k: v for d in json.loads(x.replace("'","\"")) for k, v in d.items()}).tolist()))
value contract contract_type strike strip
0 353.30 G C 560 10/1/2022
1 25.80 G P 585 10/1/2022
2 336.65 G C 580 10/1/2022
3 366.05 G C 545 10/1/2022
4 20.80 G P 555 10/1/2022
I was wondering if there is a better way to do this.