
I have the following DataFrame that I get "as-is" from an API:

df = pd.DataFrame({'keys': {0: "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]",
                            1: "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '585'}, {'strip': '10/1/2022'}]",
                            2: "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '580'}, {'strip': '10/1/2022'}]",
                            3: "[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '545'}, {'strip': '10/1/2022'}]",
                            4: "[{'contract': 'G'}, {'contract_type': 'P'}, {'strike': '555'}, {'strip': '10/1/2022'}]"},
                   'value': {0: 353.3, 1: 25.8, 2: 336.65, 3: 366.05, 4: 20.8}})

>>> df
                                                keys   value
0  [{'contract': 'G'}, {'contract_type': 'C'}, {'...  353.30
1  [{'contract': 'G'}, {'contract_type': 'P'}, {'...   25.80
2  [{'contract': 'G'}, {'contract_type': 'C'}, {'...  336.65
3  [{'contract': 'G'}, {'contract_type': 'C'}, {'...  366.05
4  [{'contract': 'G'}, {'contract_type': 'P'}, {'...   20.80

Each row of the "keys" column is a string (not JSON, as the values are enclosed in single quotes instead of double quotes). For example:

>>> df.at[0, "keys"]
"[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]"

I would like to convert the "keys" column to a DataFrame and append it to df as new columns.

I am currently doing:

  1. Replacing single quotes with double quotes and passing to json.loads to read into a list of dictionaries with the below structure:
[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]
  2. Combining the dictionaries into a single dictionary with a dictionary comprehension:
{'contract': 'G', 'contract_type': 'C', 'strike': '560', 'strip': '10/1/2022'}
  3. apply-ing this to every row and calling the pd.DataFrame constructor on the result.
  4. join-ing back to the original df.

In a single line, my code is:

>>> df.drop("keys", axis=1).join(pd.DataFrame(df["keys"].apply(lambda x: {k: v for d in json.loads(x.replace("'","\"")) for k, v in d.items()}).tolist()))

    value contract contract_type strike      strip
0  353.30        G             C    560  10/1/2022
1   25.80        G             P    585  10/1/2022
2  336.65        G             C    580  10/1/2022
3  366.05        G             C    545  10/1/2022
4   20.80        G             P    555  10/1/2022
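For readability, the one-liner above can also be broken into named steps. A sketch of the same pipeline, using one sample row (the helper name parse_keys is my own, not from the original code):

```python
import json
import pandas as pd

df = pd.DataFrame({
    "keys": ["[{'contract': 'G'}, {'contract_type': 'C'}, {'strike': '560'}, {'strip': '10/1/2022'}]"],
    "value": [353.3],
})

def parse_keys(s):
    # 1. Swap quotes so the string is valid JSON, then parse into a list of one-key dicts
    dicts = json.loads(s.replace("'", '"'))
    # 2. Merge the one-key dicts into a single dict
    return {k: v for d in dicts for k, v in d.items()}

# 3. Build a DataFrame from the parsed dicts and 4. join it back to the original
out = df.drop(columns="keys").join(pd.DataFrame(df["keys"].apply(parse_keys).tolist()))
```

This produces the same result as the one-liner, just with the quote-swap/parse/merge logic in one named function.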

I was wondering if there is a better way to do this.


2 Answers


You can use ast.literal_eval to parse each string and the ChainMap collection to merge the resulting list of dictionaries into a single dict.

import ast
from collections import ChainMap

df['keys'] = df['keys'].apply(ast.literal_eval).apply(lambda x: dict(ChainMap(*x)))

print(df)
                                               keys   value
0  {'strip': '10/1/2022', 'strike': '560', 'contr...  353.30
1  {'strip': '10/1/2022', 'strike': '585', 'contr...   25.80
2  {'strip': '10/1/2022', 'strike': '580', 'contr...  336.65
3  {'strip': '10/1/2022', 'strike': '545', 'contr...  366.05
4  {'strip': '10/1/2022', 'strike': '555', 'contr...   20.80

Then use .apply(pd.Series) to expand the column of dictionaries into separate columns, and use pd.concat to combine the result with the rest of the DataFrame:

df_ = pd.concat([df['keys'].apply(pd.Series), df['value']], axis=1)

print(df_)
       strip strike contract_type contract   value
0  10/1/2022    560             C        G  353.30
1  10/1/2022    585             P        G   25.80
2  10/1/2022    580             C        G  336.65
3  10/1/2022    545             C        G  366.05
4  10/1/2022    555             P        G   20.80
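As one of the comments below notes, the three apply calls can be collapsed into a single pass, which is noticeably faster. A sketch of that variant (using ast.literal_eval rather than eval, with a small sample frame of my own for illustration):

```python
import ast
from collections import ChainMap
import pandas as pd

df = pd.DataFrame({
    "keys": ["[{'contract': 'G'}, {'strike': '560'}]"],
    "value": [353.3],
})

# Parse and merge each row's dicts in a single apply, then expand
# the resulting list of dicts via the pd.DataFrame constructor
keys_df = pd.DataFrame(
    df["keys"].apply(lambda x: dict(ChainMap(*ast.literal_eval(x)))).tolist()
)
df_ = pd.concat([keys_df, df["value"]], axis=1)
```

Note that dict(ChainMap(...)) emits keys from the last dict first, so the column order comes out reversed relative to the input, as in the answer's output above.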

3 Comments

Thank you! This is already slightly faster than what I have on a much larger DataFrame than the example
@not_speshal The problem here is the use of three applys. Doing the first two in a single apply and then replacing the last one with the pd.DataFrame constructor is faster, e.g. pd.DataFrame(df['keys'].apply(lambda x: dict(ChainMap(*eval(x)))).tolist()). I get about 3x the speed of the original idea in this answer; then of course you still need the concat.
DO NOT use eval. It's very unsafe. Use ast.literal_eval instead.

You could use ast.literal_eval (built-in) to convert the dict strings to actual dicts, and then use pd.json_normalize with record_path=[[]] to get the objects into a table format:

import ast
new_df = pd.json_normalize(df['keys'].apply(ast.literal_eval), record_path=[[]]).apply(lambda col: col.dropna().tolist())

Output:

>>> new_df
  contract contract_type strike      strip
0        G             C    560  10/1/2022
1        G             P    585  10/1/2022
2        G             C    580  10/1/2022
3        G             C    545  10/1/2022
4        G             P    555  10/1/2022

An alternate solution would be to use string replacement to merge the separate dicts into one:

import ast
new_df = pd.DataFrame(df['keys'].str.replace("'}, {'", "', '", regex=False).apply(ast.literal_eval).str[0].tolist())

Output: same as above.
Yet another option, this one using functools.reduce (also built-in); note that the dict-union operator | requires Python 3.9+:

import ast
import functools
new_df = pd.DataFrame(df['keys'].apply(ast.literal_eval).apply(lambda row: functools.reduce(lambda x, y: x | y, row)).tolist())
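On Python versions before 3.9, where dicts lack the | union operator, the same reduce can merge with dict unpacking instead. A minimal sketch on one sample row of my own:

```python
import ast
import functools
import pandas as pd

df = pd.DataFrame({"keys": ["[{'contract': 'G'}, {'strike': '560'}]"]})

# Merge the parsed one-key dicts with {**x, **y}, which works on any Python 3
new_df = pd.DataFrame(
    df["keys"].apply(ast.literal_eval)
              .apply(lambda row: functools.reduce(lambda x, y: {**x, **y}, row))
              .tolist()
)
```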

5 Comments

Definitely works for my example data but the dropna would be dangerous if one of the keys was missing for any row.
@not_speshal check the answer now. I added more solutions ;)
I like the functools solution. But again, this is just marginally faster, probably because of the double apply. Still a +1 from me. Thank you! :)
So @not_speshal are you looking for a faster solution or a cleaner one? If both, which do you prioritize?
Priority on performance every day! But the solutions in this post are good enough.
