I have a DataFrame with columns that look like this:

import pandas as pd

df = pd.DataFrame()
df['symbol'] = ['A','B','C']
df['json_list'] = ['[{name:S&P500, perc:25, ticker:SPY, weight:1}]',
          '[{name:S&P500, perc:25, ticker:SPY, weight:0.5}, {name:NASDAQ, perc:26, ticker:NASDAQ, weight:0.5}]',
          '[{name:S&P500, perc:25, ticker:SPY, weight:1}]']
df['date'] = ['2022-01-01', '2022-01-02', '2022-01-02']
df:
    symbol  json_list                                         date
0   A       [{name:S&P500, perc:25, ticker:SPY, weight:1}]    2022-01-01
1   B       [{name:S&P500, perc:25, ticker:SPY, weight:0.5... 2022-01-02
2   C       [{name:S&P500, perc:25, ticker:SPY, weight:1}]    2022-01-02

The values in the json_list column are of <class 'str'>.

How can I convert the json_list column items to dicts so I can access them based on key:value pairs?

Thank you in advance.

  • This is very similar to stackoverflow.com/questions/20680272/… e.g., apply json.loads. Commented May 20, 2022 at 0:06
  • Except those aren't JSON strings. You could do some string parsing and then use ast.literal_eval, but it'd be quite ugly. Commented May 20, 2022 at 0:07
  • @BrokenBenchmark, you're right, these aren't JSON. I got this error when I applied json.loads: JSONDecodeError: Expecting property name enclosed in double quotes. Commented May 20, 2022 at 0:12
  • For this to be JSON, name would have to be "name". Commented May 20, 2022 at 0:38
  • @BENY Please note these are string items that I want to convert to dicts. I don't know if that is possible. Commented May 20, 2022 at 0:41
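
The string parsing suggested in the comments could be sketched roughly as follows. This is only an illustration, not something posted in the thread: the helper name parse_pseudo_json is hypothetical, it uses json.loads rather than ast.literal_eval, and it assumes keys are plain words and values never contain a comma, colon-delimited pair, or brace.

```python
import json
import re

# Matches a JSON-style number such as 25, -3 or 0.5.
NUMBER = re.compile(r'-?(?:0|[1-9]\d*)(?:\.\d+)?')

def parse_pseudo_json(text):
    """Quote bare keys and non-numeric values, then parse the result as JSON.

    Hypothetical helper: assumes values contain no ',', '{' or '}'.
    """
    def quote(match):
        key = match.group(1)
        value = match.group(2).strip()
        # Leave numbers bare so they come back as int/float; quote everything else.
        if not NUMBER.fullmatch(value):
            value = json.dumps(value)
        return f'{json.dumps(key)}: {value}'

    quoted = re.sub(r'(\w+)\s*:\s*([^,{}]+)', quote, text)
    return json.loads(quoted)

sample = ('[{name:S&P500, perc:25, ticker:SPY, weight:0.5}, '
          '{name:NASDAQ, perc:26, ticker:NASDAQ, weight:0.5}]')
parsed = parse_pseudo_json(sample)
```

Applied to the question's data, this would be used as df['json_list'].apply(parse_pseudo_json).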

2 Answers

UPDATED to reflect the fact that the json strings in the question are not singleton lists, but can contain multiple dict-like elements.

This will put a list of dict objects in a new column of your dataframe:

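def foo(row):
    src = row['json_list']
    # Drop the outer brackets, then split into one chunk per '{...}' element.
    raw_list = src[1:-1].split('{')[1:]
    raw_dicts = [chunk.split('}')[0] for chunk in raw_list]
    # Each chunk looks like 'name:S&P500, perc:25, ...'; split each pair on its first colon only.
    dict_list = [dict(pair.strip().split(':', 1) for pair in chunk.split(','))
                 for chunk in raw_dicts]
    # Convert each value to int if possible, else float, else leave it as a string.
    for dct in dict_list:
        for k in dct:
            try:
                dct[k] = int(dct[k])
            except ValueError:
                try:
                    dct[k] = float(dct[k])
                except ValueError:
                    pass
    return dict_list

df['list_of_dict_object'] = df.apply(foo, axis=1)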
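
Once the column holds real lists of dicts, the values can be reached by key, or flattened into ordinary columns. The sketch below is an illustration rather than part of the original answer: it builds a small stand-in DataFrame by hand (so it runs on its own) and assumes a pandas version that supports explode(..., ignore_index=True) and pd.json_normalize.

```python
import pandas as pd

# Hand-built stand-in for the parsed column (assumed shape: a list of dicts per row).
df = pd.DataFrame({
    'symbol': ['A', 'B'],
    'list_of_dict_object': [
        [{'name': 'S&P500', 'perc': 25, 'ticker': 'SPY', 'weight': 1}],
        [{'name': 'S&P500', 'perc': 25, 'ticker': 'SPY', 'weight': 0.5},
         {'name': 'NASDAQ', 'perc': 26, 'ticker': 'NASDAQ', 'weight': 0.5}],
    ],
})

# Direct access: the first dict in row 1, looked up by key.
first_ticker = df.loc[1, 'list_of_dict_object'][0]['ticker']

# One row per dict, then one column per key.
exploded = df.explode('list_of_dict_object', ignore_index=True)
flat = pd.concat(
    [exploded[['symbol']],
     pd.json_normalize(exploded['list_of_dict_object'].tolist())],
    axis=1,
)
```

The flattened frame has one row per (symbol, dict) pair, which is usually easier to filter or group than a column of nested lists.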

Original answer:

This will put a dict in a new column of your dataframe. It assumes each string holds exactly one dict-like element, and it leaves every value as a string, so it gives you something close to what you want except for numeric typing:

df['dict_object'] = df.apply(lambda x: dict(x.strip().split(':') for x in x['json_list'][2:-2].split(',')), axis = 1)

To get float or int where string values are convertible, you can do this:

def foo(x):
    d = dict(x.strip().split(':') for x in x['json_list'][2:-2].split(','))
    for k in d:
        try:
            d[k] = int(d[k])
        except ValueError:
            try:
                d[k] = float(d[k])
            except ValueError:
                pass
    return d
df['dict_object'] = df.apply(foo, axis = 1)

The "json" is almost valid YAML. If you add a space after the colons, you can parse it using PyYAML:

import yaml

df.json_list.apply(lambda data: yaml.safe_load(data.replace(':', ': ')))
