Flatten nested JSON into pandas dataframe columns

Question

I have a pandas column that holds nested JSON data as a string. I'd like to flatten that data into multiple pandas columns.

Here's data from a single cell:

rent['ques'][9] = "{'Rent': [{'Name': 'Asking', 'Value': 16.07, 'Unit': 'Usd'}], 'Vacancy': {'Name': 'Vacancy', 'Value': 25.34100001, 'Unit': 'Pct'}}"

For each cell in the pandas column, I'd like to parse this string and create multiple columns. The expected output looks something like this:

When I run json_normalize(rent['ques']), I receive the following error:

 ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-28-cebc86357f34> in <module>()
----> 1 json_normalize(rentoff['Survey'])

/anaconda3/lib/python3.7/site-packages/pandas/io/json/normalize.py in json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep)
    196     if record_path is None:
    197         if any([[isinstance(x, dict)
--> 198                 for x in compat.itervalues(y)] for y in data]):
    199             # naive normalization, this is idempotent for flat records
    200             # and potentially will inflate the data considerably for

/anaconda3/lib/python3.7/site-packages/pandas/io/json/normalize.py in <listcomp>(.0)
    196     if record_path is None:
    197         if any([[isinstance(x, dict)
--> 198                 for x in compat.itervalues(y)] for y in data]):
    199             # naive normalization, this is idempotent for flat records
    200             # and potentially will inflate the data considerably for

/anaconda3/lib/python3.7/site-packages/pandas/compat/__init__.py in itervalues(obj, **kw)
    210 
    211     def itervalues(obj, **kw):
--> 212         return iter(obj.values(**kw))
    213 
    214     next = next

AttributeError: 'str' object has no attribute 'values'

That dict is in string representation. You have to convert it to a dict first, e.g. using json.loads.
– gbruenjes, Commented Jun 8, 2020 at 12:50
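
One caveat on the suggestion above: the sample cell uses single quotes, so it is a Python literal rather than strict JSON, and json.loads would likely reject it unchanged. A minimal sketch, assuming every cell holds a valid Python dict literal, that parses it with ast.literal_eval from the standard library instead (the variable names cell, json_ok and parsed are illustrative only):

```python
import ast
import json

# The sample cell from the question, copied verbatim.
cell = "{'Rent': [{'Name': 'Asking', 'Value': 16.07, 'Unit': 'Usd'}], 'Vacancy': {'Name': 'Vacancy', 'Value': 25.34100001, 'Unit': 'Pct'}}"

# json.loads expects double-quoted keys and strings, so the raw cell fails to parse.
try:
    json.loads(cell)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval evaluates Python literals only (dicts, lists, strings, numbers),
# so it does not execute arbitrary code the way eval would.
parsed = ast.literal_eval(cell)

print(json_ok)
print(type(parsed).__name__, parsed["Vacancy"]["Value"])
```

If the column really does contain double-quoted JSON, json.loads is the better fit and the ast step is unnecessary.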

NYC Coder · Accepted Answer · 2020-06-08 13:04:03Z


Try this:

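import json
import pandas as pd

# Swap single quotes for double quotes so json.loads accepts the string.
# Note: this breaks if any value itself contains an apostrophe.
df['quest'] = df['quest'].str.replace("'", '"')
dfs = []
for i in df['quest']:
    data = json.loads(i)
    # One row per 'Rent' entry, with the 'Vacancy' fields attached as extra columns.
    dfx = pd.json_normalize(data, record_path=['Rent'], meta=[['Vacancy', 'Name'], ['Vacancy', 'Unit'], ['Vacancy', 'Value']])
    dfs.append(dfx)

# drop=True discards the old per-frame index instead of keeping it as a column.
df = pd.concat(dfs).reset_index(drop=True)
print(df)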


     Name  Value Unit Vacancy.Name Vacancy.Unit Vacancy.Value
0  Asking  16.07  Usd      Vacancy          Pct        25.341
1  Asking  16.07  Usd      Vacancy          Pct        25.341
2  Asking  16.07  Usd      Vacancy          Pct        25.341
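
A possible variation, sketched under the assumption that every cell is a valid Python dict literal with the same 'Rent' / 'Vacancy' layout: parse each cell with ast.literal_eval rather than replacing quotes, then hand the whole list of dicts to pd.json_normalize in one call instead of concatenating per-row frames. The frame name df and column name 'quest' follow the code above; the three-row sample frame and the names records and flat are made up for illustration.

```python
import ast

import pandas as pd

# Hypothetical sample frame: the cell from the question repeated three times.
cell = "{'Rent': [{'Name': 'Asking', 'Value': 16.07, 'Unit': 'Usd'}], 'Vacancy': {'Name': 'Vacancy', 'Value': 25.34100001, 'Unit': 'Pct'}}"
df = pd.DataFrame({"quest": [cell] * 3})

# Turn each string into a real dict; no quote replacement needed.
records = df["quest"].apply(ast.literal_eval).tolist()

# json_normalize accepts a list of dicts, so no explicit loop or concat is required.
flat = pd.json_normalize(
    records,
    record_path=["Rent"],
    meta=[["Vacancy", "Name"], ["Vacancy", "Unit"], ["Vacancy", "Value"]],
)
print(flat)
```

This yields one row per 'Rent' entry, so a cell with several rent records would expand into several rows.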
