I have a pandas column whose cells hold nested JSON-like data as strings. I'd like to flatten that data into multiple pandas columns.

Here's data from a single cell:

rent['ques'][9] = "{'Rent': [{'Name': 'Asking', 'Value': 16.07, 'Unit': 'Usd'}], 'Vacancy': {'Name': 'Vacancy', 'Value': 25.34100001, 'Unit': 'Pct'}}"

For each cell in the column, I'd like to parse this string and create multiple columns. The expected output looks something like this:

[Image: rent_vacancy]

When I run json_normalize(rent['ques']), I get the following error:

 ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-28-cebc86357f34> in <module>()
----> 1 json_normalize(rentoff['Survey'])

/anaconda3/lib/python3.7/site-packages/pandas/io/json/normalize.py in json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep)
    196     if record_path is None:
    197         if any([[isinstance(x, dict)
--> 198                 for x in compat.itervalues(y)] for y in data]):
    199             # naive normalization, this is idempotent for flat records
    200             # and potentially will inflate the data considerably for

/anaconda3/lib/python3.7/site-packages/pandas/io/json/normalize.py in <listcomp>(.0)
    196     if record_path is None:
    197         if any([[isinstance(x, dict)
--> 198                 for x in compat.itervalues(y)] for y in data]):
    199             # naive normalization, this is idempotent for flat records
    200             # and potentially will inflate the data considerably for

/anaconda3/lib/python3.7/site-packages/pandas/compat/__init__.py in itervalues(obj, **kw)
    210 
    211     def itervalues(obj, **kw):
--> 212         return iter(obj.values(**kw))
    213 
    214     next = next

AttributeError: 'str' object has no attribute 'values'
  • That dict is stored as its string representation. You have to convert it to a dict first, e.g. with json.loads. Commented Jun 8, 2020 at 12:50
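
One caveat on the comment above: the cell shown in the question uses single quotes, so it is a Python literal rather than valid JSON, and json.loads rejects it as-is. A minimal sketch of an alternative, assuming the cells are trusted Python-literal strings, is to parse them with ast.literal_eval from the standard library (the names `cell`, `json_ok`, and `parsed` are made up for illustration):

```python
import ast
import json

# The cell from the question, copied verbatim.
cell = (
    "{'Rent': [{'Name': 'Asking', 'Value': 16.07, 'Unit': 'Usd'}], "
    "'Vacancy': {'Name': 'Vacancy', 'Value': 25.34100001, 'Unit': 'Pct'}}"
)

# json.loads expects double-quoted keys and strings, so this input fails.
try:
    json.loads(cell)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval safely evaluates Python literals (dicts, lists, numbers, strings).
parsed = ast.literal_eval(cell)
```

Here `parsed` is an ordinary dict, which is the kind of object json_normalize expects instead of a str.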

1 Answer

Try this:

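import json
import pandas as pd

# The cells use single quotes, which is not valid JSON, so swap them for double quotes first.
# Note: this breaks if any value itself contains a quote character.
df['quest'] = df['quest'].str.replace("'", '"')
dfs = []
for i in df['quest']:
    data = json.loads(i)
    # One row per entry under 'Rent'; the 'Vacancy' fields are carried along as meta columns.
    dfx = pd.json_normalize(data, record_path=['Rent'], meta=[['Vacancy', 'Name'], ['Vacancy', 'Unit'], ['Vacancy', 'Value']])
    dfs.append(dfx)

df = pd.concat(dfs).reset_index(drop=True)
print(df)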


     Name  Value Unit Vacancy.Name Vacancy.Unit Vacancy.Value
0  Asking  16.07  Usd      Vacancy          Pct        25.341
1  Asking  16.07  Usd      Vacancy          Pct        25.341
2  Asking  16.07  Usd      Vacancy          Pct        25.341
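
As a possible variation on the loop above, the whole column can be parsed and flattened in one pass. This is only a sketch under stated assumptions: it assumes pandas 1.0 or later (where pd.json_normalize is available), that every cell has the same shape as the one in the question, and that the strings are trusted. The two-row frame `rent` is a made-up stand-in for the real data, and the `Rent.` prefix is an arbitrary choice to keep the two groups of columns apart.

```python
import ast
import pandas as pd

# Hypothetical stand-in for the question's frame: two identical cells.
cell = (
    "{'Rent': [{'Name': 'Asking', 'Value': 16.07, 'Unit': 'Usd'}], "
    "'Vacancy': {'Name': 'Vacancy', 'Value': 25.34100001, 'Unit': 'Pct'}}"
)
rent = pd.DataFrame({'ques': [cell, cell]})

# Step 1: turn each string into a dict (single quotes are fine for literal_eval).
parsed = rent['ques'].apply(ast.literal_eval)

# Step 2: flatten the list of dicts; one output row per entry under 'Rent',
# with the nested 'Vacancy' fields repeated alongside as meta columns.
flat = pd.json_normalize(
    parsed.tolist(),
    record_path='Rent',
    meta=[['Vacancy', 'Name'], ['Vacancy', 'Value'], ['Vacancy', 'Unit']],
    record_prefix='Rent.',
)
```

If a cell held several entries under 'Rent', each would become its own row, so `flat` can have more rows than `rent`.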