Flatten nested JSON and concatenate to dataframe using pandas

Question

I have searched for a lot of similar topics online, but I have not found the solution yet.

My pandas dataframe looks like this:

index    FOR
0        [{'id': '2766', 'name': '0803 Computer Softwar...
1        [{'id': '2766', 'name': '0803 Computer Softwar...
2        [{'id': '2766', 'name': '0803 Computer Softwar...
3        [{'id': '2766', 'name': '0803 Computer Softwar...
4        [{'id': '2766', 'name': '0803 Computer Softwar...

I would like to flatten every row into a dataframe like the one below (shown here for the first row only):

index   id      name
0       2766    0803 Computer Software

I found a similar solution here. Unfortunately, I got the following error: TypeError: the JSON object must be str, bytes or bytearray, not 'list'

My code was:

dfs = []
for i in test['FOR']:
    data = json.loads(i)
    dfx = pd.json_normalize(data)
    dfs.append(dfx)   

df = pd.concat(dfs).reset_index(inplace = True)
print(df)

Could anyone help me here? Thank you very much!

Umar.H · Accepted Answer · 2020-08-04 18:06:43Z

2

Try using literal_eval from the ast standard library.

from ast import literal_eval


df_flattened = pd.json_normalize(df['FOR'].map(literal_eval))

Then drop duplicates.

print(df_flattened.drop_duplicates())

     id                    name
0  2766  0803 Computer Software
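
As a minimal, self-contained sketch of the same idea: the sample dataframe below is made up for illustration and assumes each cell of 'FOR' holds the string form of a list of dicts, which is the case literal_eval is meant for. The per-row lists are chained into one flat list of records before normalizing.

```python
from ast import literal_eval
from itertools import chain

import pandas as pd

# Hypothetical sample data: each cell is the *string* form of a list of dicts.
df = pd.DataFrame(
    {"FOR": ["[{'id': '2766', 'name': '0803 Computer Software'}]"] * 5}
)

# Parse each string into a real Python list of dicts.
parsed = df["FOR"].map(literal_eval)

# Chain the per-row lists into one flat list of records, then normalize.
records = list(chain.from_iterable(parsed))
df_flattened = pd.json_normalize(records).drop_duplicates()
print(df_flattened)
```

If the cells are already Python lists rather than strings, literal_eval is not needed and will raise an error.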


7 Comments

Chen Over a year ago

Thanks for the help. Unfortunately, I got the error below: ValueError: malformed node or string: [{'id': '2766', 'name': '0803 Computer Software'}]

Umar.H Over a year ago

@Chen update your sample to mimic your question - I don't see a list of json objects.

Chen Over a year ago

I am so sorry, I copied the table from Colab, so it might have lost something. I am new to Colab. But when I printed type(test['FOR'][0]), it did show a list. I apologize for the confusion.

Umar.H Over a year ago

@Chen what's in each list: just a single JSON object, or many?

Chen Over a year ago

If I understand you correctly, there are two: [{"id": "2766", "name": "0803 Computer Software"}]
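
The ValueError in this thread is what literal_eval raises when it is handed something that is not a string, such as a list. One possible way to cope with both situations is sketched below; the helper name as_records and the sample data are hypothetical, not part of the original posts.

```python
from ast import literal_eval

import pandas as pd


def as_records(cell):
    """Return a list of dicts whether the cell is a string or already a list."""
    if isinstance(cell, str):
        return literal_eval(cell)  # parse the string representation
    return cell  # assumed to already be a list of dicts


# Hypothetical sample: one cell is a real list, the other its string form.
test = pd.DataFrame(
    {
        "FOR": [
            [{"id": "2766", "name": "0803 Computer Software"}],
            "[{'id': '2766', 'name': '0803 Computer Software'}]",
        ]
    }
)

# Collect every dict from every cell into one flat list of records.
records = [rec for cell in test["FOR"] for rec in as_records(cell)]
flat = pd.DataFrame(records).drop_duplicates()
print(flat)
```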


Chen · Accepted Answer · 2020-08-25 19:40:37Z

0

After a few weeks away from this work, I ran into a similar case, and I think I now have a solution for it. Please feel free to correct me or suggest other ideas. I really appreciate all the help and generous support!

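import pandas as pd

chunks = []

for i in range(len(test)):
    # each cell of 'FOR' is already a list of dicts, so normalize it directly
    chunks.append(pd.json_normalize(test.iloc[i, :]['FOR']))

test_df = pd.concat(chunks)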

Then drop the duplicated columns from test_df.
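
For comparison, the loop from the question can be repaired along the same lines. This is a sketch with made-up sample data, assuming every cell of 'FOR' is already a list of dicts. Two things change: json.loads is dropped, because it only accepts str, bytes or bytearray, and the result of reset_index is assigned without inplace=True, because the in-place form returns None.

```python
import pandas as pd

# Hypothetical sample data: each cell is already a Python list of dicts.
test = pd.DataFrame(
    {"FOR": [[{"id": "2766", "name": "0803 Computer Software"}] for _ in range(5)]}
)

dfs = []
for records in test["FOR"]:
    # records is a list of dicts, so it can be normalized without json.loads
    dfs.append(pd.json_normalize(records))

# Concatenate, drop repeated rows, and renumber the index from 0.
df = pd.concat(dfs).drop_duplicates().reset_index(drop=True)
print(df)
```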
