
I'm pretty new to Python (just migrating from R) and would like to convert a list to a pandas DataFrame. After researching the topic I found a lot of answers, but none of them led to the desired result.

The data originates from an API and has the following structure:

[
    {
        "id": "ID_ONE",
        "name": "NAME_ONE",
        "source": {
            "id": "AB",
            "value": "source AB"
        },
        "topics": [
            {
                "id": "11",
                "value": "topic 11 "
            },
            {
                "id": "12",
                "value": "topic 12 "
            }
        ]
    },
    {
        "id": "ID_TWO",
        "name": "NAME_TWO",
        "source": {
            "id": "BC",
            "value": "source BC"
        },
        "topics": [
            {
                "id": "12",
                "value": "topic 12 "
            }
        ]
    }
]

After using requests and json_normalize, I end up with a nice DataFrame, but the 'topics' column (each entry being a list of dictionaries) remains a Series of lists.

Do you have any suggestions on how to handle this list?

I would also appreciate any comments or advice on whether other data structures are more useful for handling such output in Python (coming from R, I just feel comfortable using DataFrames and lists).

  • this might help medium.com/@amirziai/… Commented Apr 5, 2017 at 19:50
  • @EzerK, Perfect, thanks! That's exactly what I was looking for! Commented Apr 6, 2017 at 13:45

1 Answer

I'll assume you got this far:

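import pandas as pd
# json_normalize has been a top-level pandas function since pandas 1.0;
# the older pandas.io.json import path is deprecated.
from pandas import json_normalize
df = json_normalize(CopyPastedFromQuestion)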

You can flatten df.topics again in a loop. However, you need to decide what the result should look like. One possible solution:

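topic_frames = []
for _, row in df.iterrows():
    try:
        # Flatten this row's list of topic dicts into its own small frame
        topics = pd.json_normalize(row['topics'])
    except (TypeError, NotImplementedError):
        # 'topics' is missing (e.g. NaN) rather than a list; skip the row
        continue
    topics['parent_id'] = row['id']
    topic_frames.append(topics)
# DataFrame.append was removed in pandas 2.0; collect the pieces and concat once
all_topics = pd.concat(topic_frames, ignore_index=True)
final = pd.merge(df, all_topics, left_on='id', right_on='parent_id', how='left')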
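
As an alternative to the loop, json_normalize can flatten the nested list directly through its record_path and meta parameters. The following is only a sketch under a few assumptions: the data is the sample from the question, the "topic_" prefix is an arbitrary choice of mine, and a reasonably recent pandas (1.0 or later) is installed.

```python
import pandas as pd

# Sample data copied from the question
data = [
    {
        "id": "ID_ONE",
        "name": "NAME_ONE",
        "source": {"id": "AB", "value": "source AB"},
        "topics": [
            {"id": "11", "value": "topic 11 "},
            {"id": "12", "value": "topic 12 "},
        ],
    },
    {
        "id": "ID_TWO",
        "name": "NAME_TWO",
        "source": {"id": "BC", "value": "source BC"},
        "topics": [{"id": "12", "value": "topic 12 "}],
    },
]

# One output row per topic; the parent fields listed in `meta` are repeated
# on every row. The prefix (an arbitrary name) keeps the topic's "id" from
# clashing with the parent's "id".
topics = pd.json_normalize(
    data,
    record_path="topics",
    meta=["id", "name", ["source", "id"], ["source", "value"]],
    record_prefix="topic_",
)
print(topics)
```

Note that with this approach a record whose topics list is empty contributes no rows at all, so it behaves like an inner join rather than the left join used above.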

2 Comments

Thanks for the example! It works well as long as the list 'topics' is not empty. Do you have an idea of how to catch this error?
I've edited the example; it's a guess, as I don't have a data sample. I added a try/except block and changed the join to "left".
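
For the empty-list case raised in the comment above, one option that avoids both the loop and the exception handling is DataFrame.explode, which turns an empty list into a single NaN row instead of dropping it. This is a rough sketch, not tested against the real API output: the ID_THREE record and the "topic_" prefix are made up for illustration, and the ignore_index argument assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical sample: the second record has an empty 'topics' list
data = [
    {
        "id": "ID_ONE",
        "name": "NAME_ONE",
        "topics": [
            {"id": "11", "value": "topic 11 "},
            {"id": "12", "value": "topic 12 "},
        ],
    },
    {"id": "ID_THREE", "name": "NAME_THREE", "topics": []},
]

df = pd.json_normalize(data)

# One row per topic dict; an empty list becomes one row holding NaN
exploded = df.explode("topics", ignore_index=True)

# Expand each topic dict into columns, substituting {} for the NaN entries
topic_cols = pd.json_normalize(
    [t if isinstance(t, dict) else {} for t in exploded["topics"]]
).add_prefix("topic_")

# Both frames share the same 0..n-1 index, so they line up side by side
final = pd.concat([exploded.drop(columns="topics"), topic_cols], axis=1)
print(final)
```

Records without topics are kept with NaN in the topic columns, which matches what the left join is meant to achieve.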
