0

I have researched this problem and found out that Pandas' explode function does not work on multiple columns, however, I have seen a few questions submitted on StackOverflow however, none of them seem to work for me.

Dataset:

j = { 
    "_id" : "5c45", 
    "user" : 5, 
    "ids" : [
        "1019", 
        "1021", 
        "1162"
    ], 
    "roles" : ["2d7f"]
}

Current Script:

root = json_normalize(j)
x = (root.applymap(type) == list).all()
y = x.index[x].tolist()
root = root.apply(lambda x: [str(v).split(',') for v in x]).apply(pd.Series.explode)

print(root)

I tried this solution here, but I get a value error:

ValueError: cannot reindex from a duplicate axis

Expected Result:

_id,user,ids,roles
5c45,5,1019,2d7f
5c45,5,1021,2d7f
5c45,5,1162,2d7f

Is there a simple, yet effective workaround to this?

1 Answer 1

2

Try record_path and meta options:

pd.json_normalize(j, record_path=['ids'], meta=['_id','user','roles'])

Output:

      0   _id user roles
0  1019  5c45    5  2d7f
1  1021  5c45    5  2d7f
2  1162  5c45    5  2d7f

For a somewhat dynamic solution, try flatten the singletons:

pd.DataFrame({k:v[0] if isinstance(v, list) and len(v) == 1 else v
              for k,v in j.items()
             })
Sign up to request clarification or add additional context in comments.

1 Comment

is there a dynamic version of this, without explicitly specifying the column names, as it may differ from each json object?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.