Explode multiple columns in Pandas

Question

I have researched this problem and found that Pandas' explode function does not work on multiple columns. I have seen a few questions about this on Stack Overflow, but none of their solutions seem to work for me.

Dataset:

j = { 
    "_id" : "5c45", 
    "user" : 5, 
    "ids" : [
        "1019", 
        "1021", 
        "1162"
    ], 
    "roles" : ["2d7f"]
}

Current Script:

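import pandas as pd
from pandas import json_normalize

root = json_normalize(j)
x = (root.applymap(type) == list).all()  # True for columns whose cells are lists
y = x.index[x].tolist()                  # names of the list columns (not used below)
# split each cell's string form on ',' and explode every column separately
root = root.apply(lambda x: [str(v).split(',') for v in x]).apply(pd.Series.explode)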

print(root)

I tried the solution from one of those questions, but I get a ValueError:

ValueError: cannot reindex from a duplicate axis
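The error most likely comes from exploding every column independently. Series.explode keeps the original row label for each element it produces, so the "ids" column becomes three rows all labelled 0 while the other columns stay at one row labelled 0. When apply tries to assemble those Series back into one DataFrame, pandas has to align them on an index containing duplicates, which it refuses to do. A minimal sketch of that index behavior, assuming pandas 1.0 or newer (where pd.json_normalize is available at the top level):

```python
import pandas as pd

j = {"_id": "5c45", "user": 5, "ids": ["1019", "1021", "1162"], "roles": ["2d7f"]}

root = pd.json_normalize(j)  # a single row; "ids" and "roles" hold Python lists

# Each exploded Series repeats the original row label (0) once per element,
# so the columns end up with different lengths and duplicate index labels.
ids_exploded = root["ids"].explode()
roles_exploded = root["roles"].explode()

print(ids_exploded.index.tolist())    # [0, 0, 0]
print(roles_exploded.index.tolist())  # [0]
```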

Expected Result:

_id,user,ids,roles
5c45,5,1019,2d7f
5c45,5,1021,2d7f
5c45,5,1162,2d7f

Is there a simple, yet effective workaround to this?

Quang Hoang · Accepted Answer · 2021-02-16 16:19:48Z


Try the record_path and meta options of json_normalize:

import pandas as pd

pd.json_normalize(j, record_path=['ids'], meta=['_id','user','roles'])

Output:

      0   _id user roles
0  1019  5c45    5  2d7f
1  1021  5c45    5  2d7f
2  1162  5c45    5  2d7f
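Because the records under "ids" are plain strings, json_normalize labels them with the column name 0 and appends the meta columns after it. To match the expected result exactly, one option is to rename that column and restore the original key order; this is a sketch, and the variable name df is only for illustration:

```python
import pandas as pd

j = {"_id": "5c45", "user": 5, "ids": ["1019", "1021", "1162"], "roles": ["2d7f"]}

df = pd.json_normalize(j, record_path=["ids"], meta=["_id", "user", "roles"])

# The exploded string records land in a column labelled 0; give it the
# original key name and put the columns back in the input's order.
df = df.rename(columns={0: "ids"})[["_id", "user", "ids", "roles"]]

print(df.to_csv(index=False))
```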

For a somewhat dynamic solution, try flattening the singleton lists:

pd.DataFrame({k:v[0] if isinstance(v, list) and len(v) == 1 else v
              for k,v in j.items()
             })
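The same comprehension can be wrapped in a reusable function. In this sketch, flatten_singletons is a hypothetical helper name, not a pandas API. It relies on pd.DataFrame broadcasting scalar values to the length of the remaining list column:

```python
import pandas as pd


def flatten_singletons(record: dict) -> pd.DataFrame:
    """Hypothetical helper: unwrap one-element lists, keep longer lists.

    pandas broadcasts the resulting scalars to the length of the
    remaining list column(s).
    """
    flat = {
        k: v[0] if isinstance(v, list) and len(v) == 1 else v
        for k, v in record.items()
    }
    return pd.DataFrame(flat)


j = {"_id": "5c45", "user": 5, "ids": ["1019", "1021", "1162"], "roles": ["2d7f"]}
print(flatten_singletons(j))
```

Two limits of this approach: if two list columns of different lengths remain after flattening, pd.DataFrame raises a ValueError about array lengths, and if every value ends up scalar, pandas raises a ValueError asking for an index.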

edited Feb 16, 2021 at 16:19

answered Feb 16, 2021 at 16:14



jcoke Over a year ago

Is there a dynamic version of this that doesn't require specifying the column names explicitly, since they may differ between JSON objects?
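One way to avoid naming columns is to detect the list-valued columns at runtime and explode them one at a time. This is a sketch, not part of the accepted answer; explode_all_lists is a hypothetical helper name. Exploding sequentially produces the Cartesian product of the list elements when several lists have more than one entry:

```python
import pandas as pd


def explode_all_lists(record: dict) -> pd.DataFrame:
    """Hypothetical helper: explode every list-valued column of a record.

    List columns are found at runtime, so no names are hard-coded.
    Each one is exploded in turn (Cartesian product across lists).
    """
    df = pd.json_normalize(record)
    list_cols = [
        c for c in df.columns
        if df[c].map(lambda v: isinstance(v, list)).any()
    ]
    for col in list_cols:
        df = df.explode(col)
    return df.reset_index(drop=True)


j = {"_id": "5c45", "user": 5, "ids": ["1019", "1021", "1162"], "roles": ["2d7f"]}
print(explode_all_lists(j))
```

pandas 1.3 and later also accept a list in DataFrame.explode, e.g. root.explode(['ids', 'roles']), but that form requires the lists in each row to have matching lengths. Here "ids" has three elements and "roles" has one, so it would raise a ValueError; exploding one column at a time avoids that restriction.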
