I have nested data in JSON. I have no problem taking data that isn't nested and converting it into a pandas data frame.

What I am having issues with is when the data has multiple levels and I need to write an independent record for each of the JSON entries.

[
  {
    'type': 'text1',
    'key': ['key1'],
  },
  {
    'type': 'text2',
    'key': ['key1', 'key2'],
  },
  {
    'type': 'text3',
    'key': 'key',
  }
]

I used the following code to write this into a data frame.

 df = pd.DataFrame.from_dict(json)

Unfortunately, I have to include a record for each of the entries. So if key has 2 elements in the array, 2 records need to be created, along with an additional column (key index) that numbers the elements. What I am trying to get is one row per key element, with the columns type, key, and key index.

Any help would be greatly appreciated on this as I have been stuck on this for a while!

1 Answer

Use explode:

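import pandas as pd

json = [{'type': 'text1', 'key': ['key1']},
        {'type': 'text2', 'key': ['key1', 'key2']},
        {'type': 'text3', 'key': 'key'}]

# explode gives one row per element of 'key'; rows from the same record keep
# the same index label, so cumcount per label numbers the elements from 0.
df = pd.DataFrame(json).explode('key') \
       .assign(key_index=lambda x: x.groupby(level=0).cumcount())
print(df)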

# Output
    type   key  key_index
0  text1  key1          0
1  text2  key1          0
1  text2  key2          1
2  text3   key          0
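
The groupby(level=0) step works because explode repeats the original row label for every element it unpacks, so rows that came from the same record share an index value. A minimal sketch of that behavior, reusing the sample records from above (the names records, exploded, and flat are only illustrative):

```python
import pandas as pd

records = [{'type': 'text1', 'key': ['key1']},
           {'type': 'text2', 'key': ['key1', 'key2']},
           {'type': 'text3', 'key': 'key'}]

exploded = pd.DataFrame(records).explode('key')

# Rows unpacked from the same record keep that record's index label.
print(exploded.index.tolist())

# cumcount numbers the rows inside each index group, starting at 0.
exploded = exploded.assign(key_index=exploded.groupby(level=0).cumcount())

# Optionally replace the repeated labels with a fresh 0..n-1 index.
flat = exploded.reset_index(drop=True)
print(flat)
```

Resetting the index is optional; it is only useful if later steps expect unique row labels.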

Update

How would you start the key index at 1 instead of 0 in the case of multiple entries? For text1, the generated index is 0. For text2, the generated index starts at 1, so there is 1 and 2.

# Rows from multi-element records share an index label; add 1 to their counter.
df = pd.DataFrame(json).explode('key') \
       .assign(key_index=lambda x: x.groupby(level=0).cumcount()
               + x.index.duplicated(keep=False))
print(df)

# Output
    type   key  key_index
0  text1  key1          0
1  text2  key1          1
1  text2  key2          2
2  text3   key          0
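
As an alternative that does not rely on explode (which was added in pandas 0.25), the same rows can be built in plain Python before constructing the frame. This is only a sketch: flatten is a hypothetical helper name, and it assumes every record has a 'type' and a 'key' field, where 'key' is either a list or a single value.

```python
import pandas as pd

records = [{'type': 'text1', 'key': ['key1']},
           {'type': 'text2', 'key': ['key1', 'key2']},
           {'type': 'text3', 'key': 'key'}]


def flatten(records):
    """Hypothetical helper: emit one row per element of each record's 'key'."""
    rows = []
    for rec in records:
        keys = rec['key']
        # Treat a bare value such as 'key' as a one-element list.
        if not isinstance(keys, (list, tuple)):
            keys = [keys]
        # Single-element records get index 0; longer ones count from 1.
        start = 1 if len(keys) > 1 else 0
        for i, k in enumerate(keys, start=start):
            rows.append({'type': rec['type'], 'key': k, 'key_index': i})
    return rows


df = pd.DataFrame(flatten(records))
print(df)
```

Unlike explode, this sketch produces no row at all for a record whose 'key' is an empty list, so adjust it if such records need to be kept.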
2 Comments

That worked. How would you start the key index at 1 instead of 0 in the case of multiple entries? For text1, the generated index is 0. For text2, the generated index starts at 1, so there is 1 and 2. Thanks!
@KellyRoper. I updated my answer, can you check it please?
