Parsing nested JSON to a pandas DataFrame

Question

I'm trying to convert a JSON file structured like this:

oneD = {'Record':
            {'first': ['A', 'B',
                       {'inter': ['1', '2', '3']}],
             'second': ['C', 'D', 'E']},
        'Record2':
            {'first': ['G', 'H',
                       {'inter': ['5', '6']}],
             'second': ['I', 'J', 'K']}}

And I would ultimately like to end up with a pandas dataframe like this:

import pandas as pd

data = {'Title':  ['Record', 'Record', 'Record', 'Record2', 'Record2', 'Record2'],
        'First':  ['A',      'B',      'NA',     'G',       'H',       'NA'],
        'Inter':  ['1',      '2',      '3',      '5',       '6',       'NA'],
        'Second': ['C',      'D',      'E',      'I',       'J',       'K']}

df = pd.DataFrame(data)

     Title First Inter Second
0   Record     A     1      C
1   Record     B     2      D
2   Record    NA     3      E
3  Record2     G     5      I
4  Record2     H     6      J
5  Record2    NA    NA      K

I tried the following as suggested here (pandas dataframe from nested JSON with lists):

df = pd.DataFrame.from_dict(oneD, orient="index")
df2 = pd.concat([pd.DataFrame(df[i].values.tolist(), 
                          columns=[f"{i}_{num}" for num in range(len(df[i].iat[0]))]
                          ) for i in df.columns],axis=1)

Unfortunately, this produces:

   first_0  first_1                     first_2  second_0  second_1  second_2
0        A        B  {'inter': ['1', '2', '3']}         C         D         E
1        G        H       {'inter': ['5', '6']}         I         J         K

Can anyone offer any suggestions? I'm losing my mind in a series of nested loops trying to convert the full JSON file.

Mayank Porwal · Accepted Answer · 2022-01-28 19:56:15Z

Use df.T, df.apply, pd.Series.explode and pd.concat:

# Create a df from oneD and transpose it.
In [205]: x = pd.DataFrame(oneD).T.apply(pd.Series.explode).rename_axis('Title').reset_index()

# Convert rows with dict in 'first' to a new column 'inter'
In [210]: x[['first', 'inter']] = x['first'].apply(pd.Series)

# Group on 'Title' and explode the 'inter' column and create a new df
In [212]: inter = x.groupby('Title')['inter'].first().explode().reset_index()

# Concat the two df's 'x' and 'inter' to get the desired result
In [216]: res = pd.concat([x[['Title', 'first', 'second']], inter['inter']], axis=1).fillna('NA')

In [217]: res
Out[217]: 
     Title first second inter
0   Record     A      C     1
1   Record     B      D     2
2   Record    NA      E     3
3  Record2     G      I     5
4  Record2     H      J     6
5  Record2    NA      K    NA
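
Note that the concat step lines the two frames up by row position, so it matches the desired output here partly because 'Record' happens to have exactly three 'inter' values. As a minimal sketch of an alternative that does not depend on that, the rows can be built in plain Python with itertools.zip_longest and then handed to pandas. This assumes every record has the same shape as the sample: 'first' holds plain values plus dicts carrying an 'inter' list, and 'second' is a flat list. The helper name flatten_record is hypothetical.

```python
from itertools import zip_longest

import pandas as pd

oneD = {'Record':
            {'first': ['A', 'B', {'inter': ['1', '2', '3']}],
             'second': ['C', 'D', 'E']},
        'Record2':
            {'first': ['G', 'H', {'inter': ['5', '6']}],
             'second': ['I', 'J', 'K']}}


def flatten_record(title, record):
    """Return one row dict per position for a single record (hypothetical helper)."""
    first, inter = [], []
    for item in record.get('first', []):
        if isinstance(item, dict):
            # Nested dict: collect its 'inter' values separately.
            inter.extend(item.get('inter', []))
        else:
            first.append(item)
    second = record.get('second', [])
    # zip_longest pads the shorter lists with 'NA' within this record only.
    return [
        {'Title': title, 'First': f, 'Inter': i, 'Second': s}
        for f, i, s in zip_longest(first, inter, second, fillvalue='NA')
    ]


rows = [row
        for title, record in oneD.items()
        for row in flatten_record(title, record)]
df = pd.DataFrame(rows, columns=['Title', 'First', 'Inter', 'Second'])
print(df)
```

Because the padding happens per record, a record with fewer 'inter' values cannot shift values into the next record's rows.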

edited Jan 28, 2022 at 19:56

answered Jan 28, 2022 at 19:51


2 Comments

Matt Over a year ago

I tried to run this as written and got: ValueError: cannot reindex from a duplicate axis

Mayank Porwal Over a year ago

Can you tell me on which command you got this error? I pasted the exact working solution.
