I'm trying to convert a JSON file structured like this:

oneD = {
    'Record': {
        'first': ['A', 'B', {'inter': ['1', '2', '3']}],
        'second': ['C', 'D', 'E'],
    },
    'Record2': {
        'first': ['G', 'H', {'inter': ['5', '6']}],
        'second': ['I', 'J', 'K'],
    },
}

And I would ultimately like to end up with a pandas dataframe like this:

import pandas as pd

data = {'Title': ['Record', 'Record', 'Record', 'Record2', 'Record2', 'Record2'],
    'First': ['A',      'B',      'NA',     'G',       'H',    'NA'],
    'Inter': ['1',      '2',       '3',     '5',       '6',   'NA'],
    'Second':['C',      'D',       'E',     'I',       'J',   'K']
    }

df = pd.DataFrame(data)
print(df)
    Title First Inter Second
0   Record     A     1      C
1   Record     B     2      D
2   Record    NA     3      E
3  Record2     G     5      I
4  Record2     H     6      J
5  Record2    NA    NA      K

I tried the following, as suggested in "pandas dataframe from nested JSON with lists":

df = pd.DataFrame.from_dict(oneD, orient="index")
df2 = pd.concat([pd.DataFrame(df[i].values.tolist(), 
                          columns=[f"{i}_{num}" for num in range(len(df[i].iat[0]))]
                          ) for i in df.columns],axis=1)

Unfortunately, this produces:

  first_0 first_1                     first_2 second_0 second_1 second_2 second_3
0       A       B  {'inter': ['1', '2', '3']}        C        D        E        F
1       G       H       {'inter': ['5', '6']}        I        J        K     None

Can anyone offer any suggestions? I'm losing my mind in a series of nested loops trying to convert the full JSON file.

1 Answer

Use df.T, df.apply, pd.Series.explode and pd.concat:

# Create a df from oneD and transpose it.
In [205]: x = pd.DataFrame(oneD).T.apply(pd.Series.explode).rename_axis('Title').reset_index()

# Convert rows with dict in 'first' to a new column 'inter'
In [210]: x[['first', 'inter']] = x['first'].apply(pd.Series)

# Group on 'Title' and explode the 'inter' column and create a new df
In [212]: inter = x.groupby('Title')['inter'].first().explode().reset_index()

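# Concat the two df's 'x' and 'inter' column-wise to get the desired result
# (pass axis=1 by keyword; recent pandas versions no longer accept it positionally)
In [216]: res = pd.concat([x[['Title', 'first', 'second']], inter['inter']], axis=1).fillna('NA')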

In [217]: res
Out[217]: 
     Title first second inter
0   Record     A      C     1
1   Record     B      D     2
2   Record    NA      E     3
3  Record2     G      I     5
4  Record2     H      J     6
5  Record2    NA      K    NA
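
If the chained pandas calls above behave differently on your pandas version, a plain-Python pass over the dictionary is one alternative. This is only a sketch, not part of the original solution: the helper name flatten_records is hypothetical, and it assumes every record has 'first' and 'second' lists and that any dict inside 'first' carries an 'inter' list. It pads the shorter lists with 'NA' using itertools.zip_longest and builds the frame from the resulting rows.

```python
from itertools import zip_longest

import pandas as pd

oneD = {
    'Record': {
        'first': ['A', 'B', {'inter': ['1', '2', '3']}],
        'second': ['C', 'D', 'E'],
    },
    'Record2': {
        'first': ['G', 'H', {'inter': ['5', '6']}],
        'second': ['I', 'J', 'K'],
    },
}


def flatten_records(records, fill='NA'):
    """Hypothetical helper: one output row per position, padded with `fill`."""
    rows = []
    for title, fields in records.items():
        # Plain strings in 'first' stay in the First column.
        first = [v for v in fields['first'] if not isinstance(v, dict)]
        # Values nested under {'inter': [...]} go to the Inter column.
        inter = [v
                 for item in fields['first'] if isinstance(item, dict)
                 for v in item.get('inter', [])]
        second = fields['second']
        # zip_longest pads the shorter lists so all columns line up.
        for f, i, s in zip_longest(first, inter, second, fillvalue=fill):
            rows.append({'Title': title, 'First': f, 'Inter': i, 'Second': s})
    return pd.DataFrame(rows, columns=['Title', 'First', 'Inter', 'Second'])


df = flatten_records(oneD)
print(df)
```

Because each record is handled independently, the index never contains duplicate labels, and the columns come out in the order Title, First, Inter, Second as in the question.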

2 Comments

I tried to run this as written and got: ValueError: cannot reindex from a duplicate axis
Can you tell me on which command you got this error? I pasted the exact working solution.
