2

How to do untangle nested dictionary with list in key/value into columns? I tried different combination to solve converting nested dictionary into pandas data frame. Looked through the stack I am getting close to fix the problem just not quite.

Sample Data:

test = {
    'abc': {
        'company_id': '123c',
        'names': ['Oscar', 'John Smith', 'Smith, John'],
        'education': ['MS', 'BS']
    },
    'DEF': {
        'company_id': '124b',
        'names': ['Matt B.'],
        'education': ['']
    }
}

Tried:

1)

pd.DataFrame(list(test.items())) # not working entirely - creates {dictionary in col '1'}

2)

df = pd.concat({
        k: pd.DataFrame.from_dict(v, 'index') for k, v in test.items()
    }, 
    axis=0)

df2 = df.T
df2.reset_index() # creates multiple columns

Output Needed:

enter image description here

2 Answers 2

2

Update:

With the release of pandas 0.25 and the addition of explode this is now a lot easier:

frame = pd.DataFrame(test).T
frame = frame.explode('names').set_index(
    ['company_id', 'names'],
    append=True).explode(
    'education').reset_index(
    ['company_id', 'names']
)

Pre pandas 0.25:

This is not really lean but then this is a rather complicated transformation. Inspired by this blog post, I solved it using two separate iterations of turning the list column into a series and then transforming the DataFrame using melt.

import pandas as pd

test = {
    'abc': {
        'company_id': '123c',
        'names': ['Oscar', 'John Smith', 'Smith, John'],
        'education': ['MS', 'BS']
    },
    'DEF': {
        'company_id': '124b',
        'names': ['Matt B.'],
        'education': ['']
    }
}

frame = pd.DataFrame(test).T

names = frame.names.apply(pd.Series)
frame = frame.merge(
    names, left_index=True, right_index=True).drop('names', axis=1)
frame = frame.reset_index().melt(
    id_vars=['index', 'company_id', 'education'],
    value_name='names').drop('variable', axis=1).dropna()

education = frame.education.apply(pd.Series)
frame = frame.merge(
    education, left_index=True, right_index=True).drop('education', axis=1)
frame = frame.melt(
    id_vars=['index', 'company_id', 'names'],
    value_name='education').drop(
    'variable', axis=1).dropna().sort_values(by=['company_id', 'names'])

frame.columns = ['set_name', 'company_id', 'names', 'education']

print(frame)

Result:

  set_name company_id        names education
2      abc       123c   John Smith        MS
6      abc       123c   John Smith        BS
0      abc       123c        Oscar        MS
4      abc       123c        Oscar        BS
3      abc       123c  Smith, John        MS
7      abc       123c  Smith, John        BS
1      DEF       124b      Matt B.          
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks a bunch! I didn't knew it would this complicated. I thought either pandas/json dict library should be able to untaggle this kind of nested structured.
I've learned since yesterday that there is a json_normalize function that may be able to do this in a one-liner. I'll try and update my post if it works.
Oh yes tried json_normalize from pandas it didn't quite work. May be I used it differently.
@sharp I have edited my answer with a much better approach that has become available with pandas 0.25.
Thanks! This one so much shorter. The way I tried clearly didn't work, I didn't use explode.
0
pd.DataFrame.from_dict(test, orient='index').explode(['names']).explode(['education']).reset_index().rename(columns={'index':'set_name'})


 set_name company_id        names education
0      abc       123c        Oscar        MS
1      abc       123c        Oscar        BS
2      abc       123c   John Smith        MS
3      abc       123c   John Smith        BS
4      abc       123c  Smith, John        MS
5      abc       123c  Smith, John        BS
6      DEF       124b      Matt B.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.