1

I have a pandas dataframe with multiple headers. I am wondering how to convert this into a list of nested directories. Each row in the Pandas dataframe will be a nested dictionary in the list.

Here is an example

#Creaet an example multiheader dataframe

col =['id','x, single room','x, double room','y, single room','y, double room' ]
df = pd.DataFrame([[1,2,3,4,5], [3,4,7,5,3]], columns=col)
a = df.columns.str.split(', ', expand=True).values
#swap values in NaN and replace NAN to ''
df.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else x for x in a])
df

result

        x   y
id  single room double room single room double room
0   1   2   3   4   5
1   3   4   7   5   3

This is the dataframe I want to convert to a list of nested dictionaries. So this is the desired result

[{'id': 1,
  'x': {'double room': 3, 'single room': 2},
  'y': {'double room': 5, 'single room': 4}},
 {'id': 3,
  'x': {'double room': 7, 'single room': 4},
  'y': {'double room': 3, 'single room': 5}}]

In the code below, I directly create this list.

firstDict = { 'id':1, 'x':{'single room':2, 'double room':3}, 'y':{'single room':4, 'double room':5} }
secondDict = { 'id':3, 'x':{'single room':4, 'double room':7}, 'y':{'single room':5, 'double room':3} }
dictList = []
dictList.append( firstDict )
dictList.append( secondDict )
dictList

[{'id': 1,
  'x': {'double room': 3, 'single room': 2},
  'y': {'double room': 5, 'single room': 4}},
 {'id': 3,
  'x': {'double room': 7, 'single room': 4},
  'y': {'double room': 3, 'single room': 5}}]

So in summary, how to I convert the dataframe df to what dictList is.

Edit:

This is a minimal example, the solution I am looking for should generalize to longer numbers of headers.

5 Answers 5

6

I don't think there is a straight forward way of doing this, that being said, you could use stack + to_dict and some post-processing afterwards:

# prepare the DataFrame
df = df.set_index(('', 'id')).stack(level=0)
df.index.names = ['id', None]

# convert to a dicts of dicts
d = {}
for (idi, key), values in df.to_dict('index').items():
    d.setdefault(idi, {}).update({key: values})

# convert d to list of dicts
result = [{'id': k, **values} for k, values in d.items()]

Output

[{'id': 1,
  'x': {'double room': 3, 'single room': 2},
  'y': {'double room': 5, 'single room': 4}},
 {'id': 3,
  'x': {'double room': 7, 'single room': 4},
  'y': {'double room': 3, 'single room': 5}}]
Sign up to request clarification or add additional context in comments.

2 Comments

I am wondering how to apply df = df.set_index(('', 'id')).stack(level=0) in cases where the column to set index also has multiple headers. In my case, the top level column name is 'ID', and 2nd level column name is 'New'. I tried df2 = l[51].set_index(('ID', 'New')).stack(level=0) but got KeyError: ('ID', 'New')
That must be because there is no column named liked that, try to take a look at your column name
2

Unsure of how long your number of headers can be, currently it is in a state of easily being coded by hand, as given below -

dct = []

for x in df.values:
  nd = {
          "id": x[0],
          "x": {
                  "single room": x[1],
                  "double room": x[2]
               },
          "y": {
                  "single room": x[3],
                  "double room": x[4]
               }
       }
  dct.append(nd)

Do let me know if there are a high number of headers and the code needs to handle them without explicit typing in.

Comments

1

Something like this?

import pandas as pd
col =['id','x, single room','x, double room','y, single room','y, double room' ]
df = pd.DataFrame([[1,2,3,4,5], [3,4,7,5,3]], columns=col)
a = df.columns.str.split(', ', expand=True).values
#swap values in NaN and replace NAN to ''
df.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else x for x in a])
print(df)

dict_list = []
for index, row in df.iterrows():
    d = {}
#    _dict [row["id"]]
    print(type(row), row)#, row.select(1, axis = 0) )
    d["id"] = row[0]
    d["x"] = {'single room':row[1], 'double room':row[1]}
    d["y"] = {'single room':row[3], 'double room':row[4]}
    dict_list.append(d)

print(dict_list)

OUTPUT:

[{'id': 1, 
  'x': {'single room': 2, 'double room': 2}, 
  'y': {'single room': 4, 'double room': 5}
  }, 
{'id': 3, 
  'x': {'single room': 4, 'double room': 4}, 
  'y': {'single room': 5, 'double room': 3}
}
]

Comments

1

you could use either

l = []
d = None
for i, row in df.iterrows():
    for (i1,i2),v in row.iteritems():
        if i2 == 'id':
            d = {i2:v}
            l.append(d)
            continue
        try:
            d[i1][i2]=v
        except KeyError:
            d[i1] = {i2:v}

or if you're ok with a slight modification of your expected result:

from collections import defaultdict
l =[]
for i, row in df.iterrows():
    d = defaultdict(dict)
    for (i1,i2),v in row.iteritems():
        if i2 == 'id':
            d[i2][v]=v
        else:
            d[i1][i2]=v
    l.append(dict(d))

which outputs:

[{'id': {1: 1},
  'x': {'single room': 2, 'double room': 3},
  'y': {'single room': 4, 'double room': 5}},
 {'id': {3: 3},
  'x': {'single room': 4, 'double room': 7},
  'y': {'single room': 5, 'double room': 3}}]

Comments

1

I like accepted solution, but here my two alternatives wihout stacking.

This solution is straight-forward but with more columns a lot of repetition and error-prone:

lst = [{'id': d[('', 'id')], 
        'x': {'single room': d[('x', 'single room')], 'double room': d[('x', 'double room')]},
        'y': {'single room': d[('y', 'single room')], 'double room': d[('y', 'double room')]},}
        for d in df.to_dict('records')
]

Let's try make it more scalable, from Arbitrarily nested dictionary from tuples you can get nest function:

def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result

But for ('', id) we need slighlty different behavior:

def nest_m(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        if key == ('', 'id'):
            result['id'] = value
        else:
            target = result
            for k in key[:-1]:  
                target = target.setdefault(k, {})
            target[key[-1]] = value    
    return result

Final line:

lst = [nest_m(d) for d in df.to_dict('records')]  

Output:

[{'id': 1,
  'x': {'single room': 2, 'double room': 3},
  'y': {'single room': 4, 'double room': 5}},
 {'id': 3,
  'x': {'single room': 4, 'double room': 7},
  'y': {'single room': 5, 'double room': 3}}]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.