I'm looking for a solution to build a nested dict / JSON with the last three columns "name", "color", "amount" as attributes inside a "products" list. The values from the cat1-cat3 columns should be the keys.

The provided DataFrame looks like this:

import pandas as pd

df = pd.DataFrame({
    'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cat2': ['BB', 'BB', 'BC', 'BB', 'BB', 'BB', 'BC', 'BC'],
    'cat3': ['CC', 'CC', 'CD', 'CD', 'CD', 'CC', 'CD', 'CE'],
    'name': ['P1', 'P2', 'P3', 'P1', 'P4', 'P1', 'P3','P6'],
    'color': ['red', 'blue', 'green', 'green', 'yellow', 'red', 'blue', 'blue'],
    'amount': [132, 51, 12, 421, 55, 11, 123, 312]
})

This would be the desired output:

{
   "A":{
      "BB":{
         "CC":{
            "products":[
               {
                  "name":"P1",
                  "color":"red",
                  "amount":132
               },
               {
                  "name":"P2",
                  "color":"blue",
                  "amount":51
               }
            ]
         }
      },
      "BC":{
         "CD":{
            "products":[
               {
                  "name":"P3",
                  "color":"green",
                  "amount":12
               }
            ]
         }
      }
   },
   "B":{
      "BB":{
         "CD":{
            "products":[
               {
                  "name":"P1",
                  "color":"green",
                  "amount":421
               },
               {
                  "name":"P4",
                  "color":"yellow",
                  "amount":55
               }
            ]
         }
      }
   },
   "C":{
      "BB":{
         "CC":{
            "products":[
               {
                  "name":"P1",
                  "color":"red",
                  "amount":11
               }
            ]
         }
      },
      "BC":{
         "CD":{
            "products":[
               {
                  "name":"P3",
                  "color":"blue",
                  "amount":123
               }
            ]
         },
         "CE":{
            "products":[
               {
                  "name":"P6",
                  "color":"blue",
                  "amount":312
               }
            ]
         }
      }
   }
}

@BEN_YO provided a recursive solution for this problem without the inner products part.

So I'm actually looking for an adaptation of this method that adds the inner list:

def recur_dictify(frame):
    if len(frame.columns) == 1:
        if frame.values.size == 1: return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0])
    d = {k: recur_dictify(g.iloc[:,1:]) for k,g in grouped}
    return d

recur_dictify(df)

3 Answers

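If a different approach is acceptable, you can try the following. It is a little dirty, so you may want to optimize it. Note that the result, printed after the code, has no "products" wrapper and contains nan for category combinations that do not occur (here B/BC):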

cols = ['name','color','amount']
u = df[df.columns.difference(cols)].join(df[cols].agg(dict,1).rename('d'))
v = (u.groupby(['cat1','cat2','cat3'])['d'].agg(list).reset_index("cat3"))

v = v.groupby(v.index).apply(lambda x: dict(zip(x['cat3'],x['d'])))
v.index = pd.MultiIndex.from_tuples(v.index,names=['cat1','cat2'])
d = v.unstack(0).to_dict()

print(d)
{'A': {'BB': {'CC': [{'amount': 132, 'color': 'red', 'name': 'P1'},
                     {'amount': 51, 'color': 'blue', 'name': 'P2'}]},
       'BC': {'CD': [{'amount': 12, 'color': 'green', 'name': 'P3'}]}},
 'B': {'BB': {'CD': [{'amount': 421, 'color': 'green', 'name': 'P1'},
                     {'amount': 55, 'color': 'yellow', 'name': 'P4'}]},
       'BC': nan},
 'C': {'BB': {'CC': [{'amount': 11, 'color': 'red', 'name': 'P1'}]},
       'BC': {'CD': [{'amount': 123, 'color': 'blue', 'name': 'P3'}],
              'CE': [{'amount': 312, 'color': 'blue', 'name': 'P6'}]}}}
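
The nan under 'B' -> 'BC' in the output above appears because unstack fills combinations that have no rows. One possible cleanup, sketched here with a hypothetical helper prune_nan (not part of the original answer), is to drop those entries recursively. The small dict d only mimics the shape of the real result:

```python
import math


def prune_nan(node):
    """Recursively drop dict entries whose value is a float NaN."""
    if isinstance(node, dict):
        return {
            key: prune_nan(value)
            for key, value in node.items()
            if not (isinstance(value, float) and math.isnan(value))
        }
    # Lists of product records (and any other leaf) are returned unchanged.
    return node


# Hand-written stand-in for the unstacked result, including one NaN hole.
d = {"B": {"BB": {"CD": [{"name": "P1"}]}, "BC": float("nan")}}
clean = prune_nan(d)
```

This only removes the placeholders; it does not add the "products" key that the question asks for.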

We can group by cat1, cat2 and cat3 and recursively build the dictionary from the group keys:

def set_val(d, k, v):
    if len(k) == 1:
        d[k[0]] = v
    else:
        d[k[0]] = set_val(d.get(k[0], {}), k[1:], v)
    return d


dct = {}
for k, g in df.groupby(['cat1', 'cat2', 'cat3']):
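    # 'records' is the full orient name; the short form 'r' is no longer accepted by recent pandas
    set_val(dct, k, {'products': g[['name', 'color', 'amount']].to_dict('records')})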

print(dct)

{'A': {'BB': {'CC': {'products': [{'amount': 132, 'color': 'red', 'name': 'P1'},
                                  {'amount': 51, 'color': 'blue', 'name': 'P2'}]}},
       'BC': {'CD': {'products': [{'amount': 12, 'color': 'green', 'name': 'P3'}]}}},
 'B': {'BB': {'CD': {'products': [{'amount': 421, 'color': 'green', 'name': 'P1'},
                                  {'amount': 55, 'color': 'yellow', 'name': 'P4'}]}}},
 'C': {'BB': {'CC': {'products': [{'amount': 11, 'color': 'red', 'name': 'P1'}]}},
       'BC': {'CD': {'products': [{'amount': 123, 'color': 'blue', 'name': 'P3'}]},
              'CE': {'products': [{'amount': 312, 'color': 'blue', 'name': 'P6'}]}}}}
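
Since the question asks for an adaptation of recur_dictify, here is one way that might look. This is a minimal sketch and not the code from the linked answer: the group_cols and inner_key parameters are assumptions added for illustration, and the DataFrame is a shortened version of the one in the question.

```python
import pandas as pd


def recur_dictify(frame, group_cols, inner_key="products"):
    """Nest `frame` by each column in `group_cols`; the leaf holds the remaining rows."""
    if not group_cols:
        # No grouping columns left: emit the remaining columns as a list of records.
        return {inner_key: frame.to_dict("records")}
    first, rest = group_cols[0], group_cols[1:]
    return {
        key: recur_dictify(sub.drop(columns=first), rest, inner_key)
        for key, sub in frame.groupby(first)
    }


# Shortened sample data (two category levels instead of three).
df = pd.DataFrame({
    "cat1": ["A", "A", "B"],
    "cat2": ["BB", "BB", "BC"],
    "name": ["P1", "P2", "P3"],
    "amount": [132, 51, 12],
})
nested = recur_dictify(df, ["cat1", "cat2"])
```

With the full DataFrame from the question, the call would be recur_dictify(df, ['cat1', 'cat2', 'cat3']). Grouping one column at a time means only combinations that actually occur are visited, so no nan placeholders appear.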

This is a generic method adapted from Shubham Sharma's great solution:

def gen_nested_dict(dataframe, group, inner_key, inner_dict):
    def set_val(d, k2, v):
        if len(k2) == 1:
            d[k2[0]] = v
        else:
            d[k2[0]] = set_val(d.get(k2[0], {}), k2[1:], v)
        return d

    dct = {}
    for k, g in dataframe.groupby(group):
        set_val(dct, k, {inner_key: g[inner_dict].to_dict('records')})

    return dct

mydct = gen_nested_dict(df, ['cat1', 'cat2', 'cat3'], 'products', ['name', 'color', 'amount'])
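
The question also mentions JSON. A nested dict like this can usually be serialized with the standard json module. The sketch below uses a shortened DataFrame and a simplified two-level builder for brevity, and it assumes that to_dict('records') returns plain Python ints, as recent pandas versions do:

```python
import json

import pandas as pd

# Shortened sample data with two category levels.
df = pd.DataFrame({
    "cat1": ["A", "A", "B"],
    "cat2": ["BB", "BB", "BC"],
    "name": ["P1", "P2", "P3"],
    "amount": [132, 51, 12],
})

# Build the nested dict: one "products" list per (cat1, cat2) group.
dct = {}
for (c1, c2), g in df.groupby(["cat1", "cat2"]):
    dct.setdefault(c1, {})[c2] = {"products": g[["name", "amount"]].to_dict("records")}

# Serialize to a JSON string; indent only affects readability.
text = json.dumps(dct, indent=3)
print(text)
```

If serialization fails because of NumPy scalar types, passing a default= converter to json.dumps is one possible workaround.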

1 Comment

Nice abstraction :)