5
      Groups sub-groups selections
    0   sg1    csg1       sc1
    1   sg1    csg1       sc2
    2   sg1    csg2       sc3
    3   sg1    csg2       sc4
    4   sg2    csg3       sc5
    5   sg2    csg3       sc6
    6   sg2    csg4       sc7
    7   sg2    csg4       sc8

I have the dataframe mentioned above and I am trying to create a JSON object as follows:

{
  "sg1": {
    "csg1": ['sc1', 'sc2'],
    "csg2": ['sc3', 'sc4']
  },
  "sg2": {
    "csg3": ['sc5', 'sc6'],
    "csg4": ['sc7', 'sc8']
  }
}

I tried using the pandas to_json and to_dict with orient arguments but I am not getting the expected result. I also tried grouping by the columns and then creating the list and converting it into a JSON.

Any help is much appreciated.

3 Answers 3

5

You can groupby ['Groups','sub-groups'] and build a dictionary from the multiindex series with a dictionary comprehension:

s = df.groupby(['Groups','sub-groups']).selections.agg(list)
d = {k1:{k2:v} for (k1,k2),v in s.iteritems()}

print(d)
# {'sg1': {'csg2': ['sc3', 'sc4']}, 'sg2': {'csg4': ['sc7', 'sc8']}}
Sign up to request clarification or add additional context in comments.

Comments

0

You need to group on the columns of interest such as:

import pandas as pd

data = {
        'Groups': ['sg1', 'sg1', 'sg1', 'sg1', 'sg2', 'sg2', 'sg2', 'sg2'],
        'sub-groups': ['csg1', 'csg1', 'csg2', 'csg2', 'csg3', 'csg3', 'csg4', 'csg4'],
        'selections': ['sc1', 'sc2', 'sc3', 'sc4', 'sc5', 'sc6', 'sc7', 'sc8']
}

df = pd.DataFrame(data)
print(df.groupby(['Groups', 'sub-groups'])['selections'].unique().to_dict())

The output is:

{
    ('sg1', 'csg1'): array(['sc1', 'sc2'], dtype=object), 
    ('sg1', 'csg2'): array(['sc3', 'sc4'], dtype=object), 
    ('sg2', 'csg3'): array(['sc5', 'sc6'], dtype=object), 
    ('sg2', 'csg4'): array(['sc7', 'sc8'], dtype=object)
}

Comments

0

Lets try dictify function which builds a nested dictionary with top level keys from the Groups and corresponding sub level keys from sub-groups:

from collections import defaultdict

def dictify():
    dct = defaultdict(dict)
    for (x, y), g in df.groupby(['Groups', 'sub-groups']):
        dct[x][y] = [*g['selections']]
    return dict(dct)

# dictify()
{
    "sg1": {
        "csg1": ["sc1","sc2"],
        "csg2": ["sc3","sc4"]
    },
    "sg2": {
        "csg3": ["sc5","sc6"],
        "csg4": ["sc7","sc8"]
    }
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.