2

I would like to understand if I can convert multi-index and multi-header DataFrame to a nested dict.

I have the following DataFrame:

                                                             revenue                           taxes
                                                                 sum       max      min         mean       sum
company              type
-                    -                                        15.00     15.00    15.00         0.00      0.00
Facebook             -                                      1259.79    218.50  -120.00         5.17    321.00
                     City TOT                                  0.00      0.00     0.00         4.00      4.00
                     Country TOT                               0.00      0.00     0.00         4.00      4.00
                     Sales                                     0.00      0.00     0.00         9.25     18.50
Google               %tax_1                                    0.00      0.00     0.00         0.89      3.58
                     -                                      3738.36   2643.08  -100.00        96.23  26369.57
                     City TOT                                  0.00      0.00     0.00         3.55     95.99
                     Country TOT                               0.00      0.00     0.00        23.25    628.00
...

df = pd.DataFrame.from_dict({('-', '-'): {('revenue', 'sum'): 15.0,
  ('revenue', 'max'): 15.0,
  ('revenue', 'min'): 15.0,
  ('taxes', 'mean'): 0.00,
  ('taxes', 'sum'): 0.0},
 ('Facebook', '-'): {('revenue', 'sum'): 1259.79,
  ('revenue', 'max'): 218.5,
  ('revenue', 'min'): -120.0,
  ('taxes', 'mean'): 5.17,
  ('taxes', 'sum'): 321.0},
 ('Facebook', 'City TOT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 4.00,
  ('taxes', 'sum'): 4.0},
 ('Facebook', 'Country TOT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 4.00,
  ('taxes', 'sum'): 4.0},
 ('Facebook', 'Sales'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 9.25,
  ('taxes', 'sum'): 18.5},
 ('Google', '%tax_1'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 0.89,
  ('taxes', 'sum'): 3.58},
 ('Google', '-'): {('revenue', 'sum'): 3738.36,
  ('revenue', 'max'): 2643.08,
  ('revenue', 'min'): -100.0,
  ('taxes', 'mean'): 96.23,
  ('taxes', 'sum'): 26369.57},
 ('Google', 'City TOT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 3.55,
  ('taxes', 'sum'): 95.99},
 ('Google', 'Country TOT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 23.25,
  ('taxes', 'sum'): 628.0},
 ('Google', 'PER GETS'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 0.88,
  ('taxes', 'sum'): 2.64},
 ('Google', 'Sales'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 12.61,
  ('taxes', 'sum'): 138.75},
 ('Google', 'VAT'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 11.70,
  ('taxes', 'sum'): 1065.51},
 ('Google', 'per room_1'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 5.00,
  ('taxes', 'sum'): 20.0},
 ('Google', 'tax on top'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 8.68,
  ('taxes', 'sum'): 78.2},
 ('Google', 'tax per reserv'): {('revenue', 'sum'): 0.0,
  ('revenue', 'max'): 0.0,
  ('revenue', 'min'): 0.0,
  ('taxes', 'mean'): 20.00,
  ('taxes', 'sum'): 40.0}}, orient='index')
df.index.names = ['company', 'type']

The columns and index are MultiIndex:

MultiIndex([('revenue',  'sum'),
            ('revenue',  'max'),
            ('revenue',  'min'),
            ( 'taxes', 'mean'),
            ( 'taxes',  'sum')],)
MultiIndex([(                  '-',                                '-'),
            (           'Facebook',                                '-'),
            (           'Facebook',                         'City TOT'),
            (           'Facebook',                      'Country TOT'),
            (           'Facebook',                            'Sales'),
            (           'Google',                           '%tax_1'),
            ...

I have tried to use this method (based on Pandas multi index dataframe to nested dictionary):

{level: df.xs(level).to_dict('index') for level in df.index.levels[0]}

However, this is what I get:

{'-': {('revenue', 'sum'): {'-': 15.0},
  ('revenue', 'max'): {'-': 15.0},
  ('revenue', 'min'): {'-': 15.0},
  ('taxes', 'mean'): {'-': 0.0)},
  ('taxes', 'sum'): {'-': 0.0}},
 'Facebook': {('revenue', 'sum'): {'-': 1259.79,
   'City TOT': 0.0,
   'Country TOT': 0.0,
   'Sales': 0.0},
  ('revenue', 'max'): {'-': 218.5,
   'City TOT': 0.0,
   'Country TOT': 0.0,
   'Sales': 0.0},
  ('revenue', 'min'): {'-': -120.0,
   'City TOT': 0.0,
   'Country TOT': 0.0,
   'Sales': 0.0},
  ...

What I expect to achieve instead is a nested dict on the columns (exactly as it is on the index), where the data gets nested per the indexes like this:

{
    '-': {
        '-':  {
            'revenue': {
                'sum': 15.0,
                'max': 15.0,
                'min': 15.0,
            },
            'taxes': {
                'mean': 0.00,
                'sum': 0.00,
            }
        },
    },
    'Facebook': {
        '-': {
            'revenue': {
                'sum': 1259.79,
                'max': 218.5,
                'min': -120.0
            }
            'taxes': ...
        },
        'City TOT': {
            'revenue': {
                'sum': 0.0,
                'max': 0.0,
                'min': 0.0
            }
            'taxes': ...
        },
        'Country TOT': {
            'revenue': {
                'sum': 0.0,
                'max': 0.0,
                'min': 0.0
            }
            'taxes': ...
        },
        'Sales': {
            'revenue': {
                'sum': 0.0,
                'max': 0.0,
                'min': 0.0
            }
            'taxes': ...
        },
    } 
}

Any way I can get around with this? Thanks!

2
  • can you add a statement to create your dataframe? creating multi index from copy and paste is a pain Commented Jan 4, 2022 at 17:21
  • 1
    @Umar.H Added. I should have added it from the start. I'm sorry. Thanks. Commented Jan 4, 2022 at 18:33

1 Answer 1

2

In 2 steps:

  1. Pivot your dataframe and convert it to a flat dict.
  2. Convert tuple keys to a nested dict

Step 1

data = df.stack(level=[0, 1]).to_dict()
print(data)

# Output
{('-', '-', 'revenue', 'max'): 15.0,
 ('-', '-', 'revenue', 'min'): 15.0,
 ('-', '-', 'revenue', 'sum'): 15.0,
 ...
 ('Google', 'Country TOT', 'revenue', 'sum'): 0.0,
 ('Google', 'Country TOT', 'taxes', 'mean'): 23.25,
 ('Google', 'Country TOT', 'taxes', 'sum'): 628.0}

Step 2

d = {}
for t, v in data.items():
    e = d.setdefault(t[0], {})
    for k in t[1:-1]:
        e = e.setdefault(k, {})
    e[t[-1]] = v
print(d)

# Output
{'-': {'-': {'revenue': {'max': 15.0, 'min': 15.0, 'sum': 15.0},
   'taxes': {'mean': 0.0, 'sum': 0.0}}},
 'Facebook': {'-': {'revenue': {'max': 218.5, 'min': -120.0, 'sum': 1259.79},
   'taxes': {'mean': 5.17, 'sum': 321.0}},
  'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 4.0, 'sum': 4.0}},
  'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 4.0, 'sum': 4.0}},
  'Sales': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 9.25, 'sum': 18.5}}},
 'Google': {'%tax_1': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 0.89, 'sum': 3.58}},
  '-': {'revenue': {'max': 2643.08, 'min': -100.0, 'sum': 3738.36},
   'taxes': {'mean': 96.23, 'sum': 26369.57}},
  'City TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 3.55, 'sum': 95.99}},
  'Country TOT': {'revenue': {'max': 0.0, 'min': 0.0, 'sum': 0.0},
   'taxes': {'mean': 23.25, 'sum': 628.0}}}}

Pretty printing:

import json
print(json.dumps(d, indent=4))

# Output
{
    "-": {
        "-": {
            "revenue": {
                "max": 15.0,
                "min": 15.0,
                "sum": 15.0
            },
            "taxes": {
                "mean": 0.0,
                "sum": 0.0
            }
        }
    },
    "Facebook": {
        "-": {
            "revenue": {
                "max": 218.5,
                "min": -120.0,
                "sum": 1259.79
            },
            "taxes": {
                "mean": 5.17,
                "sum": 321.0
            }
        },
        "City TOT": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 4.0,
                "sum": 4.0
            }
        },
        "Country TOT": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 4.0,
                "sum": 4.0
            }
        },
        "Sales": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 9.25,
                "sum": 18.5
            }
        }
    },
    "Google": {
        "%tax_1": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 0.89,
                "sum": 3.58
            }
        },
        "-": {
            "revenue": {
                "max": 2643.08,
                "min": -100.0,
                "sum": 3738.36
            },
            "taxes": {
                "mean": 96.23,
                "sum": 26369.57
            }
        },
        "City TOT": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 3.55,
                "sum": 95.99
            }
        },
        "Country TOT": {
            "revenue": {
                "max": 0.0,
                "min": 0.0,
                "sum": 0.0
            },
            "taxes": {
                "mean": 23.25,
                "sum": 628.0
            }
        }
    }
}
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.