From nested dictionary to python Dataframe

Question

I have a sample of a nested dictionary (inside a list) that looks like this:

data =  [{
         'resultInfo': {
             'load': None,
             'unload': {
                 'weight': 59.0,
                 'unit': 'ton',
                 'tonsPerTeu': None,
                 'tonsPerFeu': None,
                 'freightId': None,
                 'showEmissionsAtResponse': True
             },
             'location': 'zip:63937',
             'freightId': None,
             'emissionPercentage': 1.0,
             'directDistance': 767.71
         },
         'emissions': {
             'primaryEnergy': {
                 'rail': None,
                 'sea': None,
                 'air': None,
                 'inlandWaterways': None,
                 'road': {
                     '_value_1': Decimal('70351.631210000000'),
                     'wellToTank': Decimal('13412'),
                     'tankToWheel': Decimal('56939')
                 },
                 'logisticsite': None,
                 'transfer': None,
                 'unit': 'MegaJoule'
             },
             'carbonDioxide': {
                 'rail': None,
                 'sea': None,
                 'air': None,
                 'inlandWaterways': None,
                 'road': {
                     '_value_1': Decimal('4.866239643000'),
                     'wellToTank': Decimal('0.902'),
                     'tankToWheel': Decimal('3.963')
                 }
             }
         }
     }]

type(data) is list.

I want to convert it to a DataFrame, so that the expected output is this:

primaryEnergy_wellToTank    primaryEnergy_tankToWheel   carbonDioxide_wellToTank    carbonDioxide_tankToWheel
                   13412                        56939                      0.902                        3.963

I tried a transformation using the pd.DataFrame constructor:

df = pd.DataFrame(dict([(k, pd.Series(v)) for k, v in mydict.items()]))

but it has not produced the result I want so far.

How can this be done?

Below is the error I get when I use df = pd.json_normalize(data):

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\zeep\xsd\valueobjects.py in __getattribute__(self, key)
    142         try:
--> 143             return self.__values__[key]
    144         except KeyError:

KeyError: 'values'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<ipython-input-180-cc2694b5448e> in <module>
----> 1 df = pd.json_normalize(result.result)

~\AppData\Roaming\Python\Python37\site-packages\pandas\io\json\_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    272 
    273     if record_path is None:
--> 274         if any([isinstance(x, dict) for x in y.values()] for y in data):
    275             # naive normalization, this is idempotent for flat records
    276             # and potentially will inflate the data considerably for

~\AppData\Roaming\Python\Python37\site-packages\pandas\io\json\_normalize.py in <genexpr>(.0)
    272 
    273     if record_path is None:
--> 274         if any([isinstance(x, dict) for x in y.values()] for y in data):
    275             # naive normalization, this is idempotent for flat records
    276             # and potentially will inflate the data considerably for

~\AppData\Local\Continuum\anaconda3\lib\site-packages\zeep\xsd\valueobjects.py in __getattribute__(self, key)
    144         except KeyError:
    145             raise AttributeError(
--> 146                 "%s instance has no attribute '%s'" % (self.__class__.__name__, key)
    147             )
    148 

AttributeError: DistributionLoadResult instance has no attribute 'values'

I was able to solve the issue by using zeep's serialize_object function to convert the response into plain Python objects before passing it to pd.json_normalize.
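The traceback shows json_normalize calling .values() on a zeep DistributionLoadResult object, which is not a dict. A minimal sketch of the fix, assuming the response is a zeep object (or a list of them) and that zeep is installed; the helper name zeep_result_to_frame is hypothetical:

```python
import pandas as pd


def zeep_result_to_frame(result):
    """Flatten a zeep response into a DataFrame (hypothetical helper name)."""
    # Imported lazily so the helper can be defined without zeep installed.
    from zeep.helpers import serialize_object

    # serialize_object recursively converts zeep value objects into plain
    # containers; target_cls=dict asks for regular dicts instead of the
    # default OrderedDict.
    records = serialize_object(result, target_cls=dict)
    return pd.json_normalize(records)
```

With the names from the traceback, the call would be df = zeep_result_to_frame(result.result).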

Trenton McKinney · Accepted Answer · 2020-07-02 11:01:17Z

If the list looks like the list of dicts at the bottom of this post, where resultInfo repeats, then you can use pandas.json_normalize.
Once you've created df, drop the unneeded columns with pandas.DataFrame.drop.

import pandas as pd

df = pd.json_normalize(data)

# display(df)
  resultInfo.load  resultInfo.unload.weight resultInfo.unload.unit resultInfo.unload.tonsPerTeu resultInfo.unload.tonsPerFeu resultInfo.unload.freightId  resultInfo.unload.showEmissionsAtResponse resultInfo.location resultInfo.freightId  resultInfo.emissionPercentage  resultInfo.directDistance emissions.primaryEnergy.rail emissions.primaryEnergy.sea emissions.primaryEnergy.air emissions.primaryEnergy.inlandWaterways  emissions.primaryEnergy.road._value_1  emissions.primaryEnergy.road.wellToTank  emissions.primaryEnergy.road.tankToWheel emissions.primaryEnergy.logisticsite emissions.primaryEnergy.transfer emissions.primaryEnergy.unit emissions.carbonDioxide.rail emissions.carbonDioxide.sea emissions.carbonDioxide.air emissions.carbonDioxide.inlandWaterways  emissions.carbonDioxide.road._value_1  emissions.carbonDioxide.road.wellToTank  emissions.carbonDioxide.road.tankToWheel
0            None                      59.0                    ton                         None                         None                        None                                       True           zip:63937                 None                            1.0                     767.71                         None                        None                        None                                    None                            70351.63121                                    13412                                      5693                                 None                             None                    MegaJoule                         None                        None                        None                                    None                                4.86624                                    0.902                                      3.96
1            None                      59.0                    ton                         None                         None                        None                                       True           zip:63937                 None                            1.0                     767.71                         None                        None                        None                                    None                            70351.63121                                    13412                                      5693                                 None                             None                    MegaJoule                         None                        None                        None                                    None                                4.86624                                    0.902                                      3.96
2            None                      59.0                    ton                         None                         None                        None                                       True           zip:63937                 None                            1.0                     767.71                         None                        None                        None                                    None                            70351.63121                                    13412                                      5693                                 None                             None                    MegaJoule                         None                        None                        None                                    None                                4.86624                                    0.902                                      3.96
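To get from the flattened frame to the four columns in the question's expected output, one option is to select the wanted columns and rename them, rather than dropping the rest one by one. This is a sketch only: the record is trimmed to the relevant fields and uses the Decimal values from the question, and the names wanted and result are illustrative.

```python
from decimal import Decimal

import pandas as pd

# One record shaped like the question's sample, trimmed to the fields used here.
data = [{
    'resultInfo': {'location': 'zip:63937', 'directDistance': 767.71},
    'emissions': {
        'primaryEnergy': {
            'rail': None,
            'road': {
                '_value_1': Decimal('70351.631210000000'),
                'wellToTank': Decimal('13412'),
                'tankToWheel': Decimal('56939'),
            },
            'unit': 'MegaJoule',
        },
        'carbonDioxide': {
            'rail': None,
            'road': {
                '_value_1': Decimal('4.866239643000'),
                'wellToTank': Decimal('0.902'),
                'tankToWheel': Decimal('3.963'),
            },
        },
    },
}]

df = pd.json_normalize(data)

# Map the dotted column names produced by json_normalize to the short
# names used in the question's expected output.
wanted = {
    f'emissions.{group}.road.{stage}': f'{group}_{stage}'
    for group in ('primaryEnergy', 'carbonDioxide')
    for stage in ('wellToTank', 'tankToWheel')
}

# Keep only those columns, rename them, and convert Decimal objects to floats.
result = df[list(wanted)].rename(columns=wanted).astype(float)
print(result)
```

The astype(float) step is only needed when the values are Decimal objects, as in the question; with plain numbers, as in the Data section below, it can be left out.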

Data

data = [{
        'resultInfo': {
            'load': None,
            'unload': {
                'weight': 59.0,
                'unit': 'ton',
                'tonsPerTeu': None,
                'tonsPerFeu': None,
                'freightId': None,
                'showEmissionsAtResponse': True
            },
            'location': 'zip:63937',
            'freightId': None,
            'emissionPercentage': 1.0,
            'directDistance': 767.71
        },
        'emissions': {
            'primaryEnergy': {
                'rail': None,
                'sea': None,
                'air': None,
                'inlandWaterways': None,
                'road': {
                    '_value_1': 70351.631210000000,
                    'wellToTank': 13412,
                    'tankToWheel': 5693
                },
                'logisticsite': None,
                'transfer': None,
                'unit': 'MegaJoule'
            },
            'carbonDioxide': {
                'rail': None,
                'sea': None,
                'air': None,
                'inlandWaterways': None,
                'road': {
                    '_value_1': 4.866239643000,
                    'wellToTank': 0.902,
                    'tankToWheel': 3.96
                },
            }
        }
    },
    {
        'resultInfo': {
            'load': None,
            'unload': {
                'weight': 59.0,
                'unit': 'ton',
                'tonsPerTeu': None,
                'tonsPerFeu': None,
                'freightId': None,
                'showEmissionsAtResponse': True
            },
            'location': 'zip:63937',
            'freightId': None,
            'emissionPercentage': 1.0,
            'directDistance': 767.71
        },
        'emissions': {
            'primaryEnergy': {
                'rail': None,
                'sea': None,
                'air': None,
                'inlandWaterways': None,
                'road': {
                    '_value_1': 70351.631210000000,
                    'wellToTank': 13412,
                    'tankToWheel': 5693
                },
                'logisticsite': None,
                'transfer': None,
                'unit': 'MegaJoule'
            },
            'carbonDioxide': {
                'rail': None,
                'sea': None,
                'air': None,
                'inlandWaterways': None,
                'road': {
                    '_value_1': 4.866239643000,
                    'wellToTank': 0.902,
                    'tankToWheel': 3.96
                },
            }
        }
    },
    {
        'resultInfo': {
            'load': None,
            'unload': {
                'weight': 59.0,
                'unit': 'ton',
                'tonsPerTeu': None,
                'tonsPerFeu': None,
                'freightId': None,
                'showEmissionsAtResponse': True
            },
            'location': 'zip:63937',
            'freightId': None,
            'emissionPercentage': 1.0,
            'directDistance': 767.71
        },
        'emissions': {
            'primaryEnergy': {
                'rail': None,
                'sea': None,
                'air': None,
                'inlandWaterways': None,
                'road': {
                    '_value_1': 70351.631210000000,
                    'wellToTank': 13412,
                    'tankToWheel': 5693
                },
                'logisticsite': None,
                'transfer': None,
                'unit': 'MegaJoule'
            },
            'carbonDioxide': {
                'rail': None,
                'sea': None,
                'air': None,
                'inlandWaterways': None,
                'road': {
                    '_value_1': 4.866239643000,
                    'wellToTank': 0.902,
                    'tankToWheel': 3.96
                },
            }
        }
    }
]
