I want to export a pandas DataFrame to nested JSON for ingestion into MongoDB.

Here's an example of the data:

import pandas as pd

data = {
    'product_id': ['a001', 'a001', 'a001'],
    'product': ['aluminium', 'aluminium', 'aluminium'],
    'production_id': ['b001', 'b002', 'b002'],
    'production_name': ['metallurgical', 'recycle', 'recycle'],
    'geo_name': ['US', 'EU', 'RoW'],
    'value': [100, 200, 200]
}
df = pd.DataFrame(data=data)

product_id  product    production_id  production_name  geo_name  value
a001        aluminium  b001           metallurgical    US        100
a001        aluminium  b002           recycle          EU        200
a001        aluminium  b002           recycle          RoW       200

and this is what the final JSON should look like:

{
    "name_id": "a001",
    "name": "aluminium",
    "activities": [
        {
            "product_id": "b001",
            "product_name": "metallurgical",
            "regions": [
                {
                    "geo_name": "US",
                    "value": 100
                }
            ]
        },
        {
            "product_id": "b002",
            "product_name": "recycle",
            "regions": [
                {
                    "geo_name": "EU",
                    "value": 200
                },
                {
                    "geo_name": "RoW",
                    "value": 200
                }
            ]
        }
    ]
}

There are some questions close to my problem, but they are either years old and rely on an older version of pandas (so the solutions now break), or they do not group and nest the JSON the way I need. For example, "How to create a nested JSON from pandas DataFrame?" only handles a single level.

Some help would be really appreciated.

    Hello, what have you tried so far? Could you link us to the existing solutions, even if they are unsatisfying for your case? Commented Apr 1, 2021 at 15:53

1 Answer

The simplest solution I found chains one groupby/apply step per nesting level, starting from the innermost level, so it extends to any nesting depth (two levels in this example):

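json_extract = (
    df
    .groupby(['product_id', 'product', 'production_id', 'production_name'])
    # Collapse each production's rows into a list of {geo_name, value} records.
    .apply(lambda x: x[['geo_name', 'value']].to_dict('records'))
    .reset_index(name='geos')
    .groupby(['product_id', 'product'])
    # Collapse each product's rows into a list of production records.
    .apply(lambda x: x[['production_id', 'production_name', 'geos']].to_dict('records'))
    .reset_index(name='production')
    .to_json(orient='records')
)

Pretty-printed, json_extract contains: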
[
    {
        "product_id": "a001",
        "product": "aluminium",
        "production": [
            {
                "production_id": "b001",
                "production_name": "metallurgical",
                "geos": [
                    {
                        "geo_name": "US",
                        "value": 100
                    }
                ]
            },
            {
                "production_id": "b002",
                "production_name": "recycle",
                "geos": [
                    {
                        "geo_name": "EU",
                        "value": 200
                    },
                    {
                        "geo_name": "RoW",
                        "value": 200
                    }
                ]
            }
        ]
    }
]
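
If you need the exact key names from the question and more (or fewer) nesting levels, the same idea can be written as a small recursive helper. This is only a sketch: the function name nest and its levels / leaf_cols parameters are my own invention, not part of pandas, and it assumes the grouping columns hold plain strings as in the sample data. It renames the columns first, then loops over the groups of each level instead of calling apply.

```python
import json

import pandas as pd


def nest(df, levels, leaf_cols):
    """Hypothetical helper: turn a flat DataFrame into a list of nested dicts.

    levels:    list of (group_cols, child_key) pairs, outermost level first.
    leaf_cols: columns kept in the innermost records.
    """
    if not levels:
        # Innermost level: plain list of row dicts.
        return df[leaf_cols].to_dict("records")

    (group_cols, child_key), rest = levels[0], levels[1:]
    records = []
    # sort=False keeps groups in order of first appearance.
    for keys, group in df.groupby(group_cols, sort=False):
        if not isinstance(keys, tuple):
            # Some pandas versions yield a scalar key for a single column.
            keys = (keys,)
        record = dict(zip(group_cols, keys))
        record[child_key] = nest(group, rest, leaf_cols)
        records.append(record)
    return records


data = {
    "product_id": ["a001", "a001", "a001"],
    "product": ["aluminium", "aluminium", "aluminium"],
    "production_id": ["b001", "b002", "b002"],
    "production_name": ["metallurgical", "recycle", "recycle"],
    "geo_name": ["US", "EU", "RoW"],
    "value": [100, 200, 200],
}
df = pd.DataFrame(data=data)

# Rename the columns to the key names used in the question's target JSON.
renamed = df.rename(columns={
    "product_id": "name_id",
    "product": "name",
    "production_id": "product_id",
    "production_name": "product_name",
})

docs = nest(
    renamed,
    levels=[
        (["name_id", "name"], "activities"),
        (["product_id", "product_name"], "regions"),
    ],
    leaf_cols=["geo_name", "value"],
)

print(json.dumps(docs, indent=4))
```

docs is a list of plain Python dicts (one per product), so it can be serialized with json.dumps as above. Adding another nesting level should only require one more (group_cols, child_key) pair in levels.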