I am trying to generate a nested JSON from a DataFrame, where attributes of a car are distributed in several rows.

DataFrame

import pandas as pd

cars = {'brand': ['Honda','Toyota','Ford','Audi','Honda','Toyota','Ford','Audi'],
        'model': ['Civic','Corolla','Focus','A4','Civic','Corolla','Focus','A4'],
        'attributeName': ['color','color','color','color','doors','doors','doors','doors'],
        'attributeValue': ['red','blue','black','red',2,4,4,2]
        }

df = pd.DataFrame(cars)

What I tried

At first I grouped the rows and tried to apply the nesting:

df.groupby(['brand','model'])\
             .apply(lambda x: x[['attributeName','attributeValue']].to_dict('records'))\
             .to_json(orient='records')

Result

[[{"attributeName":"color","attributeValue":"red"},{"attributeName":"doors","attributeValue":2}],[{"attributeName":"color","attributeValue":"black"},{"attributeName":"doors","attributeValue":4}],[{"attributeName":"color","attributeValue":"red"},{"attributeName":"doors","attributeValue":2}],[{"attributeName":"color","attributeValue":"blue"},{"attributeName":"doors","attributeValue":4}]]

Expected result

[
    {
        'brand':'Honda',
        'model':'Civic',
        'attributes':[
            {
                'name':'color',
                'value':'red'
            }
        ]
    },
    {...}
]

How can I also include the other columns (brand and model) in each record, not only the attributes?

2 Answers

2

Your solution only needs rename for the column names and reset_index(name='attributes') to bring brand and model back as columns:

d = {'attributeName':'name','attributeValue':'value'}
j = (df.rename(columns=d)
       .groupby(['brand','model'])
       .apply(lambda x: x[['name','value']].to_dict('records'))
       .reset_index(name='attributes')
       .to_json(orient='records'))
print(j)
[{"brand":"Audi","model":"A4","attributes":[{"name":"color","value":"red"},{"name":"doors","value":2}]},{"brand":"Ford","model":"Focus","attributes":[{"name":"color","value":"black"},{"name":"doors","value":4}]},{"brand":"Honda","model":"Civic","attributes":[{"name":"color","value":"red"},{"name":"doors","value":2}]},{"brand":"Toyota","model":"Corolla","attributes":[{"name":"color","value":"blue"},{"name":"doors","value":4}]}]
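If the chained groupby/apply call is hard to follow, the same nesting can be written as an explicit loop. This is only a sketch of an equivalent approach, not part of the solution above; the names records and nested_json are made up for illustration, and it uses json.dumps from the standard library instead of to_json.

```python
import json
import pandas as pd

cars = {'brand': ['Honda','Toyota','Ford','Audi','Honda','Toyota','Ford','Audi'],
        'model': ['Civic','Corolla','Focus','A4','Civic','Corolla','Focus','A4'],
        'attributeName': ['color','color','color','color','doors','doors','doors','doors'],
        'attributeValue': ['red','blue','black','red',2,4,4,2]}
df = pd.DataFrame(cars)

records = []  # hypothetical name: one dict per (brand, model) pair
# groupby sorts the group keys by default, so Audi comes first
for (brand, model), group in df.groupby(['brand', 'model']):
    # collect every attribute row of this car into a list of small dicts
    attributes = [
        {'name': name, 'value': value}
        for name, value in zip(group['attributeName'], group['attributeValue'])
    ]
    records.append({'brand': brand, 'model': model, 'attributes': attributes})

nested_json = json.dumps(records)
print(nested_json)
```

The loop makes the two levels visible: the outer dict carries the grouping keys, the inner list carries the rows of that group.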

Or, to get one record per attribute:

d = {'attributeName':'name','attributeValue':'value'}
j = (df.rename(columns=d)
       .groupby(['brand','model'])
       .apply(lambda x: x[['name','value']].to_dict('records'))
       .explode()
       .apply(lambda x: [x])
       .reset_index(name='attributes')
       .to_json(orient='records'))
print(j)
[{"brand":"Audi","model":"A4","attributes":[{"name":"color","value":"red"}]},{"brand":"Audi","model":"A4","attributes":[{"name":"doors","value":2}]},{"brand":"Ford","model":"Focus","attributes":[{"name":"color","value":"black"}]},{"brand":"Ford","model":"Focus","attributes":[{"name":"doors","value":4}]},{"brand":"Honda","model":"Civic","attributes":[{"name":"color","value":"red"}]},{"brand":"Honda","model":"Civic","attributes":[{"name":"doors","value":2}]},{"brand":"Toyota","model":"Corolla","attributes":[{"name":"color","value":"blue"}]},{"brand":"Toyota","model":"Corolla","attributes":[{"name":"doors","value":4}]}]
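For readers unsure what explode() does in that chain, here is a minimal, self-contained sketch on a toy Series (the data and variable names are invented for illustration). Each list element becomes its own row, and the index label of the original row is repeated.

```python
import pandas as pd

# A Series whose values are lists, similar in shape to the groupby result
s = pd.Series([[1, 2], [3]], index=['a', 'b'])

# explode() turns every list element into a separate row
exploded = s.explode()

print(exploded.index.tolist())
print(exploded.tolist())
```

In the answer's chain, this is why every (brand, model) pair appears once per attribute after explode(), and the following apply(lambda x: [x]) wraps each single dict back into a one-element list.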

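Or build the attributes column row by row, which also gives one record per attribute (note that this overwrites df):

df['attributes'] = df.apply(lambda x: [{'name': x['attributeName'], 'value': x['attributeValue']}], axis=1)
df = df.drop(['attributeName','attributeValue'], axis=1)
print(df)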
    brand    model                             attributes
0   Honda    Civic    [{'name': 'color', 'value': 'red'}]
1  Toyota  Corolla   [{'name': 'color', 'value': 'blue'}]
2    Ford    Focus  [{'name': 'color', 'value': 'black'}]
3    Audi       A4    [{'name': 'color', 'value': 'red'}]
4   Honda    Civic        [{'name': 'doors', 'value': 2}]
5  Toyota  Corolla        [{'name': 'doors', 'value': 4}]
6    Ford    Focus        [{'name': 'doors', 'value': 4}]
7    Audi       A4        [{'name': 'doors', 'value': 2}]

j = df.to_json(orient='records')
print (j)
[{"brand":"Honda","model":"Civic","attributes":[{"name":"color","value":"red"}]},{"brand":"Toyota","model":"Corolla","attributes":[{"name":"color","value":"blue"}]},{"brand":"Ford","model":"Focus","attributes":[{"name":"color","value":"black"}]},{"brand":"Audi","model":"A4","attributes":[{"name":"color","value":"red"}]},{"brand":"Honda","model":"Civic","attributes":[{"name":"doors","value":2}]},{"brand":"Toyota","model":"Corolla","attributes":[{"name":"doors","value":4}]},{"brand":"Ford","model":"Focus","attributes":[{"name":"doors","value":4}]},{"brand":"Audi","model":"A4","attributes":[{"name":"doors","value":2}]}]

2 Comments

That looks great and works. I'm going with the first option because I'm not sure how explode() works at the moment, but I will look into it more closely.
@HedgeHog - I wasn't quite sure what the expected output should look like; the first solution gives different output from the second and third.
0

This should give you the desired output:

d = {'attributeName':'name','attributeValue':'value'}
df_cars = (df.rename(columns=d)
             .groupby(['brand','model'])
             .apply(lambda x: x[['name','value']].drop_duplicates().to_dict('records'))
             .reset_index()
             .rename(columns={0:'attributes'}))

df_cars.head(10)
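The drop_duplicates() call only matters when the same attribute row appears more than once for a car. The sample data in the question has no such rows, so the following sketch uses an invented frame with a repeated color row to show the effect; the name attributes_by_car is hypothetical.

```python
import pandas as pd

# Invented data: the 'color' row for the Civic appears twice
df = pd.DataFrame({
    'brand': ['Honda', 'Honda', 'Honda'],
    'model': ['Civic', 'Civic', 'Civic'],
    'name':  ['color', 'color', 'doors'],
    'value': ['red', 'red', 2],
})

# For each car, drop repeated (name, value) rows before building the list
attributes_by_car = {
    key: group[['name', 'value']].drop_duplicates().to_dict('records')
    for key, group in df.groupby(['brand', 'model'])
}

print(attributes_by_car[('Honda', 'Civic')])
```

Without drop_duplicates(), the Civic would carry the color attribute twice in its list.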

1 Comment

Please mark code as such by using code fencing
