Convert a pandas DataFrame to a specific JSON format in Python

Question

I have a large dataset with 2000+ rows that I want to convert into a specific JSON format. I have tried the code below on a sample dataset.

I tried using to_json and to_dict, but they give output in a generic format.
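For reference, this is roughly what the generic formats look like on the first two rows of the same data. With orient='records', both to_dict and to_json produce one flat record per row, so rows are not nested under their Items value. A minimal sketch:

```python
import pandas as pd

# First two rows of the question's data, enough to show the shape
data = [['food', 'vegatables', 10], ['food', 'fruits', 5]]
df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])

# One flat dict per row; 'food' is repeated rather than grouped
records = df.to_dict('records')
print(records)

# to_json with orient='records' gives the same flat shape as a JSON string
json_text = df.to_json(orient='records')
print(json_text)
```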

import pandas as pd
from collections import defaultdict

data = [['food', 'vegatables', 10], ['food', 'fruits', 5], ['food', 'pulses', 12],
        ['cloth', 'shirts', 2], ['cloth', 'trousers', 6], ['books', 'notebook', 3],
        ['pens', 'roller', 4], ['pens', 'ball', 3]]

df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])

labels = defaultdict(int)
labels1 = defaultdict(int)
for cat in df["Items"]:
    labels[cat] += 1
for sub in df["Subitem"]:
    labels1[sub] += 1

check = [{"item": i, "weight": labels[i],
          'groups': [{"subitem": j, "weight": labels1[j], "group": []} for j in labels1]}
         for i in labels]
check
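The extra subitems come from the inner comprehension: it loops over every key of labels1 for every item, and labels1 only counts how often each subitem name occurs, so it has no link back to the item. One way to keep the defaultdict approach is to collect each item's own (subitem, quantity) pairs first. A sketch; subitems_by_item is a name chosen here for illustration:

```python
from collections import defaultdict

import pandas as pd

data = [['food', 'vegatables', 10], ['food', 'fruits', 5], ['food', 'pulses', 12],
        ['cloth', 'shirts', 2], ['cloth', 'trousers', 6], ['books', 'notebook', 3],
        ['pens', 'roller', 4], ['pens', 'ball', 3]]
df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])

# Collect (subitem, quantity) pairs under the item each row belongs to;
# dicts keep insertion order, so items stay in first-appearance order
subitems_by_item = defaultdict(list)
for item, sub, qty in df[['Items', 'Subitem', 'Quantity']].itertuples(index=False):
    subitems_by_item[item].append((sub, int(qty)))

check = [{'item': item,
          'weight': len(subs),  # number of subitems under this item
          'groups': [{'subitem': sub, 'weight': qty, 'group': []} for sub, qty in subs]}
         for item, subs in subitems_by_item.items()]
print(check)
```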

I am getting output like this:

[{'item': 'food',
 'weight': 3,
 'groups': [{'subitem': 'vegatables', 'weight': 1, 'group': []},
  {'subitem': 'fruits', 'weight': 1, 'group': []},
  {'subitem': 'pulses', 'weight': 1, 'group': []},
  {'subitem': 'shirts', 'weight': 1, 'group': []},
  {'subitem': 'trousers', 'weight': 1, 'group': []},
  {'subitem': 'notebook', 'weight': 1, 'group': []},
  {'subitem': 'roller', 'weight': 1, 'group': []},
  {'subitem': 'ball', 'weight': 1, 'group': []}]},
  {'item': 'cloth',
  'weight': 2,
  'groups': [{'subitem': 'vegatables', 'weight': 1, 'group': []},
  {'subitem': 'fruits', 'weight': 1, 'group': []},
  {'subitem': 'pulses', 'weight': 1, 'group': []},
  {'subitem': 'shirts', 'weight': 1, 'group': []},
  {'subitem': 'trousers', 'weight': 1, 'group': []},
  {'subitem': 'notebook', 'weight': 1, 'group': []},
  {'subitem': 'roller', 'weight': 1, 'group': []},
  {'subitem': 'ball', 'weight': 1, 'group': []}]},
  {'item': 'books',
  'weight': 1,
  'groups': [{'subitem': 'vegatables', 'weight': 1, 'group': []},
  {'subitem': 'fruits', 'weight': 1, 'group': []},
  {'subitem': 'pulses', 'weight': 1, 'group': []},
  {'subitem': 'shirts', 'weight': 1, 'group': []},
  {'subitem': 'trousers', 'weight': 1, 'group': []},
  {'subitem': 'notebook', 'weight': 1, 'group': []},
  {'subitem': 'roller', 'weight': 1, 'group': []},
  {'subitem': 'ball', 'weight': 1, 'group': []}]},
  {'item': 'pens',
  'weight': 2,
  'groups': [{'subitem': 'vegatables', 'weight': 1, 'group': []},
  {'subitem': 'fruits', 'weight': 1, 'group': []},
  {'subitem': 'pulses', 'weight': 1, 'group': []},
  {'subitem': 'shirts', 'weight': 1, 'group': []},
  {'subitem': 'trousers', 'weight': 1, 'group': []},
  {'subitem': 'notebook', 'weight': 1, 'group': []},
  {'subitem': 'roller', 'weight': 1, 'group': []},
  {'subitem': 'ball', 'weight': 1, 'group': []}]}]

But I want output that contains only the subitems related to each item, with each subitem's weight taken from its Quantity:

[{'item': 'food',
'weight': 3,
'groups': [
{'subitem': 'vegatables', 'weight': 10, 'group': []},
{'subitem': 'fruits', 'weight': 5, 'group': []},
{'subitem': 'pulses', 'weight': 12, 'group': []}]},
{'item': 'cloth',
'weight': 2,
'groups': [
{'subitem': 'shirts', 'weight': 2, 'group': []},
{'subitem': 'trousers', 'weight': 6, 'group': []}]},
{'item': 'books',
'weight': 1,
'groups': [
{'subitem': 'notebook', 'weight': 3, 'group': []}]},
{'item': 'pens',
'weight': 2,
'groups': [
{'subitem': 'roller', 'weight': 4, 'group': []},
{'subitem': 'ball', 'weight': 3, 'group': []}]}]
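The same nesting can be built with DataFrame.groupby. By default groupby sorts the keys; passing sort=False keeps items in the order they first appear (food, cloth, books, pens), matching the listing above. A sketch that produces exactly these keys, with the item weight as the number of subitems:

```python
import pandas as pd

data = [['food', 'vegatables', 10], ['food', 'fruits', 5], ['food', 'pulses', 12],
        ['cloth', 'shirts', 2], ['cloth', 'trousers', 6], ['books', 'notebook', 3],
        ['pens', 'roller', 4], ['pens', 'ball', 3]]
df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])

result = []
# sort=False: keep items in first-appearance order instead of sorting them
for item, group in df.groupby('Items', sort=False):
    result.append({
        'item': item,
        'weight': len(group),  # number of subitem rows for this item
        'groups': [{'subitem': sub, 'weight': int(qty), 'group': []}
                   for sub, qty in zip(group['Subitem'], group['Quantity'])],
    })
print(result)
```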

And what should I do if I want output like this, where each item's weight is the sum of its subitems' weights?

[{'item': 'food',
'weight': 27,
'groups': [
{'subitem': 'vegatables', 'weight': 10, 'group': []},
{'subitem': 'fruits', 'weight': 5, 'group': []},
{'subitem': 'pulses', 'weight': 12, 'group': []}]},
{'item': 'cloth',
'weight': 8,
'groups': [
{'subitem': 'shirts', 'weight': 2, 'group': []},
{'subitem': 'trousers', 'weight': 6, 'group': []}]},
{'item': 'books',
'weight': 3,
'groups': [
{'subitem': 'notebook', 'weight': 3, 'group': []}]},
{'item': 'pens',
'weight': 7,
'groups': [
{'subitem': 'roller', 'weight': 4, 'group': []},
{'subitem': 'ball', 'weight': 3, 'group': []}]}]
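For this second format only the item weight changes: sum the group's Quantity column instead of counting its rows. int() converts the NumPy integer returned by sum() into a plain Python int, which the standard json module can serialize (it rejects NumPy integers). A sketch:

```python
import pandas as pd

data = [['food', 'vegatables', 10], ['food', 'fruits', 5], ['food', 'pulses', 12],
        ['cloth', 'shirts', 2], ['cloth', 'trousers', 6], ['books', 'notebook', 3],
        ['pens', 'roller', 4], ['pens', 'ball', 3]]
df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])

result = []
for item, group in df.groupby('Items', sort=False):
    result.append({
        'item': item,
        # total of the subitem weights, e.g. food: 10 + 5 + 12 = 27
        'weight': int(group['Quantity'].sum()),
        'groups': [{'subitem': sub, 'weight': int(qty), 'group': []}
                   for sub, qty in zip(group['Subitem'], group['Quantity'])],
    })
print(result)
```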

ansev · Accepted Answer · 2020-04-01 12:20:05Z


You could use DataFrame.groupby and DataFrame.to_dict with a list comprehension:

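# df is the DataFrame built in the question
cols_group = ['Subitem', 'Weight', 'group']

my_list = [{'Item': item,
            'Weight': len(group),  # number of subitems in this item
            'groups': group[cols_group].to_dict('records')}
           for item, group in (df.rename(columns={'Quantity': 'Weight'})
                                 # one fresh empty list per row for the 'group' field
                                 .assign(group=[[] for _ in range(len(df))])
                                 .groupby('Items'))]

print(my_list)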

Output

 [{'Item': 'books',
  'Weight': 1,
  'groups': [{'Subitem': 'notebook', 'Weight': 3, 'group': []}]},
 {'Item': 'cloth',
  'Weight': 2,
  'groups': [{'Subitem': 'shirts', 'Weight': 2, 'group': []},
   {'Subitem': 'trousers', 'Weight': 6, 'group': []}]},
 {'Item': 'food',
  'Weight': 3,
  'groups': [{'Subitem': 'vegatables', 'Weight': 10, 'group': []},
   {'Subitem': 'fruits', 'Weight': 5, 'group': []},
   {'Subitem': 'pulses', 'Weight': 12, 'group': []}]},
 {'Item': 'pens',
  'Weight': 2,
  'groups': [{'Subitem': 'roller', 'Weight': 4, 'group': []},
   {'Subitem': 'ball', 'Weight': 3, 'group': []}]}]
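The result above is still a Python list of dicts. Since the goal is JSON, the standard json module can serialize it directly. A sketch that rebuilds my_list with the same groupby/to_dict approach and dumps it; default=int is an added safeguard (not part of the answer) that converts any NumPy integer json cannot handle on its own:

```python
import json

import pandas as pd

data = [['food', 'vegatables', 10], ['food', 'fruits', 5], ['food', 'pulses', 12],
        ['cloth', 'shirts', 2], ['cloth', 'trousers', 6], ['books', 'notebook', 3],
        ['pens', 'roller', 4], ['pens', 'ball', 3]]
df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])

cols_group = ['Subitem', 'Weight', 'group']
my_list = [{'Item': item,
            'Weight': len(group),
            'groups': group[cols_group].to_dict('records')}
           for item, group in (df.rename(columns={'Quantity': 'Weight'})
                                 .assign(group=[[] for _ in range(len(df))])
                                 .groupby('Items'))]

# Serialize to a JSON string; default=int is only called for values
# json cannot serialize natively (e.g. NumPy integers)
json_text = json.dumps(my_list, indent=2, default=int)
print(json_text)
```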

edited Apr 1, 2020 at 12:20

answered Apr 1, 2020 at 12:03


3 Comments

AdityaV Over a year ago

Thanks, it worked! What should I do if I need each item's weight to be the sum of its subitems' weights (e.g. cloth 'weight': 8 instead of 2, food 'weight': 27 instead of 3), with everything else unchanged?

ansev Over a year ago

Use this as the iterable in the list comprehension:

(df.rename(columns={'Quantity': 'Weight'})
   .assign(group=[[]] * len(df), Weight=df['Quantity'].cumsum())
   .groupby('Items'))

or create a new column:

(df.rename(columns={'Quantity': 'Weight'})
   .assign(group=[[]] * len(df), Weight_cumsum=df['Quantity'].cumsum())
   .groupby('Items'))

with cols_group = ['Subitem', 'Weight', 'Weight_cumsum', 'group']
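The cumsum suggestion does not give per-item totals because Series.cumsum runs down the whole column in row order and ignores Items. Grouping first with groupby(...).transform('sum') instead gives every row the total of its own item. A comparison sketch; running and item_total are illustrative names:

```python
import pandas as pd

data = [['food', 'vegatables', 10], ['food', 'fruits', 5], ['food', 'pulses', 12],
        ['cloth', 'shirts', 2], ['cloth', 'trousers', 6], ['books', 'notebook', 3],
        ['pens', 'roller', 4], ['pens', 'ball', 3]]
df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])

# Running total down the entire column, ignoring which item a row belongs to
running = df['Quantity'].cumsum()

# Each row receives the total Quantity of its own item (food rows get 27, etc.)
item_total = df.groupby('Items')['Quantity'].transform('sum')

print(pd.DataFrame({'Items': df['Items'], 'Subitem': df['Subitem'],
                    'cumsum': running, 'item_total': item_total}))
```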

AdityaV Over a year ago

It gives a cumulative sum at the subitem level, not a total at the item level.
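One way to get per-item totals while keeping the answer's structure is to replace len(group) with the sum of the group's Weight column; the subitem records stay as they were. A sketch, assuming the answer's column names and the question's df:

```python
import pandas as pd

data = [['food', 'vegatables', 10], ['food', 'fruits', 5], ['food', 'pulses', 12],
        ['cloth', 'shirts', 2], ['cloth', 'trousers', 6], ['books', 'notebook', 3],
        ['pens', 'roller', 4], ['pens', 'ball', 3]]
df = pd.DataFrame(data, columns=['Items', 'Subitem', 'Quantity'])

cols_group = ['Subitem', 'Weight', 'group']

my_list = [{'Item': item,
            # sum of the subitem weights instead of the number of subitems
            'Weight': int(group['Weight'].sum()),
            'groups': group[cols_group].to_dict('records')}
           for item, group in (df.rename(columns={'Quantity': 'Weight'})
                                 .assign(group=[[] for _ in range(len(df))])
                                 .groupby('Items'))]
print(my_list)
```

groupby sorts the item names by default (books, cloth, food, pens); pass sort=False to keep the original row order instead.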
