I have the following data in CSV format.

id,category,sub_category,sub_category_type,count
0,fruits,citrus,lemon,30
1,fruits,citrus,lemon,40
2,fruits,citrus,lemon,50
3,fruits,citrus,grapefruit,20
4,fruits,citrus,orange,40
5,fruits,citrus,orange,10
6,fruits,berries,blueberry,20
7,fruits,berries,strawberry,50
8,fruits,berries,strawberry,90
9,fruits,berries,cranberry,70
10,fruits,berries,raspberry,16
11,fruits,berries,raspberry,80
12,fruits,dried fruit,raisins,10
13,fruits,dried fruit,dates,15
14,fruits,dried fruit,dates,10
15,vegetables,legumes,beans,12
16,vegetables,legumes,beans,15
17,vegetables,legumes,chickpea,12
18,vegetables,green leaf,spinach,18
19,vegetables,green leaf,cress,19

I want to convert the above CSV to nested JSON, but pandas.DataFrame.to_json() doesn't help, because it does not produce a nested structure.

Is there any solution for this?

PS: I'm answering the above question in Q&A style to share the knowledge. I would be happy to know if there is any other solution better than this.

  • Why are you answering your own question immediately upon asking it? Your response should form part of your original question. Commented May 26, 2016 at 5:57
  • Then can you explain how I am supposed to answer this question? When I click the check box "Answer your own question - share your knowledge, Q&A-style", it opens up a text box to post the answer. If my response should be part of my question, then why is there another text box to post the answer? Commented May 26, 2016 at 6:14
  • Alternatively, your response could be something that you have tried (which should be part of the question), but for which you are seeking a better solution for whatever reason. Commented May 26, 2016 at 7:05

1 Answer

The following code is inspired by this GitHub link. It converts the CSV into JSON nested up to three levels deep.

import pandas as pd
import json


df = pd.read_csv('data.csv')

# choose columns to keep, in the desired nested JSON hierarchical order
df = df[["category", "sub_category", "sub_category_type", "count"]]

# the order of the groupby keys matters: it determines the JSON nesting.
# the groupby call produces a pandas Series indexed by "category",
# "sub_category" and "sub_category_type", summing the numeric column "count"
df1 = df.groupby(["category", "sub_category", "sub_category_type"])['count'].sum()
df1 = df1.reset_index()

print(df1)

# root of the tree; every category becomes a child of this node
d = {"name": "stock", "children": []}

for line in df1.values:
    category = line[0]
    sub_category = line[1]
    sub_category_type = line[2]
    count = line[3]

    # collect the names of the categories added so far
    category_list = []
    for item in d['children']:
        category_list.append(item['name'])

    # if 'category' is NOT in category_list, append it with its first sub-category and leaf
    if category not in category_list:
        d['children'].append({"name":category, "children":[{"name":sub_category, "children":[{"name": sub_category_type, "count" : count}]}]})

    # if 'category' IS in category_list, add a new child to it
    else:
        # collect the names of the sub-categories already under this category
        sub_list = []
        for item in d['children'][category_list.index(category)]['children']:
            sub_list.append(item['name'])

        # new sub-category: append it together with its first leaf
        if sub_category not in sub_list:
            d['children'][category_list.index(category)]['children'].append({"name":sub_category, "children":[{"name": sub_category_type, "count" : count}]})
        # existing sub-category: append only the new leaf
        else:
            d['children'][category_list.index(category)]['children'][sub_list.index(sub_category)]['children'].append({"name": sub_category_type, "count" : count})


print(json.dumps(d))

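On execution, this prints the following (reformatted here for readability; the key order within each object may differ depending on the Python version):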

{
"name": "stock", 
"children": [
    {"name": "fruits",
    "children": [
        {"name": "berries", 
        "children": [
            {"count": 20, "name": "blueberry"}, 
            {"count": 70, "name": "cranberry"}, 
            {"count": 96, "name": "raspberry"}, 
            {"count": 140, "name": "strawberry"}]
        },
        {"name": "citrus", 
        "children": [
            {"count": 20, "name": "grapefruit"},
            {"count": 120, "name": "lemon"},
            {"count": 50, "name": "orange"}]
        }, 
        {"name": "dried fruit",
        "children": [
            {"count": 25, "name": "dates"}, 
            {"count": 10, "name": "raisins"}]
        }]
    },
    {"name": "vegetables",
    "children": [
        {"name": "green leaf",
        "children": [
            {"count": 19, "name": "cress"},
            {"count": 18, "name": "spinach"}]
        },
        {
        "name": "legumes",
        "children": [
            {"count": 27, "name": "beans"},
            {"count": 12, "name": "chickpea"}]
        }]
    }]
}
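
The code above is hard-wired to three levels. As a rough sketch of how the same idea could be generalized to any number of nesting columns, the following uses only the standard library instead of pandas. The function name build_tree, its parameters, and the inline sample data are hypothetical and chosen for illustration only; the sample is a small subset of the CSV from the question.

```python
import csv
import io
import json


def build_tree(rows, levels, value_key, root_name="stock"):
    """Nest rows by the columns in `levels`, summing `value_key` at the leaves.

    Hypothetical helper: `levels` must contain at least one column name.
    """
    root = {"name": root_name, "children": []}
    for row in rows:
        node = root
        for depth, level in enumerate(levels):
            name = row[level]
            # Look for an existing child with this name; create it on first sight.
            child = next((c for c in node["children"] if c["name"] == name), None)
            if child is None:
                if depth == len(levels) - 1:
                    # Deepest level: a leaf that accumulates the summed value.
                    child = {"name": name, value_key: 0}
                else:
                    child = {"name": name, "children": []}
                node["children"].append(child)
            node = child
        # After the loop, `node` is the leaf for this row.
        node[value_key] += int(row[value_key])
    return root


# Small subset of the question's data, inlined so the sketch is self-contained.
sample = """id,category,sub_category,sub_category_type,count
0,fruits,citrus,lemon,30
1,fruits,citrus,lemon,40
4,fruits,citrus,orange,40
6,fruits,berries,blueberry,20
15,vegetables,legumes,beans,12
16,vegetables,legumes,beans,15
"""

rows = csv.DictReader(io.StringIO(sample))
tree = build_tree(rows, ["category", "sub_category", "sub_category_type"], "count")
print(json.dumps(tree, indent=2))
```

Unlike the groupby version, this sketch keeps children in the order they first appear in the file rather than sorted by name, and it looks children up with a linear scan, which is fine for small files but may be slow for very wide trees.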