Nested JSON Converting Rows

Question

Here are 3 rows of my sample JSON file (one JSON object per line).

{"customer": 10, "date": "2017.04.06 12:09:32", "itemList": [{"item": "20126907_EA", "price": 1.88, "quantity": 1.0}, {"item": "20185742_EA", "price": 0.99, "quantity": 1.0}, {"item": "20138681_EA", "price": 1.79, "quantity": 1.0}, {"item": "20049778001_EA", "price": 2.47, "quantity": 1.0}, {"item": "20419715007_EA", "price": 3.33, "quantity": 1.0}, {"item": "20321434_EA", "price": 2.47, "quantity": 1.0}, {"item": "20068076_KG", "price": 28.24, "quantity": 10.086}, {"item": "20022893002_EA", "price": 1.77, "quantity": 1.0}, {"item": "20299328003_EA", "price": 1.25, "quantity": 1.0}], "store": "825f9cd5f0390bc77c1fed3c94885c87"}
{"customer": 100, "date": "2017.01.10 12:59:09", "itemList": [{"item": "20132638_KG", "price": 3.33, "quantity": 0.28}, {"item": "20320042001_EA", "price": 2.99, "quantity": 1.0}, {"item": "20320832003_EA", "price": 2.58, "quantity": 2.0}, {"item": "20128148_KG", "price": 4.85, "quantity": 0.256}, {"item": "20027478_KG", "price": 4.58, "quantity": 0.135}, {"item": "20653232_EA", "price": 5.99, "quantity": 1.0}, {"item": "20317755_EA", "price": 3.69, "quantity": 1.0}, {"item": "20519704_KG", "price": 4.24, "quantity": 0.214}, {"item": "20591843_KG", "price": 5.56, "quantity": 0.286}], "store": "a666587afda6e89aec274a3657558a27"}
{"customer": 1000, "date": "2017.04.17 18:53:40", "itemList": [{"item": "20788909_EA", "price": 3.49, "quantity": 1.0}, {"item": "20975073_EA", "price": 5.0, "quantity": 1.0}, {"item": "20868904_EA", "price": 5.0, "quantity": 1.0}, {"item": "20189092_EA", "price": 0.05, "quantity": 1.0}], "store": "ebb71045453f38676c40deb9864f811d"}

I would like to convert every item in the nested itemList into its own row. Below is the code I'm trying, but I am running into issues:

def data_load():
    p=Path(r'C:\Users\rohgorthy\Downloads\LBD_Assignemtn\sample_tag.json')
    with p.open('r', encoding='utf-8') as f:
        data = f.read()
    
    df = pd.json_normalize(data, record_path='itemList', meta=['customer', 'date', 'store'])
    return df

Error below:

result = result[spec]
TypeError: string indices must be integers

Can anyone please help me get the output in the format below?

df Columns:

customer date item price quantity store

Thank you in advance.

Ben Y · Accepted Answer · 2021-06-11 04:35:25Z

1

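I think you need to take the actual raw strings of JSON data and convert them into a list of objects (dicts). In your code, f.read() returns the whole file as a single str, while json_normalize expects a dict or a list of dicts, which is why it fails with "string indices must be integers".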

from pathlib import Path
from json import loads
from pandas import json_normalize

def data_load(p):
    p = Path(p) if not isinstance(p, Path) else p
    text = p.read_text(encoding='utf-8')
    data = [loads(ln) for ln in text.splitlines()]
    return json_normalize(data, record_path='itemList', meta=['customer', 'date', 'store'])

df = data_load('sample_tag.json')
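To check the approach without touching the filesystem, here is a minimal sketch that runs two shortened records (trimmed from the sample in the question) through the same steps. The names `raw` and `records` are illustrative and not part of the original code, and the final column reordering is an assumption about the layout the question asks for.

```python
import json
import pandas as pd

# Two shortened records in the same one-object-per-line layout as the question.
raw = (
    '{"customer": 10, "date": "2017.04.06 12:09:32", "itemList": ['
    '{"item": "20126907_EA", "price": 1.88, "quantity": 1.0}, '
    '{"item": "20185742_EA", "price": 0.99, "quantity": 1.0}], '
    '"store": "825f9cd5f0390bc77c1fed3c94885c87"}\n'
    '{"customer": 1000, "date": "2017.04.17 18:53:40", "itemList": ['
    '{"item": "20788909_EA", "price": 3.49, "quantity": 1.0}], '
    '"store": "ebb71045453f38676c40deb9864f811d"}\n'
)

# Parse each non-empty line into a dict; json_normalize needs dicts, not a str.
records = [json.loads(line) for line in raw.splitlines() if line.strip()]

# One output row per element of itemList, with the parent fields repeated.
df = pd.json_normalize(
    records, record_path="itemList", meta=["customer", "date", "store"]
)

# Reorder to the column layout requested in the question.
df = df[["customer", "date", "item", "price", "quantity", "store"]]
print(df)
```

The `if line.strip()` guard is only there so that a stray blank line in the file does not make `json.loads` raise.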

edited Jun 11, 2021 at 4:35

answered Jun 11, 2021 at 4:11


3 Comments

Rohith Amaz Over a year ago

Excellent, this works, with one correction: data = [json.loads(ln) for ln in text.splitlines()]

Ben Y Over a year ago

oops was operating from memory, sorry. I fixed the answer

Rohith Amaz Over a year ago

Thank you so much! It was super kind of you, and I much appreciate your answer! Wishing you a wonderful day ahead!

Rohith Amaz · Accepted Answer · 2021-06-15 18:59:26Z

0

I used PySpark to arrive at a solution. The code is below:

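from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumed: not defined in the original snippet

def data_load():
    # spark.read.json expects one JSON object per line, which matches the sample file.
    df = spark.read.json(r"transactions.json")
    df.createOrReplaceTempView("df")
    # explode() produces one row per element of itemList.
    df2 = spark.sql("select customer, date, explode(itemList) as item_list, store from df")
    df2.createOrReplaceTempView("df2")
    # Flatten the exploded struct into item, price and quantity columns.
    df3 = spark.sql("select customer, date, item_list.item as item, item_list.price as price, item_list.quantity as quantity, store from df2")
    df3.createOrReplaceTempView("d13")
    # Count how often each pair of items appears in the same transaction (same customer and date).
    df4 = spark.sql("select a.item as itema, b.item as itemb, count(*) as cnt from d13 a join d13 b on a.customer = b.customer and a.item < b.item and a.date = b.date group by a.item, b.item")
    return df4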
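The last query above goes a step beyond flattening: it counts how often two items show up in the same transaction. For comparison with the pandas answer, here is a rough pandas sketch of that same self-join. The `flat` frame and its values are made up for illustration, and it assumes, as the SQL does, that (customer, date) identifies one transaction.

```python
import pandas as pd

# Hypothetical flattened rows: one row per (transaction, item).
flat = pd.DataFrame({
    "customer": [10, 10, 10, 100, 100],
    "date": ["d1", "d1", "d1", "d2", "d2"],
    "item": ["A", "B", "C", "A", "B"],
})

# Self-join on the transaction key (customer, date).
pairs = flat.merge(flat, on=["customer", "date"], suffixes=("_a", "_b"))

# Keep each unordered pair once (a < b); this drops self-pairs and mirrored duplicates.
pairs = pairs[pairs["item_a"] < pairs["item_b"]]

# Count how many transactions contain each pair.
counts = pairs.groupby(["item_a", "item_b"]).size().reset_index(name="cnt")
print(counts)
```

Note that the self-join grows quadratically with the number of items per transaction, so this is only a sketch for small data; Spark is the more natural fit at scale.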

answered Jun 15, 2021 at 18:59

