
(Python 3.5) I am trying to parse a large user review .json file (1.3 GB) in Python and convert it to a .csv file. I have looked for a simple converter tool online, but most of them accept files of at most 1 MB or are very expensive. As I am fairly new to Python, I have two questions.

  1. Is it even possible, or efficient, to do this in Python, or should I be looking for another method?

  2. I tried the following code, but it only reads and writes the first 342 lines of my .json file and then returns this error:

  File "C:\Anaconda3\lib\json\__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "C:\Anaconda3\lib\json\decoder.py", line 342, in decode
    raise JSONDecodeError("Extra data", s, end)
JSONDecodeError: Extra data

This is the code I'm using:

import csv
import json

infile = open("myfile.json","r")
outfile = open ("myfile.csv","w")

writer = csv.writer(outfile)

for row in json.loads(infile.read()):
  writer.writerow(row)
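A minimal reproduction of the error, assuming the file holds one JSON object per line (the sample records below are made up). json.loads expects exactly one JSON document, so it parses the first object and then raises "Extra data" when it finds more text after it; parsing each line on its own succeeds.

```python
import json

# Two JSON objects on separate lines (made-up sample values), the same
# shape as a file that stores one review object per line.
two_lines = '{"user_id": "u1", "stars": 5}\n{"user_id": "u2", "stars": 3}\n'

error_message = None
try:
    json.loads(two_lines)  # tries to parse the whole string as ONE document
except json.JSONDecodeError as err:
    # The first object parses fine; the decoder then finds more text after it.
    error_message = err.msg
print(error_message)  # prints "Extra data"

# Parsing each non-blank line separately works:
records = [json.loads(line) for line in two_lines.splitlines() if line.strip()]
print(records[1]["stars"])  # prints 3
```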

My .json example: [link to a small part of the JSON]

My guess is that the error is related to my for loop and json.loads, but I do not know enough about it. Is it possible to create a dictionary {} and convert just the values of "user_id", "stars" and "text"? Or am I dreaming?

Any suggestions or criticism are appreciated.

2 Answers


This is not a single JSON document; it is a file containing one JSON object per line (a format sometimes called JSON Lines). You should parse each line individually.

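for row in infile:
  data = json.loads(row)  # parse one line as one JSON object (a dict)
  writer.writerow(data)   # note: iterating over a dict yields its keys, so only the keys are written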
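A minimal sketch of a streaming version, assuming one JSON object per line; the function name jsonl_to_csv is hypothetical. It swaps csv.writer for csv.DictWriter, which looks each field up by key and so writes values rather than keys. Iterating over an open file reads one line at a time, so memory use stays small even for a 1.3 GB input.

```python
import csv
import json
from typing import IO, Iterable


def jsonl_to_csv(lines: Iterable[str], out: IO[str], fieldnames: Iterable[str]) -> int:
    """Convert one-JSON-object-per-line input to CSV, keeping only `fieldnames`.

    `lines` can be an open file object: iterating over it yields one line at a
    time, so the whole file never has to be loaded into memory.
    Returns the number of data rows written.
    """
    # extrasaction="ignore" drops keys not listed in fieldnames;
    # keys missing from a record are written as "" (DictWriter's default restval).
    writer = csv.DictWriter(out, fieldnames=list(fieldnames), extrasaction="ignore")
    writer.writeheader()
    count = 0
    for line in lines:
        if not line.strip():  # skip blank lines, e.g. a trailing newline
            continue
        writer.writerow(json.loads(line))
        count += 1
    return count


# Hypothetical usage with the file names from the question:
# with open("myfile.json", encoding="utf-8") as infile, \
#         open("myfile.csv", "w", newline="", encoding="utf-8") as outfile:
#     jsonl_to_csv(infile, outfile, ["user_id", "stars", "text"])
```

Opening the output file with newline="" is what the csv module documentation recommends; without it, rows can be separated by blank lines on Windows.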

2 Comments

Thanks very much for the reply, Daniel. However, the .csv file this creates contains only the keys, not the values (user_id, stars, type, review_id, business_id, votes, date, text). Is there a way to write the value for each key ({key: value})? Should I try using a dictionary {}? I only need the values for "user_id", "stars" and "text".
A CSV is not a key-value structure. You'll need to pick out the individual bits of data you need; for example data_to_write = [data["votes"]["funny"], data["user_id"], data["text"]] etc.
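A sketch of the suggestion in this comment, assuming each line is one review object with the keys listed above; the helper names record_to_row and convert and the column name funny_votes are hypothetical. It builds each CSV row as a plain list, which also handles the nested "votes" object. It uses dict.get so a review missing a key yields an empty cell rather than a KeyError; plain indexing, as in the comment, would instead stop at the first incomplete record, which may be preferable if missing data should count as an error.

```python
import csv
import json
from typing import IO, Iterable

# Output columns; "funny_votes" is read from the nested "votes" object.
HEADER = ["user_id", "stars", "text", "funny_votes"]


def record_to_row(data: dict) -> list:
    """Pick the wanted values out of one parsed review, in HEADER order."""
    # .get() returns "" for a missing key instead of raising KeyError.
    votes = data.get("votes") or {}
    return [
        data.get("user_id", ""),
        data.get("stars", ""),
        data.get("text", ""),
        votes.get("funny", ""),
    ]


def convert(lines: Iterable[str], out: IO[str]) -> None:
    """Write a header, then one CSV row per non-blank JSON line."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    for line in lines:
        if line.strip():
            writer.writerow(record_to_row(json.loads(line)))
```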

Sometimes it's not as easy as having one JSON definition per line of input. A JSON definition can span multiple lines, and when reading line by line it is not always easy to tell where each definition's opening and closing braces are (for example, if there are strings containing braces, or nested structures).

The answer is to use the raw_decode method of json.JSONDecoder to fetch the JSON definitions from the file one at a time. This will work for any set of concatenated valid JSON definitions. It's further described in my answer here: Importing wrongly concatenated JSONs in python
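A minimal sketch of the raw_decode approach, assuming the concatenated JSON text is already in memory as a single string; for a 1.3 GB file that means reading the whole file first, so a chunked reader would be needed to keep memory low. The function name iter_concatenated_json is hypothetical and is not taken from the linked answer. JSONDecoder.raw_decode(s, idx) parses one value starting at idx and returns it together with the index where it ended; it does not skip leading whitespace, so the loop does that itself.

```python
import json
from typing import Any, Iterator

# Whitespace characters allowed between JSON values by the JSON grammar.
_JSON_WHITESPACE = " \t\n\r"


def iter_concatenated_json(text: str) -> Iterator[Any]:
    """Yield each JSON value from a string of concatenated JSON values.

    Values may span several lines and may be separated by any JSON
    whitespace, or by nothing at all.
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode rejects leading whitespace, so skip it first.
        while pos < length and text[pos] in _JSON_WHITESPACE:
            pos += 1
        if pos >= length:
            break
        # Parse one value starting at pos; `end` is the index just past it.
        obj, end = decoder.raw_decode(text, pos)
        yield obj
        pos = end
```

Because the decoder, not a brace counter, decides where each value ends, braces inside strings (such as "}{") are handled correctly.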
