Python Converting JSON to CSV TypeError

Question

I am attempting to write a CSV file from JSON data returned by the PushShift API, but I am running into a TypeError. My code is below:

import requests
import csv
import json
from urllib.request import urlopen

url = 'https://api.pushshift.io/reddit/comment/search/?subreddit=science&filter=parent_id,id,author,created_utc,subreddit,body,score,permalink'
page = requests.get(url)
page_json = json.loads(page.text)
print(page.text)
f = csv.writer(open("test.csv",'w+', newline=''))
f.writerow(["id", "parent_id", "author", "created_utc","subreddit", "body", "score"])
for x in page_json:
    f.writerow([x["data"]["id"],
                x["data"]["parent_id"],
                x["data"]["author"],
                x["data"]["created_utc"],
                x["data"]["subreddit"],
                x["data"]["body"],
                x["data"]["score"]])

The error I am getting is this:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-82784a93576b> in <module>()
     11 f.writerow(["id", "parent_id", "author", "created_utc","subreddit", "body", "score"])
     12 for x in page:
---> 13     f.writerow([x["data"]["id"],
     14                 x["data"]["parent_id"],
     15                 x["data"]["author"],

TypeError: byte indices must be integers or slices, not str
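To see where this message comes from: the loop in the traceback is written as for x in page, and iterating over a requests Response yields the response body as raw byte chunks, not parsed JSON objects. Indexing a bytes object with a string key raises exactly this TypeError. Below is a minimal offline reproduction; the byte chunk is a hypothetical stand-in for a real response chunk.

```python
# Hypothetical stand-in for what "for x in page" yields when page is a
# requests Response: raw bytes of the body, not parsed JSON objects.
chunks = [b'{"data": [{"id": "abc123", "author": "someone"}]}']

error_message = None
for x in chunks:
    try:
        x["data"]  # indexing a bytes object with a str key
    except TypeError as exc:
        error_message = str(exc)

print(error_message)  # byte indices must be integers or slices, not str
```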

I attempted the solution from "How can I convert JSON to CSV?", which may or may not address the problem I am running into. Any suggestions would be greatly appreciated!

It appears either x or x["data"] is not a dictionary but rather a string. Try printing out your values to debug and see what the actual structure of x is. – sshashank124, May 17, 2018 at 13:32
Sorry, "for x in page" in the traceback is a typo; my actual code iterates over page_json. When I run the correct code, it gives this error: TypeError: string indices must be integers – dhrice, May 17, 2018 at 13:40
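The second error follows from how Python iterates over a dict: the loop yields the dict's keys, which are strings. Because the parsed response is a single object with a top-level "data" key, x ends up being the string "data", and "data"["data"] fails. A minimal offline reproduction, assuming a hypothetical payload with the same shape as the PushShift response:

```python
import json

# Hypothetical payload with the same top-level shape as the PushShift response:
# one object whose "data" key holds a list of comments.
text = '{"data": [{"id": "abc123", "author": "someone"}]}'
page_json = json.loads(text)

keys_seen = []
error_message = None
for x in page_json:  # iterating a dict yields its keys, so x == "data"
    keys_seen.append(x)
    try:
        x["data"]  # indexing a str with a str key
    except TypeError as exc:
        error_message = str(exc)

print(keys_seen)      # ['data']
print(error_message)  # starts with "string indices must be integers" (exact wording varies by Python version)
```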

Sasha Tsukanov · Accepted Answer · 2018-05-17 13:53:16Z


The response is a single object whose "data" key holds the array of entries for your CSV rows; it is not an array of objects that each have a "data" key. So you need to access "data" first:

page_json = json.loads(page.text)['data']

and then iterate over it:

for x in page_json:
    f.writerow([x["id"],
                x["parent_id"],
                x["author"],
                x["created_utc"],
                x["subreddit"],
                x["body"],
                x["score"]])

Note that you need to iterate over the parsed JSON (page_json), not over the response object (page).
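For an end-to-end check without calling the API, here is a minimal sketch of the corrected approach. The sample payload is hypothetical (it only mimics the response shape described above), and io.StringIO stands in for the real test.csv file so the example runs offline.

```python
import csv
import io
import json

# Hypothetical sample standing in for page.text: {"data": [ {comment}, ... ]}
sample_text = json.dumps({
    "data": [
        {"id": "c1", "parent_id": "t3_p1", "author": "alice",
         "created_utc": 1526563000, "subreddit": "science",
         "body": "Interesting, thanks!", "score": 3,
         "permalink": "/r/science/comments/p1/_/c1/"},
        {"id": "c2", "parent_id": "t1_c1", "author": "bob",
         "created_utc": 1526563100, "subreddit": "science",
         "body": "Agreed", "score": 1,
         "permalink": "/r/science/comments/p1/_/c2/"},
    ]
})

columns = ["id", "parent_id", "author", "created_utc", "subreddit", "body", "score"]

# Select the list under "data" first, then iterate over the comments in it.
comments = json.loads(sample_text)["data"]

# StringIO replaces open("test.csv", "w", newline="") for this offline sketch.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
for comment in comments:
    writer.writerow([comment[column] for column in columns])

print(buffer.getvalue())
```

The csv module quotes the first body automatically because it contains a comma. When writing to disk, opening the file in a with block ensures it is flushed and closed once the loop finishes.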

You can also refactor the loop so each column name appears only once:

columns = ["id", "parent_id", "author", "created_utc", "subreddit", "body", "score"]
f.writerow(columns)
for x in page_json:
    f.writerow([x[column] for column in columns])
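One possible variation, not what the answer uses: with x[column], a comment that lacks one of the columns raises KeyError. The standard library's csv.DictWriter maps each dict onto the column list directly; extrasaction="ignore" skips keys that are not columns (such as permalink, which the URL's filter requests), and restval fills in missing ones. A minimal sketch with hypothetical comments:

```python
import csv
import io

columns = ["id", "parent_id", "author", "created_utc", "subreddit", "body", "score"]

# Hypothetical comments: the first carries an extra "permalink" key,
# the second is missing "score".
comments = [
    {"id": "c1", "parent_id": "t3_p1", "author": "alice", "created_utc": 1526563000,
     "subreddit": "science", "body": "Nice", "score": 3,
     "permalink": "/r/science/comments/p1/_/c1/"},
    {"id": "c2", "parent_id": "t1_c1", "author": "bob", "created_utc": 1526563100,
     "subreddit": "science", "body": "Agreed"},
]

buffer = io.StringIO()
# extrasaction="ignore": drop keys not listed in fieldnames instead of raising ValueError.
# restval="": write an empty cell for any column a comment is missing.
writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore", restval="")
writer.writeheader()
writer.writerows(comments)

print(buffer.getvalue())
```

Whether silently writing an empty cell is preferable to failing loudly depends on how the CSV will be used.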

edited May 17, 2018 at 13:53

answered May 17, 2018 at 13:47

Sasha Tsukanov

