I am trying to write a CSV file from JSON data returned by the PushShift API, but I am running into a TypeError. My code is below:

import requests
import csv
import json
from urllib.request import urlopen

url = 'https://api.pushshift.io/reddit/comment/search/?subreddit=science&filter=parent_id,id,author,created_utc,subreddit,body,score,permalink'
page = requests.get(url)
page_json = json.loads(page.text)
print(page.text)
f = csv.writer(open("test.csv",'w+', newline=''))
f.writerow(["id", "parent_id", "author", "created_utc","subreddit", "body", "score"])
for x in page_json:
    f.writerow([x["data"]["id"],
                x["data"]["parent_id"],
                x["data"]["author"],
                x["data"]["created_utc"],
                x["data"]["subreddit"],
                x["data"]["body"],
                x["data"]["score"]])

The error I am getting is this:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-82784a93576b> in <module>()
 11 f.writerow(["id", "parent_id", "author", "created_utc","subreddit", "body", "score"])
 12 for x in page:
---> 13     f.writerow([x["data"]["id"],
     14                 x["data"]["parent_id"],
     15                 x["data"]["author"],

TypeError: byte indices must be integers or slices, not str

I attempted the solution from "How can I convert JSON to CSV?", which may or may not address the actual problem I am running into. Any suggestions would be greatly appreciated!

  • It appears either x or x["data"] is not a dictionary but rather a string. Try printing out your values to debug and see what the actual structure of x is. Commented May 17, 2018 at 13:32
  • You're supposed to be using page_json, not page. Commented May 17, 2018 at 13:32
  • You are iterating on page and not on page_json. Commented May 17, 2018 at 13:32
  • for x in page_json: ... Commented May 17, 2018 at 13:32
  • Sorry, page is a typo. When I run the correct code, it gives this error: TypeError: string indices must be integers. Commented May 17, 2018 at 13:40

1 Answer

The response is a single object whose "data" key holds an array of entries, one per CSV row. It is not an array of objects that each have a "data" key. So you need to access "data" first:

page_json = json.loads(page.text)['data']

and then iterate over it:

for x in page_json:
    f.writerow([x["id"],
                x["parent_id"],
                x["author"],
                x["created_utc"],
                x["subreddit"],
                x["body"],
                x["score"]])

Note that you need to iterate over the parsed JSON (page_json), not over the response object (page).
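Both error messages in the question come from indexing the wrong kind of value with a string. Here is a minimal sketch of why, with no network call. The sample_text string and the index_error helper are hypothetical stand-ins, not part of the original code. It assumes, as the traceback suggests, that looping over the response object yields bytes chunks.

```python
import json

# Hypothetical stand-in for page.text: one outer object whose "data" key
# holds the list of comments.
sample_text = '{"data": [{"id": "abc", "score": 3}]}'
payload = json.loads(sample_text)

# Iterating a dict yields its keys, which are strings.
keys = [x for x in payload]


def index_error(value):
    """Return the TypeError message raised by value["data"], or None."""
    try:
        value["data"]
    except TypeError as exc:
        return str(exc)
    return None


# Indexing the key "data" (a str) with a string fails.
str_error = index_error(keys[0])
# Indexing a bytes chunk with a string fails in the same way.
bytes_error = index_error(b"chunk")

print(keys)
print(str_error)
print(bytes_error)
```

So `for x in page_json` on the unmodified dict gives x == "data", and x["data"] then fails with the "string indices" error from the comments.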

You can also refactor the code to get this:

columns = ["id", "parent_id", "author", "created_utc", "subreddit", "body", "score"]
f.writerow(columns)
for x in page_json:
    f.writerow([x[column] for column in columns])
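
Putting the pieces together, one possible end-to-end version is sketched below. The write_comments_csv function and the sample payload are hypothetical and only mirror the shape described above. The sketch writes to an in-memory buffer so it runs without calling the API, and it uses .get() so a missing field leaves an empty cell instead of raising KeyError.

```python
import csv
import io

COLUMNS = ["id", "parent_id", "author", "created_utc", "subreddit", "body", "score"]


def write_comments_csv(payload, out):
    """Write payload["data"] to the file-like object out; return rows written."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    count = 0
    for comment in payload["data"]:
        # .get() leaves the cell empty if a comment lacks a field.
        writer.writerow([comment.get(column, "") for column in COLUMNS])
        count += 1
    return count


# Hypothetical payload with the same outer shape as the API response.
sample = {
    "data": [
        {
            "id": "c1",
            "parent_id": "t3_x",
            "author": "alice",
            "created_utc": 1526563920,
            "subreddit": "science",
            "body": "Nice, thanks",
            "score": 2,
        }
    ]
}

buffer = io.StringIO()
rows_written = write_comments_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With the real response you would pass the parsed JSON and a file opened with `with open("test.csv", "w", newline="") as fh:`, so that the file is closed once writing finishes.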