Extract json fields and write them into a csv with python

Question

I've got a very big json with multiple fields and I want to extract just some of them and then write them into a csv.

Here is my code:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import json

import csv

data_file = open("book_data.json", "r")
values = json.load(data_file)
data_file.close()

with open("book_data.csv", "wb") as f:
    wr = csv.writer(f)
    for data in values:
         value = data["identifier"]
         value = data["authors"]
         for key, value in data.iteritems():
               wr.writerow([key, value])

It gives me this error:

File "json_to_csv.py", line 22, in <module>
wr.writerow([key, value])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 8: ordinal not in range(128)

But I declare the utf-8 encoding at the top of the script, so I don't know what's wrong there.

Thanks


Padraic Cunningham · Accepted Answer · 2016-06-09 10:21:23Z


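You need to encode the data. In Python 2 the csv module writes byte strings, so each unicode value has to be encoded explicitly; the coding line at the top of the script only sets the encoding of the source file itself: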

wr.writerow([key.encode("utf-8"), value.encode("utf-8")])

Without the explicit encode, Python 2 falls back to the ascii codec. The difference is the same as between:

In [8]: print u'\u2019'.encode("utf-8")
’

In [9]: print str(u'\u2019')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-9-4e3ad09ee31b> in <module>()
----> 1 print str(u'\u2019')

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)

If you have a mixture of strings and lists as values, you can use isinstance to check what you have; if a value is a list, iterate over it and encode each item:

with open("book_data.csv", "wb") as f:
    wr = csv.writer(f)
    for data in values:
         for key, value in data.iteritems():
               wr.writerow([key, ",".join([v.encode("utf-8") for v in value]) if isinstance(value, list) else value.encode("utf8")])
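The inline conditional above gets hard to read once more value types show up. Here is a minimal sketch of the same idea as a helper, assuming Python 3, where the csv module accepts text directly and no per-value encode call is needed. The names to_cell and write_pairs and the sample record are made up for illustration:

```python
import csv
import io


def to_cell(value):
    """Flatten one JSON value into a single CSV cell (hypothetical helper)."""
    if isinstance(value, list):
        # Join list items into one comma-separated string, as in the answer.
        return ",".join(str(v) for v in value)
    # Anything else (str, int, ...) is written as its text form.
    return str(value)


def write_pairs(records, f):
    """Write one (key, value) row per field of every record."""
    wr = csv.writer(f)
    for data in records:
        for key, value in data.items():
            wr.writerow([key, to_cell(value)])


# Made-up sample standing in for the parsed book_data.json.
sample = [{"identifier": "id\u2019s", "tags": ["a", "b"]}]
buf = io.StringIO()
write_pairs(sample, buf)
print(buf.getvalue())
```

Nested dicts (such as an authors object) are not flattened by this sketch; they would need their own branch in to_cell.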

To just write the three columns creator, contributor and identifier, just pull the data using the keys:

import csv

with open("book_data.csv", "wb") as f:
    wr = csv.writer(f)
    for dct in values:
        authors = dct["authors"]
        wr.writerow((",".join(authors["creator"]).encode("utf-8"),
                     ",".join(authors["contributor"]).encode("utf-8"),
                     dct["identifier"].encode("utf-8")))
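For readers on Python 3, the same three-column export might look roughly like the sketch below. It assumes the record shape used above (an authors object holding creator and contributor lists, plus an identifier string). The function name write_book_rows, the sample record and the output path are invented for the example. In Python 3 the file is opened in text mode and the encoding argument does the UTF-8 conversion, so the explicit encode calls go away:

```python
import csv
import json
import os
import tempfile


def write_book_rows(values, csv_path):
    """Write creator, contributor and identifier columns for each record."""
    # Text mode with encoding= handles UTF-8; newline="" lets the csv
    # module control line endings itself.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        wr = csv.writer(f)
        for dct in values:
            authors = dct["authors"]
            wr.writerow((",".join(authors["creator"]),
                         ",".join(authors["contributor"]),
                         dct["identifier"]))


# Made-up record in the shape the answer's code assumes.
values = json.loads(
    '[{"identifier": "book-1",'
    ' "authors": {"creator": ["O\\u2019Brien", "Smith"],'
    ' "contributor": ["Lee"]}}]'
)
out_path = os.path.join(tempfile.mkdtemp(), "book_data_example.csv")
write_book_rows(values, out_path)
```

Reading the file back with csv.reader (again with encoding="utf-8" and newline="") returns the joined creator string as a single field, because the writer quotes any cell that contains a comma.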

edited Jun 9, 2016 at 10:21

answered Jun 8, 2016 at 10:31

Padraic Cunningham


17 Comments

Lara M. Over a year ago

Thanks! That works, but now it gives me: File "json_to_csv.py", line 22, in <module> wr.writerow([key.encode("utf-8"), value.encode("utf-8")]) AttributeError: 'list' object has no attribute 'encode'

Padraic Cunningham Over a year ago

Do you have some values that are lists?

Padraic Cunningham Over a year ago

Yep, I am off my comp now but I will edit when I get back on.

Padraic Cunningham Over a year ago

Cool, then for multiple authors, contributors you will have a comma separated string which will be parsed as a single value when you read the file.

Lara M. Over a year ago

Thanks a multiple times!!
