I have a very large JSON file with many fields. I want to extract just a few of them and write them to a CSV file.

Here is my code:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import json

import csv

data_file = open("book_data.json", "r")
values = json.load(data_file)
data_file.close()

with open("book_data.csv", "wb") as f:
    wr = csv.writer(f)
    for data in values:
         value = data["identifier"]
         value = data["authors"]
         for key, value in data.iteritems():
               wr.writerow([key, value])

It gives me this error:

File "json_to_csv.py", line 22, in <module>
wr.writerow([key, value])
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 8: ordinal not in range(128)

But I declare the UTF-8 encoding at the top of the script, so I don't know what's wrong.

Thanks

  • At which line is the error? Commented Jun 8, 2016 at 10:20
  • File "json_to_csv.py", line 22, in <module> wr.writerow([key, value]) I'll add that. Commented Jun 8, 2016 at 10:26
  • Try github.com/jdunck/python-unicodecsv Commented Jun 8, 2016 at 10:29

1 Answer

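The `# -*- coding: utf-8 -*-` line only tells Python how to decode the source file itself; it has no effect on data handled at runtime. In Python 2 the csv module does not handle unicode objects directly: they are implicitly converted with the default ASCII codec, which fails on characters such as u'\u2019'. You need to encode the data yourself before writing it: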

wr.writerow([key.encode("utf-8"), value.encode("utf-8")])

The difference is the same as between these two calls:

In [8]: print u'\u2019'.encode("utf-8")
’

In [9]: print str(u'\u2019')
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-9-4e3ad09ee31b> in <module>()
----> 1 print str(u'\u2019')

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)
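
The same failure can be reproduced without `print` or IPython by spelling out the ASCII step that Python 2 performs implicitly. This is a minimal sketch written so that it also runs on Python 3 (an assumption; the session above is Python 2), and the variable names are illustrative only:

```python
# U+2019 is the right single quotation mark named in the traceback.
text = u"\u2019"

# Encoding to UTF-8 succeeds and yields three bytes.
utf8_bytes = text.encode("utf-8")

# Encoding to ASCII fails. This is the conversion Python 2 attempts
# implicitly when a unicode object is passed to str().
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Here `utf8_bytes` holds the encoded character, while the ASCII attempt raises the same exception type as in the question.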

If your values are a mixture of strings and lists, you can use isinstance to check what you have. If a value is a list, iterate over it, encode each item, and join the results:

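with open("book_data.csv", "wb") as f:
    wr = csv.writer(f)
    for data in values:
        for key, value in data.iteritems():
            # Lists become one comma-separated field; everything is encoded to UTF-8 bytes.
            if isinstance(value, list):
                value = ",".join(v.encode("utf-8") for v in value)
            else:
                value = value.encode("utf-8")
            wr.writerow([key.encode("utf-8"), value])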
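
The type check can also be pulled into a small helper. The sketch below is one possible shape for it, assuming Python 3, where the helper returns text and the encoding is left to the file object. The name `to_cell` and its `sep` parameter are hypothetical and not part of the csv module:

```python
def to_cell(value, sep=","):
    """Return a text cell for a JSON value (hypothetical helper)."""
    # Lists and tuples are flattened into one separator-joined string.
    if isinstance(value, (list, tuple)):
        return sep.join(to_cell(v, sep) for v in value)
    # Byte strings are assumed to hold UTF-8 text.
    if isinstance(value, bytes):
        return value.decode("utf-8")
    # Strings pass through unchanged; numbers and other values use str().
    return str(value)
```

Each row would then be written as `wr.writerow([key, to_cell(value)])`. Nested dictionaries are not handled specially here; they would be written as their `str()` representation.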

To write only the three columns creator, contributor and identifier, pull the data out by key:

import csv

with open("book_data.csv", "wb") as f:
    wr = csv.writer(f)
    for dct in values:
        authors = dct["authors"]
        # One row per book: creators, contributors, identifier.
        # Multiple names are joined into a single comma-separated field.
        wr.writerow((",".join(authors["creator"]).encode("utf-8"),
                     ",".join(authors["contributor"]).encode("utf-8"),
                     dct["identifier"].encode("utf-8")))
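
The code above targets Python 2. If moving to Python 3 is an option (an assumption, since the question uses Python 2), the manual `.encode()` calls are not needed: the file is opened in text mode with an explicit encoding and the csv module writes `str` values directly. The sample record and the temporary path below are hypothetical and only make the sketch self-contained:

```python
import csv
import os
import tempfile

# Hypothetical record shaped like the data the answer assumes.
values = [
    {
        "identifier": "book-1",
        "authors": {"creator": [u"O\u2019Brien", "Smith"],
                    "contributor": ["Lee"]},
    }
]

# Temporary location so the sketch does not overwrite a real file.
path = os.path.join(tempfile.mkdtemp(), "book_data.csv")

# newline="" lets the csv module control line endings;
# encoding="utf-8" makes the file object do the encoding.
with open(path, "w", newline="", encoding="utf-8") as f:
    wr = csv.writer(f)
    for dct in values:
        authors = dct["authors"]
        wr.writerow((",".join(authors["creator"]),
                     ",".join(authors["contributor"]),
                     dct["identifier"]))
```

The creator field contains a comma, so the writer quotes it and it stays a single column.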

17 Comments

Thanks! That works, but now it gives me: File "json_to_csv.py", line 22, in <module> wr.writerow([key.encode("utf-8"), value.encode("utf-8")]) AttributeError: 'list' object has no attribute 'encode'
Do you have some values that are lists?
Yep, I am off my comp now but I will edit when I get back on.
Cool, then for multiple authors, contributors you will have a comma separated string which will be parsed as a single value when you read the file.
Thanks a lot!!
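
To illustrate the point about comma-separated names being read back as a single value, here is a small round-trip sketch. It assumes Python 3 and uses an in-memory buffer with made-up names in place of the real file:

```python
import csv
import io

# Write one row whose first field holds two comma-joined names.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(["Smith,Jones", "book-1"])
written = buf.getvalue()

# Read it back: the quoted field comes back as one value.
buf.seek(0)
row = next(csv.reader(buf))

# Split on the separator to recover the individual names.
authors = row[0].split(",")
```

If a name can itself contain a comma, splitting becomes ambiguous, so a different separator such as ";" may be a safer choice for the join.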