I have this Python 3 script that reads a JSON file and saves it as CSV. It works fine except for special characters like \u00e9. Montr\u00e9al should come out as Montréal, but I am getting MontrÃ©al instead.

import csv
import json

ifilename = 'business.json'
ofilename = 'business.csv'

# Each line of the input file is a separate JSON object.
with open(ifilename) as in_file:
    json_lines = [json.loads(l.strip()) for l in in_file]

json_no = 0
# newline='' lets the csv module control line endings itself.
with open(ofilename, "w", newline='', encoding='utf-8') as OUT_FILE:
    root = csv.writer(OUT_FILE)
    root.writerow(["business_id","name","neighborhood","address","city","state"])
    for l in json_lines:
        root.writerow([l["business_id"],l["name"],l["neighborhood"],l["address"],l["city"],l["state"]])
        json_no += 1

print('Finished {0} lines'.format(json_no))
1
  • The problem is not with the output of the program; the problem is with the editor you're using to display the file. It isn't recognizing UTF-8. Commented Aug 2, 2018 at 21:16
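To illustrate the comment above, here is a minimal sketch of how correct UTF-8 output can still look wrong in a viewer. It assumes the viewer falls back to cp1252 (a common Windows default, but that is an assumption about the asker's setup); the helper name `misread_utf8` is made up for this example.

```python
def misread_utf8(text, wrong_codec="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong codec.

    This mimics a viewer that does not recognize UTF-8. The cp1252 default
    is an assumption about the viewer, not something stated in the question.
    """
    return text.encode("utf-8").decode(wrong_codec)


if __name__ == "__main__":
    original = "Montr\u00e9al"
    utf8_bytes = original.encode("utf-8")
    # The bytes on disk are correct UTF-8 ...
    print(utf8_bytes)
    # ... decoding them as UTF-8 gives the original text back ...
    print(utf8_bytes.decode("utf-8"))
    # ... and only the wrong decoder produces the garbled form.
    print(misread_utf8(original))
```

The file contents are identical in both cases; only the decoding step on the viewer's side differs.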

2 Answers

1

It turns out the CSV file displayed correctly when opened with Notepad++ but not with Excel. So I had to import the CSV file in Excel and specify 65001: Unicode (UTF-8) as the file origin. Thanks for the help.
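A possible alternative to the manual import, sketched here under assumptions: writing the file with Python's `utf-8-sig` codec adds a byte order mark at the start, which Excel generally uses to detect UTF-8 when the file is opened directly. The function name `json_lines_to_csv` and its `fields` parameter are hypothetical, and the input is assumed to hold one JSON object per line as in the question.

```python
import csv
import json


def json_lines_to_csv(in_path, out_path, fields):
    """Convert a JSON-lines file to CSV and return the number of records.

    The output is written with the 'utf-8-sig' codec, so it starts with a
    UTF-8 byte order mark (an assumption-based workaround for Excel).
    """
    count = 0
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8-sig") as dst:
        writer = csv.writer(dst)
        writer.writerow(fields)
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            writer.writerow([record[name] for name in fields])
            count += 1
    return count
```

With the question's files, a call could look like `json_lines_to_csv('business.json', 'business.csv', ["business_id", "name", "neighborhood", "address", "city", "state"])`. Tools that do not expect a byte order mark may show it as stray characters at the start of the first header, so this is a trade-off rather than a universal fix.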

0

Try putting this at the top of the file:

# -*- coding: utf-8 -*-

Consider this example:

# -*- coding: utf-8 -*-
import sys

print("my default encoding is : {0}".format(sys.getdefaultencoding()))
string_demo = "Montréal"
print(string_demo)

reload(sys)  # Python 2.x only
sys.setdefaultencoding('UTF8')  # Python 2.x only

print("my default encoding is : {0}".format(sys.getdefaultencoding()))
print(str(string_demo.encode('utf8')), type(string_demo.encode('utf8')))

In my case, the output looks like this when I run it in Python 2.x:

my default encoding is : ascii
Montréal
my default encoding is : UTF8
('Montr\xc3\xa9al', <type 'str'>)

But when I comment out the reload and setdefaultencoding lines, the output looks like this:

my default encoding is : ascii
Montréal
my default encoding is : ascii
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    print(str(string_demo.encode('utf8')), type(string_demo.encode('utf8')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)

It's most likely a problem with the editor. When Python hits an actual encoding error, it raises an exception.
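For comparison, a rough Python 3 sketch of the same check (the example above relies on `reload` and `sys.setdefaultencoding`, which are Python 2 only). In Python 3, `str` is already Unicode and `sys.getdefaultencoding()` reports `utf-8`, so encoding to UTF-8 succeeds, while a genuine encoding failure, such as forcing ASCII, raises an exception. The helper names `describe` and `try_ascii` are made up for this illustration.

```python
import sys


def describe(text):
    """Return the default encoding, the UTF-8 bytes, and their type name."""
    encoded = text.encode("utf-8")
    return sys.getdefaultencoding(), encoded, type(encoded).__name__


def try_ascii(text):
    """Return the exception name if text cannot be encoded as ASCII, else None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(describe("Montr\u00e9al"))
    print(try_ascii("Montr\u00e9al"))
```

If the script runs without an exception, the written bytes are valid for the chosen encoding, and any garbling points to how the file is being displayed.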

3 Comments

It is still giving me MontrÃ©al
Which version of Python are you using? As Mark says, it's most likely a problem with your IDE.
I'm using Anaconda - Spyder (Python 3.6)
