I have a program that parses webpages and then writes the data out somewhere else. When I am writing the data, I get

"UnicodeEncodeError: 'ascii' codec can't encode characters in position 19-21: ordinal not in range(128)"

I am gathering the data using lxml.

name = apiTree.xpath("//boardgames/boardgame/name[@primary='true']")[0].text
worksheet.goog["Name"].append(name)

After reading http://effbot.org/pyfaq/what-does-unicodeerror-ascii-decoding-encoding-error-ordinal-not-in-range-128-mean.htm, I understand that I should keep all of my variables as unicode. This means I need to know what encoding the site is using.

My final line that actually writes the data out somewhere is:

wks.update_cell(row + 1, worksheet.goog[value + "_col"], (str(worksheet.goog[value][row])).encode('ascii', 'ignore'))

How would I incorporate using unicode assuming the encoding is UTF-8 on the way in and I want it to be ASCII on the way out?

2 Answers

Your error is caused by:

str(worksheet.goog[value][row]) 

Calling str on a unicode object implicitly tries to encode it as ASCII, which fails as soon as the text contains a non-ASCII character. What you should be doing is encoding explicitly to UTF-8:

 worksheet.goog[value][row].encode("utf-8") 
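To illustrate the difference, here is a minimal sketch written for Python 3, where an explicit .encode("ascii") reproduces the failure that str() triggers implicitly in Python 2. The sample value and the helper name encode_for_output are made up for the example; they are not part of the original program.

```python
# Hypothetical sample value containing a non-ASCII character (U+0103, "ă").
name = "Br\u0103ila"


def encode_for_output(text, encoding="utf-8"):
    """Encode text explicitly rather than relying on an ASCII default."""
    return text.encode(encoding)


# Encoding to ASCII fails because U+0103 is outside the 0-127 range.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ASCII encoding failed:", exc.reason)

# Encoding to UTF-8 succeeds and round-trips back to the original text.
utf8_bytes = encode_for_output(name)
print(repr(utf8_bytes))
```

UTF-8 can represent every Unicode character, so the explicit encode never raises for valid text.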

As for "How would I incorporate using unicode assuming the encoding is UTF-8 on the way in and I want it to be ASCII on the way out?": in general, you can't. ASCII has no representation for characters such as ă, unless you are willing to settle for the closest ASCII equivalent using something like Unidecode.
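If installing Unidecode is not an option, a rough approximation is possible with the standard library alone. The sketch below is a different and cruder technique than Unidecode: it decomposes accented characters with unicodedata and drops whatever is left that is not ASCII. The function name approximate_ascii is hypothetical, and the code assumes Python 3.

```python
import unicodedata


def approximate_ascii(text):
    """Return a rough ASCII approximation of text.

    NFKD normalization splits a character such as "ă" into a base
    letter plus a combining mark; encoding with errors="ignore" then
    drops the mark and anything else that has no ASCII form.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical sample values, written with escapes to avoid
# any dependence on the source file encoding.
print(approximate_ascii("Br\u0103ila"))
print(approximate_ascii("Caf\u00e9"))
```

Characters that have no decomposition, such as "ß", are simply removed by this approach, which is one reason a dedicated transliteration library may be preferable.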

I think I may have figured my own problem out.

apiTree.xpath("//boardgames/boardgame/name[@primary='true']")[0].text

Actually defaults to unicode. So what I did was change this line to:

name = (apiTree.xpath("//boardgames/boardgame/name[@primary='true']")[0].text).encode('ascii', errors='ignore')

And I just output without changing anything:

wks.update_cell(row + 1, worksheet.goog[value + "_col"], worksheet.goog[value][row])

Due to the nature of the data, ASCII-only output is mostly fine. I may be able to use UTF-8 and keep some extra characters, but that is not relevant to the question.
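For anyone weighing the same trade-off, here is a small sketch of what the errors argument does to non-ASCII characters. It assumes Python 3, and the sample title is made up rather than taken from the real data.

```python
# Hypothetical game title containing one non-ASCII character (U+00E9, "é").
title = "Pok\u00e9mon"

# errors="ignore" silently drops characters that ASCII cannot represent.
ignored = title.encode("ascii", errors="ignore")

# errors="replace" substitutes a "?" so the loss stays visible.
replaced = title.encode("ascii", errors="replace")

# UTF-8 keeps every character and decodes back to the original text.
kept = title.encode("utf-8")

print(ignored, replaced, kept)
```

With errors="ignore" the output gives no hint that a character was lost, which may or may not matter for a list of names.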

:)

2 Comments

Why are you trying to encode to ascii in the first place?
It's just the name of a game in English. I don't need anything more than ASCII, but I suppose I could use UTF-8.
