0

I am writing a python script for mass-replacement of links(actually image and script sources) in HTML files; I am using lxml. There is one problem, the html files are quizzes and they have data packaged like this(there is also some Cyrillic here):

<input class="question_data" value="{&quot;text&quot;:&quot;&lt;p&gt;[1] је наука која се бави чувањем, обрадом и преносом информација помоћу рачунара.&lt;/p&gt;&quot;,&quot;fields&quot;:[{&quot;id&quot;:&quot;1&quot;,&quot;type&quot;:&quot;fill&quot;,&quot;element&quot;:{&quot;sirina&quot;:&quot;103&quot;,&quot;maxDuzina&quot;:&quot;12&quot;,&quot;odgovor&quot;:[&quot;Информатика&quot;]}}]}" name="question:1:data" id="id3a1"/>

When I try to print out this data in python using:

print "OLD_DATA:", data

It just prints out the error "UnicodeEncodeError: character maps to undefined". There are more of these elements. My goal is to change the links of images in the value part of input, but I can't change the links if I don't know how to print this data(or how it should be written to the file). How does Python handle(interpret) this? Please help. Thanks!!! :)

2
  • Are you using Python 2.7? Commented Jan 6, 2016 at 19:08
  • Yes, Python 2.7.11 to be precise. Commented Jan 6, 2016 at 20:18

1 Answer 1

1

You're running into the same problem I've hit many times in the past. That error almost always means that the console environment you're using can't display the characters it's trying to print. It might be worth trying to log to a file instead, then opening the log in an editor that can display the characters.

If you really want to be able to see it on your console, it might be worth writing a function to screen the strings you're printing for unprintable characters

I also found a couple other StackOverflow posts that might be helpful in your efforts:

How do I get Cyrillic in the output, Python?

What is right way to use cyrillic in python lxml library

I would also recommend this article and python manual entry:

https://docs.python.org/2/howto/unicode.html

http://www.joelonsoftware.com/articles/Unicode.html

Sign up to request clarification or add additional context in comments.

2 Comments

My pleasure! Unicode errors are the bane of my existence, as I do a lot of batch processing.. I usually wrap statements that would cause them in Try/Catch code that logs the error so that my program doesn't dump in the middle of a 30 minute run simply because it couldn't log something :P
And if you really need to get Unicode out to Windows's sad broken command prompt: pypi.python.org/pypi/win_unicode_console

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.