7

I have a problem printing (or writing to a file) non-ASCII characters in Python. I've worked around it by overriding the __str__ method in my own objects and calling "x.encode('utf-8')" inside it, where x is an attribute of the object.

But if I receive a third-party object and call "str(object)" on it, and that object contains a non-ASCII character, the call fails.

So the question is: is there a generic way to tell the str method that the object is encoded as UTF-8? I'm working with Python 2.5.4.

2
  • What does "receive a third-party object" mean? What third-party object? And why can't this mysterious object be trusted to produce proper string values? Commented Nov 10, 2009 at 11:08
  • I'm interacting with other programs that were not written by me. Those programs can have objects with string properties that contain non-ASCII characters. Commented Nov 10, 2009 at 11:27

5 Answers

10

There is no way to make str() work with Unicode in Python < 3.0.

Use repr(obj) instead of str(obj). repr() will convert the result to ASCII, properly escaping everything that isn't in the ASCII code range.

Other than that, use a file object which allows unicode. So don't encode at the input side but at the output side:

import codecs
fileObj = codecs.open("someFile", "w", "utf-8")

Now you can write unicode strings to fileObj and they will be converted as needed. To make the same happen with print, you need to wrap sys.stdout:

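import sys, codecs, locale

# Encoding detected for the terminal (None when output is piped)
print str(sys.stdout.encoding)
# Wrap stdout so unicode strings are encoded on the way out
sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout)
line = u"\u0411\n"
print type(line), len(line)
sys.stdout.write(line)
print line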
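
The same wrapping idea can be sketched with an in-memory byte buffer standing in for sys.stdout, which makes the encoded output easy to inspect. The buffer and the hard-coded "utf-8" are assumptions for illustration only, and the io module needs Python 2.6 or later, so this will not run as written on 2.5:

```python
import codecs
import io

# An in-memory byte stream stands in for sys.stdout or a file opened in binary mode.
raw = io.BytesIO()

# codecs.getwriter returns a StreamWriter class; calling it wraps the byte stream.
writer = codecs.getwriter("utf-8")(raw)

line = u"\u0411\n"   # CYRILLIC CAPITAL LETTER BE followed by a newline
writer.write(line)   # unicode goes in, UTF-8 bytes come out

encoded = raw.getvalue()
```

The point of the design is that the program keeps unicode internally and the encoding is chosen exactly once, where the bytes leave the process.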

2 Comments

But I have the same problem when I use print(object), because internally it calls str, so if the object has a non-ASCII character it will fail. I've seen that I can put this on the first line of my .py files: # -*- coding: utf-8 -*- but it doesn't work.
The encoding of the source file has nothing to do with what str() supports. str() only supports unicode characters in py3k, so either use repr() or unicode() everywhere.
5
# -*- coding: utf-8 -*-
none_ascii = '''
        ███╗   ███╗ ██████╗ ██╗   ██╗██╗███████╗███████╗ 
        ████╗ ████║██╔═══██╗██║   ██║██║██╔════╝██╔════╝ 
        ██╔████╔██║██║   ██║██║   ██║██║█████╗  ███████╗ 
        ██║╚██╔╝██║██║   ██║╚██╗ ██╔╝██║██╔══╝  ╚════██║ 
        ██║ ╚═╝ ██║╚██████╔╝ ╚████╔╝ ██║███████╗███████║ 
        ╚═╝     ╚═╝ ╚═════╝   ╚═══╝  ╚═╝╚══════╝╚══════╝ 
'''

print(none_ascii.decode('utf-8'))


3

How about you use unicode(object) and define a __unicode__ method on your classes?

Then you know it's unicode, and you can encode it any way you want when you write it to a file.
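
A minimal sketch of that pattern, assuming a made-up Person class (not from the question) whose name attribute holds unicode text. The PY2 switch is only there so the same sketch also runs on Python 3, where __str__ has to return text and __unicode__ is just an ordinary method:

```python
import sys

PY2 = sys.version_info[0] == 2


class Person(object):
    """Hypothetical class used only to illustrate the pattern."""

    def __init__(self, name):
        self.name = name  # expected to be unicode text

    def __unicode__(self):
        # Single place where the text representation is built.
        return u"Person(%s)" % self.name

    def __str__(self):
        text = self.__unicode__()
        # Python 2 expects bytes from __str__; Python 3 expects text.
        return text.encode("utf-8") if PY2 else text


person = Person(u"Jos\u00e9")
as_text = person.__unicode__()       # same as unicode(person) on Python 2
as_bytes = as_text.encode("utf-8")   # encode only when writing out
```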

5 Comments

But then I'm in the same problem: if I receive a third party object and I use "unicode(object)", and the object has a non-ascii character, it will fail, won't it?
Besides, when I use "print(object)", it calls the str method internally, so I can't use unicode.
One more question: if I use Python 3, will I avoid those problems? Does Python 3 do the conversion on its own? Does it accept non-ASCII characters by default?
All Python 3 strings are (what used to be) unicode by default.
First, please realize that if you receive an array of bytes, which is what Python 2 strings essentially are, there is no way to be sure which encoding it is in. If third-party objects give you strings in a non-standard encoding, they should also tell you which encoding that is.
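
Since the encoding can't be detected reliably, the most you can do for a third-party byte string is try a list of candidates in order. A rough sketch follows; to_text and its default candidate list are made up for illustration, and the bytes name needs Python 2.6 or later. Note that latin-1 accepts any byte sequence, so it has to come last and may silently produce the wrong characters:

```python
def to_text(value, encodings=("utf-8", "latin-1")):
    """Return value as unicode text, trying each candidate encoding in order.

    Hypothetical helper: the candidate list is a guess, not a detection.
    """
    if not isinstance(value, bytes):
        return value  # already text (or not a string at all): leave it alone
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # No candidate fit: decode anyway and mark the undecodable bytes.
    return value.decode(encodings[0], "replace")


from_utf8 = to_text(b"caf\xc3\xa9")    # valid UTF-8
from_latin1 = to_text(b"caf\xe9")      # invalid UTF-8, falls back to latin-1
```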
2

I would like to add that I've found a solution on Unix systems: exporting an environment variable, like this:

export LC_CTYPE="es_ES.UTF-8"

This way the locale's character encoding is UTF-8, so I can print or write output and it works fine.

1 Comment

What does this have to do with your question? Or with python?
0

Just paste these two lines at the top of your code:

#!/usr/local/bin/python
# coding: latin-1

See PEP 263 for further details: https://www.python.org/dev/peps/pep-0263/

