help(unicode) prints something like:

class unicode(basestring)
 |  unicode(string [, encoding[, errors]]) -> object
...

But you can pass something other than a basestring as the argument: unicode(1) returns u'1'. What happens in that call? int doesn't have a __unicode__ method to call.

3 Answers

If __unicode__ exists, it is called; otherwise unicode() falls back to __str__:

class A(int):
    def __str__(self):
        print "A.str"
        return int.__str__(self)

    def __unicode__(self):
        print "A.unicode"
        return int.__str__(self)

class B(int):
    def __str__(self):
        print "B.str"
        return int.__str__(self)


unicode(A(1)) # prints "A.unicode"
unicode(B(1)) # prints "B.str"

Same as unicode(str(1)).

>>> class thing(object):
...     def __str__(self):
...         print "__str__ called on " + repr(self)
...         return repr(self)
...
>>> a = thing()
>>> a
<__main__.thing object at 0x7f2f972795d0>
>>> unicode(a)
__str__ called on <__main__.thing object at 0x7f2f972795d0>
u'<__main__.thing object at 0x7f2f972795d0>'

If you really want to see the gritty bits underneath, open up the Python interpreter source code.

Objects/unicodeobject.c#PyUnicode_Type defines the unicode type, with constructor .tp_new=unicode_new.

Since the optional arguments encoding or errors are not given, and a unicode object is being constructed (as opposed to a unicode subclass), Objects/unicodeobject.c#unicode_new calls PyObject_Unicode.

Objects/object.c#PyObject_Unicode calls the __unicode__ method if it exists. If not, it falls back to Py_TYPE(v)->tp_str (a.k.a. __str__) or, when that slot is empty, to tp_repr (a.k.a. __repr__). If the result is not already a unicode object, it is then passed to PyUnicode_FromEncodedObject.

Objects/unicodeobject.c#PyUnicode_FromEncodedObject finds that it was given a string, and passes it on to PyUnicode_Decode, which returns a unicode object.

Finally, PyObject_Unicode returns to unicode_new, which returns this unicode object.

In short, unicode() will automatically stringify your object if it needs to. This is Python working as expected.
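The lookup order described above can be approximated in pure Python. This is only a rough sketch, not the interpreter's actual code: object_to_unicode is a hypothetical name, it ignores old-style classes, and it is written so that it also runs on Python 3, where __unicode__ has no special meaning and str is already the text type.

```python
import sys

text_type = type(u"")    # unicode on Python 2, str on Python 3
bytes_type = type(b"")   # str on Python 2, bytes on Python 3


def object_to_unicode(obj):
    """Hypothetical approximation of unicode(obj) without encoding/errors."""
    # Prefer __unicode__ when the type defines it.
    method = getattr(type(obj), "__unicode__", None)
    if method is not None:
        result = method(obj)
    else:
        # Otherwise use __str__, which itself falls back to __repr__.
        result = str(obj)

    if isinstance(result, text_type):
        return result  # already text: passed on as-is
    if isinstance(result, bytes_type):
        # Byte strings are decoded strictly with the default encoding.
        return result.decode(sys.getdefaultencoding(), "strict")
    raise TypeError("expected a text or byte string, got %s"
                    % type(result).__name__)


class WithUnicode(object):
    def __unicode__(self):
        return u"from __unicode__"

    def __str__(self):
        return "from __str__"


class OnlyStr(object):
    def __str__(self):
        return "from __str__"


assert object_to_unicode(WithUnicode()) == u"from __unicode__"
assert object_to_unicode(OnlyStr()) == u"from __str__"
assert object_to_unicode(1) == u"1"
```

The int case takes the second branch: int defines no __unicode__, so its __str__ result '1' is what gets converted.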

1 Comment

I mean... what happens internally.
If there is no __unicode__ method, the __str__ method will be called instead. Regardless of which of these methods is called, if a unicode is returned, it will be passed on as-is. If a str is returned, it will be decoded using the default encoding, as returned by sys.getdefaultencoding(), which should almost always be 'ascii'. If some other kind of object is returned, a TypeError will be raised.

(It is possible, by reloading the sys module, to change the default encoding by calling sys.setdefaultencoding(); this is basically always a bad idea.)
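To illustrate the decoding step, here is a small sketch of what a strict decode does to a byte string that contains non-ASCII data. The helper decode_like_unicode is a hypothetical name, and 'ascii' is hard-coded as an assumption standing in for the usual Python 2 value of sys.getdefaultencoding().

```python
raw = b"\xc3\xa9"  # UTF-8 encoding of U+00E9 (e with acute accent)


def decode_like_unicode(data, encoding="ascii"):
    """Hypothetical helper: strict decode, as done for a returned str."""
    return data.decode(encoding, "strict")


# Pure ASCII bytes decode without trouble.
assert decode_like_unicode(b"1") == u"1"

# Non-ASCII bytes fail under the 'ascii' codec.
try:
    decode_like_unicode(raw)
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the real encoding explicitly is the safer fix.
utf8_text = decode_like_unicode(raw, "utf-8")
```

This is why a __str__ that returns non-ASCII bytes makes unicode(obj) raise UnicodeDecodeError: returning a unicode object from __unicode__, or decoding explicitly, avoids relying on the default encoding.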
