10

I wonder why, when I define:

a = [u'k',u'ę',u'ą']

and then type:

'k' in a

I get True, while:

'ę' in a

gives me False?

It really gives me a headache, and it seems someone made this on purpose to make people mad...

5
  • 2
    For what it's worth, this behaves as you expect in Python 3. Commented Nov 14, 2013 at 0:44
  • On my Python (2.7.2), this raises the warning UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal before returning False, which is the reason for it. Using u'ę' in a works as expected. Commented Nov 14, 2013 at 0:45
  • 1
    @alKid, I just pasted it in my interpreter. Commented Nov 14, 2013 at 0:45
  • Does the interpreter handle unicode input? Commented Nov 14, 2013 at 0:46
  • I'm using python 2.7.15, 'ę' in a is True, which is strange... Commented Dec 24, 2018 at 4:14

4 Answers

15

And why is this?

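In Python 2.x, comparing a unicode object with a str (byte string) makes Python implicitly decode the str using the default encoding, which is ASCII unless it has been changed. For non-ASCII characters that decoding fails, so Python raises a warning and treats the two values as unequal: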

Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal

In Python 3.x this does not happen, because all string literals are unicode objects by default.
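To see why the warning mentions a failed conversion, here is a minimal sketch of the ASCII decode that the comparison attempts behind the scenes. The variable names are only illustrative, and the byte values assume the letter was typed in a UTF-8 terminal:

```python
# -*- coding: utf-8 -*-
# Illustrative sketch: the bytes below assume a UTF-8 terminal.
raw = b'\xc4\x99'   # UTF-8 bytes for the letter ę (U+0119)

try:
    raw.decode('ascii')        # what the implicit conversion tries to do
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False           # bytes above 0x7F are not valid ASCII

print(ascii_ok)  # False

# A pure-ASCII byte string decodes fine, which is why 'k' in a is True.
plain = b'k'
print(plain.decode('ascii') == u'k')  # True
```

An explicit decode raises UnicodeDecodeError, whereas the comparison in Python 2 turns the same failure into a UnicodeWarning and a result of False.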

Solution?

You can either make the string unicode:

>>> u'ę' in a
True

Now both operands are unicode objects, rather than one unicode and one str.

Or encode both sides to the same encoding, for example UTF-8, before comparing:

>>> c = u"ç"
>>> u'ç'.encode('utf-8') == c.encode('utf-8')
True
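Going the other way is often more convenient: decode incoming byte strings once and keep everything as unicode afterwards. A minimal sketch, assuming the input bytes are UTF-8; the helper name to_text is hypothetical, not part of any library:

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte string.

    Hypothetical helper; the default encoding is an assumption about the input.
    """
    if isinstance(value, bytes):   # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value

a = [u'k', u'\u0119', u'\u0105']   # same list as in the question: k, ę, ą
raw = b'\xc4\x99'                  # ę as UTF-8 bytes

found = to_text(raw) in a
print(found)  # True
```

If the bytes come from a terminal or file that is not UTF-8, pass the real encoding as the second argument.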

Also, to use non-ASCII characters in a Python 2 source file, you have to declare the encoding at the top of the file:

# -*- coding: utf-8 -*-

# the whole program

Hope this helps!


4

You need to explicitly make the string unicode. The following shows an example, and the warning given when you do not specify it as unicode:

>>> a = [u'k',u'ę',u'ą']
>>> 'k' in a
True
>>> 'ę' in a
__main__:1: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False
>>> u'ę' in a
True

1 Comment

@KulawyKrul See my answer for that
1

u'ę' is a unicode object, while 'ę' is a str object: a sequence of bytes in whatever encoding your terminal or source file uses. Depending on that encoding and on Python's default encoding, the two may or may not compare equal.

One of the nice things about Python 3 is that all text is unicode, so this particular problem goes away.
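As a rough illustration of why the byte string depends on the environment, the same letter produces different bytes under different encodings. The two encodings chosen here are only examples:

```python
# -*- coding: utf-8 -*-
letter = u'\u0119'  # ę

as_utf8 = letter.encode('utf-8')         # two bytes
as_latin2 = letter.encode('iso-8859-2')  # one byte

print(as_utf8 == as_latin2)  # False

# Decoding with the matching encoding recovers the same text in both cases.
print(as_utf8.decode('utf-8') == as_latin2.decode('iso-8859-2'))  # True
```

So a plain 'ę' typed in one terminal is not necessarily the same bytes as 'ę' typed in another, while u'ę' is the same unicode object everywhere.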

1 Comment

Seems I need to start using Python 3 immediately! :) Thanks!
0

Make sure that you specify the source code encoding and put u in front of unicode literals.

This works on both Python 2 and Python 3 (3.3 or later, where the u prefix is accepted again):

#!/usr/bin/python
# -*- coding: utf-8 -*-

a = [u'k',u'ę',u'ą']

print(u'ę' in a)
# True
