Comparing strings not working

Question

I have a list of article titles that I store in a text file and load into a list. I'm trying to compare the current title with all the titles in that list, like so:

def duplicate(entry):
    for line in posted_titles:
        print 'Comparing'
        print entry.title
        print line
        if line.lower() == entry.title.lower():
            print 'found duplicate'
            return True
    return False

My problem is that this never returns True. Even when it prints identical-looking strings for entry.title and line, it won't flag them as equal. Is there a string comparison method or something else I should be using?

Edit: After looking at the representations of the strings with repr(), the strings being compared look like this:

u"Some Article Title About Things And Stuff - Publisher Name"
'Some Article Title About Things And Stuff - Publisher Name'

That's their display. What are their representations? Hint: repr()
– Ignacio Vazquez-Abrams, Commented May 17, 2013 at 12:26
Create a fully working example by constructing a list of two duplicate strings that the code does not detect, so that it is easier for people here to help you.
– Lasse V. Karlsen, Commented May 17, 2013 at 12:29
@Chris: Decode the bytestring with its encoding to turn it into a unicode.
– Ignacio Vazquez-Abrams, Commented May 17, 2013 at 12:42

poke · Accepted Answer · 2013-05-17 12:54:29Z

It would help even more if you had provided an actual example.

In any case, your problem is that the two strings have different types in Python 2. entry.title is apparently a unicode string (denoted by the u before the quotes), while line is a normal str, i.e. a byte string (or vice versa).

Python 2 implicitly decodes the str with the default ASCII codec before comparing, so the equality comparison succeeds only when the str contains nothing but ASCII characters. For any other characters it won’t:

>>> 'Ä' == u'Ä'
False

When doing the comparison in the reverse order, IDLE actually gives a warning:

>>> u'Ä' == 'Ä'
Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

You can get a unicode string from a normal string by calling str.decode and supplying the original encoding. For example, latin1 in my IDLE:

>>> 'Ä'.decode('latin1')
u'\xc4'
>>> 'Ä'.decode('latin1') == u'Ä'
True
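
The codec passed to decode has to match the one the bytes were actually written with. The following is a minimal sketch of why, not part of the original answer; it uses escape sequences instead of a literal Ä so that it does not depend on the console or source file encoding, and it runs on both Python 2.7 and Python 3:

```python
# u'\xc4' is LATIN CAPITAL LETTER A WITH DIAERESIS as unicode text.
text = u'\xc4'

# The same character becomes different bytes under different encodings.
latin1_bytes = text.encode('latin-1')  # one byte: 0xC4
utf8_bytes = text.encode('utf-8')      # two bytes: 0xC3 0x84

# Decoding with the matching codec round-trips to the original text.
print(latin1_bytes.decode('latin-1') == text)  # True
print(utf8_bytes.decode('utf-8') == text)      # True

# Decoding with the wrong codec yields different text, so the comparison fails.
print(utf8_bytes.decode('latin-1') == text)    # False
```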

If you know it’s utf-8, you can specify that instead. For example, the following file saved as utf-8 will also print True:

# -*- coding: utf-8 -*-
print('Ä'.decode('utf-8') == u'Ä')
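
Applied to the function in the question, one possible approach is to normalise both sides to unicode text before comparing. This is only a sketch under assumptions: the helper names to_text and is_duplicate are hypothetical, the titles file is assumed to be UTF-8 encoded, and the titles are passed in as plain strings rather than as entry objects. It runs on Python 2.7 (where bytes is an alias for str) and on Python 3:

```python
def to_text(value, encoding='utf-8'):
    """Return value as unicode text, decoding byte strings if needed.

    The 'utf-8' default is an assumption about how the file was saved.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def is_duplicate(title, posted_titles, encoding='utf-8'):
    """Case-insensitively check whether title is already in posted_titles."""
    wanted = to_text(title, encoding).lower()
    return any(to_text(line, encoding).lower() == wanted
               for line in posted_titles)


# Byte strings as they might come out of a UTF-8 encoded file.
posted = [u'Caf\xe9 Title - Publisher Name'.encode('utf-8'), b'Another Title']

print(is_duplicate(u'caf\xe9 title - publisher name', posted))  # True
print(is_duplicate(u'Unseen Title', posted))                    # False
```

Decoding once when the file is loaded, rather than on every comparison, would work just as well; the point is only that both operands are the same type when == is evaluated.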

kiriloff · Answer · 2013-05-17 12:52:48Z

== is fine for string comparison. Make sure you are dealing with strings:

if str(line).lower() == str(entry.title).lower():

Another possible syntax is the boolean expression str1 is str2.

Never use is to compare unless you know you need to. And you don't.
– Ignacio Vazquez-Abrams
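
As a brief illustration of the comment above (a sketch, not from the thread): is tests whether two names refer to the same object, while == tests whether the values are equal. Two equal strings are often distinct objects, and whether they are is an implementation detail, so is is not a reliable way to compare string contents. The variable names are made up for the example:

```python
title_a = 'some article title'
# Build an equal string at runtime so that it is a separate object.
title_b = ' '.join(['some', 'article', 'title'])

print(title_a == title_b)  # True: the characters are the same
print(title_a is title_b)  # typically False in CPython: different objects
```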
