I have a list of article titles that I store in a text file and load into a list. I'm trying to compare the current title with all the titles in that list, like so:

def duplicate(entry):
    for line in posted_titles:
        print 'Comparing'
        print entry.title
        print line
        if line.lower() == entry.title.lower():
            print 'found duplicate'
            return True
    return False

My problem is that this never returns True. Even when it prints out identical strings for entry.title and line, it doesn't flag them as equal. Is there a string compare method or something else I should be using?

Edit: After looking at the representations of the strings with repr(line), the strings being compared look like this:

u"Some Article Title About Things And Stuff - Publisher Name"
'Some Article Title About Things And Stuff - Publisher Name'
  • The representations of the strings are...? Commented May 17, 2013 at 12:22
  • @IgnacioVazquez-Abrams see the edit at the bottom. Commented May 17, 2013 at 12:24
  • That's their display. What are their representations? Hint: repr() Commented May 17, 2013 at 12:26
  • Create a fully working example by constructing a list of two duplicate strings that the code does not detect, so that it is easier for people here to help you. Commented May 17, 2013 at 12:29
  • @Chris: Decode the bytestring with its encoding to turn it into a unicode. Commented May 17, 2013 at 12:42

2 Answers

1

It would have helped even more if you had provided an actual example.

In any case, your problem is the different string types in Python 2. entry.title is apparently a unicode string (denoted by a u before the quotes), while line is a normal str (or vice versa).

When a str is compared with a unicode string, Python 2 implicitly decodes the str using the default encoding, which is ASCII. So for strings that contain only ASCII characters, the equality comparison will be successful. For other characters it won't:

>>> 'Ä' == u'Ä'
False

When doing the comparison in the reversed order, IDLE actually gives a warning here:

>>> u'Ä' == 'Ä'
Warning (from warnings module):
  File "__main__", line 1
UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
False

You can get a unicode string from a normal string by using str.decode and supplying the original encoding. For example latin1 in my IDLE:

>>> 'Ä'.decode('latin1')
u'\xc4'
>>> 'Ä'.decode('latin1') == u'Ä'
True
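
As a rough sketch of how that could be applied to the function in the question: decode whatever arrives as a byte string before comparing. The names to_text and is_duplicate and the utf-8 default are assumptions for illustration, not part of the original code. The strip() call is an extra precaution in case the lines loaded from the file still carry a trailing newline.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding it if it is a byte string."""
    # On Python 2, bytes is an alias for str, so plain str values get decoded.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def is_duplicate(title, posted_titles, encoding='utf-8'):
    """Return True if title matches a posted title, ignoring case and outer whitespace."""
    wanted = to_text(title, encoding).strip().lower()
    for line in posted_titles:
        if to_text(line, encoding).strip().lower() == wanted:
            return True
    return False
```

With this, is_duplicate(entry.title, posted_titles) would compare unicode with unicode regardless of which side started out as a byte string, provided the assumed encoding is the right one.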

If you know it’s utf-8, you could also specify that. For example the following file saved with utf-8 will also print True:

# -*- coding: utf-8 -*-
print('Ä'.decode('utf-8') == u'Ä')
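
Another option, sketched here under the assumption that the titles file is saved as utf-8, is to decode while reading so that every entry in the list is already unicode. io.open accepts an encoding argument on Python 2.6+ and Python 3; load_titles and the path argument are hypothetical names.

```python
import io


def load_titles(path, encoding='utf-8'):
    """Read one title per line and return them as unicode strings."""
    with io.open(path, encoding=encoding) as handle:
        # Drop the trailing newline of each line and skip empty lines.
        return [line.strip() for line in handle if line.strip()]
```

If posted_titles is built this way, the comparison in the question only ever sees unicode on the file side.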

0

== is fine for string comparison. Make sure you are dealing with strings:

if str(line).lower() == str(entry.title).lower():

Another possible syntax is the boolean expression str1 is str2.

1 Comment

Never use is to compare unless you know you need to. And you don't.
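
A small illustration of the difference, as a sketch: == compares the contents of the strings, while is checks whether both names refer to the same object. The variable names are made up for the example, and the result of the is check depends on the interpreter, so no value is claimed for it.

```python
first = 'publisher name'
second = ''.join(['publisher', ' ', 'name'])  # same text, built at runtime

values_match = first == second  # compares contents: True
same_object = first is second   # compares identity: implementation-dependent
```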
