0

I have a file with a list of words and I am trying to look for a word reading line by line. A sample of common_words file would be:

yourself
yourselves
z
zero

The list is lexicographically sorted.

def isCommonWord(word):

    commonWordList = open("common_words", 'r')
    commonWord = commonWordList.readline()
    commonWord = commonWord.rstrip("\n")

    while commonWord <= word:
        if commonWord == word:
            return True 
        commonWord =  commonWordList.readline()
        commonWord = commonWord.rstrip("\n")

    return False

if isCommonWord("zeros"):
    print "true"
else:
    print "false"

Now this function gets into an infinite loop, and I have no idea why. If I try words other than "zeros", it works perfectly fine; only "zeros" causes trouble. Any help will be greatly appreciated. Thank you for your time.

3
  • You seem to continue doing readline even after reaching the end of file. Commented Apr 20, 2012 at 9:43
  • What do you want to achieve with this function? Commented Apr 20, 2012 at 9:44
  • @AshwiniChaudhary I have a list of words in a file and I need to check if any given word is among the wordlist in the file. Commented Apr 20, 2012 at 9:48

6 Answers

3

The problem is that zeros would come after the last word in your file -- but you don't check for this. Moreover, readline() will just give you an empty string if you have reached the end of the file, so the loop just keeps thinking "not there yet" and going forever.
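
A minimal sketch of that behaviour, using io.StringIO as a stand-in for the real file (the four words are the sample from the question; the variable names are only illustrative):

```python
import io

# Stand-in for the common_words file; io.StringIO behaves like an open text file.
f = io.StringIO("yourself\nyourselves\nz\nzero\n")

# Four real lines, then two reads past the end of the file.
lines = [f.readline() for _ in range(6)]
print(lines)

# At EOF readline() returns '' instead of raising, and '' sorts before any
# non-empty string, so a condition like `commonWord <= word` stays true.
at_eof = f.readline()
print(repr(at_eof), at_eof <= "zeros")
```

Every further readline() call keeps returning the same empty string, which is why the loop in the question never exits for a word that sorts after the last line.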

By the way, there are better ways of doing this, using the fact that the list is sorted: have a look at binary search.
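
As a rough sketch of the binary-search idea, assuming the words have already been loaded into a sorted Python list (is_common_word and sorted_words are hypothetical names, and this does not address searching the file directly on disk):

```python
import bisect

def is_common_word(word, sorted_words):
    """Return True if word is in sorted_words (a list sorted ascending)."""
    # bisect_left finds the leftmost position where word could be inserted
    # while keeping the list sorted; word is present only if it is there.
    i = bisect.bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

# Sample words from the question, already in lexicographic order.
words = ["yourself", "yourselves", "z", "zero"]
print(is_common_word("zero", words))
print(is_common_word("zeros", words))
```

The bounds check on i is what handles a word such as "zeros" that sorts after every entry, the same case that trips up the original loop.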

In fact, you can do even better than that if you have lots of memory to spare: just read the entire file into a large set and then it takes constant time to check for membership!


2 Comments

The list can (theoretically) be too large for memory, so I cannot always assume that there will be enough memory, but of course, as you said, some form of binary search can be implemented. I plan to do so; it's just at an initial stage.
Fair enough. (You can also use something like shelve to hold a large set.)
2

readline will return the empty string when you try to read past the end of the file, and the empty string compares < any non-empty word, so your loop condition is always true if the word you're looking for is > all of the words in the file.

This can be fixed by rewriting the loop as

def isCommonWord(word):
    with open("common_words") as f:
        for w in f:
            w = w.rstrip()
            if w == word:
                return True
            elif w > word:
                break

    return False

Though the real solution to the problem is to read the file once and build a set out of it:

with open("common_words") as f:
    common = set(ln.rstrip() for ln in f)
print("true" if "zeros" in common else "false")

3 Comments

But the function works fine for all cases except the case for "zeros". Both for words in the list and not in the list.
@QuaziFarhan: that's because "zeros" compares > any word in the file. Try zzz for a change, that'll go into an infinite loop as well.
It works for words not in the list only when they come lexicographically before the last one.
1

For "yourself" <= "zeros" the condition is true, and the while loop will continue indefinitely.

So if you pass the function any word that is lexicographically larger than every word in the file, your program will run into an infinite loop. For example, "zz" sorts after all the words in common_words, so every comparison such as "yourself" <= "zz" is true and the loop never exits.

A better version of isCommonWord() would be:

def isCommonWord(word):
    # Read every word once; the with block closes the file afterwards.
    with open("common_words.txt") as commonWordList:
        commonWord = [x.rstrip() for x in commonWordList]
    if word in commonWord:
        return True
    else:
        return False

2 Comments

Simply write return word in commonWord instead of the if ...: return True else: return False.
There's nothing wrong with your code; my comment was just a suggestion to make it less verbose (and IMHO clearer).
1

Most probably, "zeros" sorts after all the words in your file common_words, so there is no match. commonWord (which you read with <fobj>.readline()) will be empty ("") once you hit EOF of your input file, and an empty string (which is returned "forever") is smaller than "zeros", so your loop condition never becomes false.

Change the loop condition to:

while commonWord and commonWord <= word:
    ...
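
For reference, a sketch of the question's function with that condition applied. The path parameter is a hypothetical addition for illustration, and the EOF check assumes the word file contains no blank lines:

```python
def isCommonWord(word, path="common_words"):
    # `path` is a hypothetical parameter; the question hard-codes the filename.
    with open(path) as commonWordList:
        commonWord = commonWordList.readline().rstrip("\n")
        # An empty string means readline() hit EOF (assuming no blank lines
        # in the file), so the loop stops instead of spinning forever.
        while commonWord and commonWord <= word:
            if commonWord == word:
                return True
            commonWord = commonWordList.readline().rstrip("\n")
    return False
```

With the sample file from the question, isCommonWord("zeros") now falls out of the loop at EOF and returns False.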


0

You haven't added a way for the loop to exit if the word is not found and is lexicographically after the last word in the file. "zero" is in the file, but "zeros" is not.

A fairly direct translation of your while loop that will work might be

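def isCommonWord(word):
    with open("common_words") as commonWordList:
        for commonWord in commonWordList:
            commonWord = commonWord.rstrip("\n")
            if commonWord == word:
                return True
            elif commonWord > word:
                # The list is sorted, so word cannot appear later.
                break
    return False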

The for loop automatically terminates when the end of the file is reached.

1 Comment

Not quite -- if the word is missing from the middle of the file then the loop will terminate.
0

The problem might be with your condition commonWord <= word. Try using != and check that readline is returning something. If the word is in the list, the function returns True; if it isn't, nothing breaks the loop :)
