I want to open a file and read it line by line. For each line, I want to split the line into a list of words using the split() method. Then I want to check each word on each line to see whether it is already in the list and, if not, append it to the list. This is the code that I have written:

fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = list()
for line in fh:
    stuff = line.rstrip().split()
    for word in stuff:
        if stuff not in stuff:
            line1.append(stuff)
print line1

My problem is that when I print out line1 it prints out about 30 duplicate lists in a format like this.

['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks'], 
['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks'], ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun'], 
    ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun']
    ['Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon'], 
    ['Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon'],

I want to know why that problem is happening and how to delete the duplicate words and lists.

  • Not sure what you're trying to do exactly, but I have a feeling that "if stuff not in stuff" is hurting you at least a little. (Commented Mar 14, 2016 at 18:04)
  • Your condition is "if stuff not in stuff:". I think you mean "if word not in line1:"? If that is not the case, could you explain more clearly what you want to happen? (Commented Mar 14, 2016 at 18:04)

1 Answer

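The bug is the condition if stuff not in stuff. Here stuff is the list of words on the current line, and a list is never an element of itself, so the test is always true and the whole list gets appended once for every word on the line. That is why each line's list shows up repeatedly. If you change that line to if word not in line1: and the next line to line1.append(word), your code should work.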
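As a minimal sketch of that fix, the loop can be wrapped in a function that takes any iterable of lines (an open file works too). The name unique_words and the sample lines are made up for illustration, and the file prompt is left out so the snippet runs the same on Python 2 and Python 3.

```python
def unique_words(lines):
    """Collect each distinct word once, in order of first appearance."""
    seen = []  # plays the role of line1 in the question
    for line in lines:
        # split() with no arguments already ignores the trailing newline
        for word in line.split():
            # Compare the single word against the accumulated list,
            # not the per-line list against itself.
            if word not in seen:
                seen.append(word)
    return seen


# Hypothetical sample input standing in for the file contents
sample = [
    "But soft what light through yonder window breaks\n",
    "It is the east and Juliet is the sun\n",
]
result = unique_words(sample)
```

Because the membership test scans a list, this gets slow for very large files, which is one reason the set-based versions may be preferable.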

Alternatively, use sets.

fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = set()
for line in fh:
    stuff = line.rstrip().split()
    for word in stuff:
        line1.add(word)
print line1

or even

fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = set()
for line in fh:
    stuff = line.rstrip().split()
    line1 = line1.union(set(stuff))
print line1

Sets will only contain unique values (although they have no concept of ordering or indexing), so you would not need to deal with checking whether a word has come up already: the set data type takes care of that automatically.
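To make the ordering point concrete, here is a small sketch under the same assumptions as the code above. The function name unique_word_set and the sample lines are hypothetical. It uses set.update, which adds every word from the line in place, and then sorts the result when a reproducible order is wanted for printing.

```python
def unique_word_set(lines):
    """Return the set of distinct words found in an iterable of lines."""
    words = set()
    for line in lines:
        words.update(line.split())  # in-place union with this line's words
    return words


# Hypothetical sample input standing in for the file contents
sample = [
    "But soft what light through yonder window breaks\n",
    "It is the east and Juliet is the sun\n",
]
words = unique_word_set(sample)

# A set has no defined order, so sort it if the output order matters.
# Note that capitalized words sort before lowercase ones.
ordered = sorted(words)
```

If the order of first appearance matters rather than alphabetical order, the list-based loop is the simpler choice.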
