
I am trying to read a .txt file and make a list of the words in it. I want the words to be strings, but the output makes them lists.

import csv, math, os
os.chdir(r'C:\Users\jmela\canopy')
f=open("romeo.txt")

words = []

for row in csv.reader(f):
    line = str(row)
    for word in line.split():
        if word not in words:
            print word
            words.append(word)

words.sort()
print words

Does anyone know what I am doing wrong?

  • Why on earth do you convert your rows to a string and then split that? Commented Jul 12, 2015 at 13:52
  • This doesn't directly address your problem, but if you want a collection that has no duplicate values, consider using a set. Commented Jul 12, 2015 at 13:55
  • You are getting a list of strings; you are probably confused because some of them have [ in them. See @Kasra's comment for why. Commented Jul 12, 2015 at 13:58
  • What does your text file look like? csv.reader reads rows and splits them into columns based on a delimiter. If your file is a list of words separated by commas, "row" will already be a list of words as strings. Commented Jul 12, 2015 at 14:00
  • When I try to do it directly: for row in csv.reader(f): for word in row.split(): if word not in words: print word words.append(word) I get this error: AttributeError: 'list' object has no attribute 'split' Commented Jul 12, 2015 at 14:00
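The AttributeError in the last comment happens because csv.reader hands back each row as a list of field strings, not as one string, so .split() has to be called on each field inside the row. A minimal sketch that keeps csv.reader, assuming the default comma delimiter; it is written for Python 3, and the function name `unique_words_csv` and the sample text are hypothetical:

```python
import csv
import io


def unique_words_csv(f):
    """Collect unique words from a file-like object read with csv.reader.

    Hypothetical helper: each row is a list of strings (one per
    comma-separated field), so we split each field, not the row itself.
    """
    words = []
    for row in csv.reader(f):
        for field in row:              # field is a str, so .split() works
            for word in field.split():
                if word not in words:
                    words.append(word)
    words.sort()
    return words


# Usage with an in-memory stand-in for romeo.txt (sample text is made up)
sample = io.StringIO("But soft what light\nIt is the east\n")
print(unique_words_csv(sample))
# prints ['But', 'It', 'east', 'is', 'light', 'soft', 'the', 'what']
```

Note that sort() puts capitalized words first, because uppercase letters sort before lowercase ones.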

3 Answers


Based on your latest comment, it doesn't look like you really need csv.reader. Just try this:

words = []
with open("romeo.txt", "r") as f:
    for line in f:
        for word in line.split():
            if word not in words:
                words.append(word)

words.sort()
print words

And, as Kevin suggested, use a set() instead of a list.
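Following the suggestion to use a set, here is a minimal sketch of the same loop with a set doing the de-duplication. It is written for Python 3; the function name `unique_words` is hypothetical, and it accepts any iterable of lines, so an open file works as well as a list of strings:

```python
def unique_words(lines):
    """Return the unique whitespace-separated words in lines, sorted.

    A set silently ignores duplicates, so the 'if word not in words'
    check (which scans the whole list every time) is no longer needed.
    """
    words = set()
    for line in lines:
        words.update(line.split())  # add every word on this line
    return sorted(words)            # sorted() returns a new list


# Usage (assumes romeo.txt is in the current directory):
# with open("romeo.txt") as f:
#     print(unique_words(f))
print(unique_words(["the sun the moon", "the east"]))
# prints ['east', 'moon', 'sun', 'the']
```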


3 Comments

Thanks, that works perfectly. I don't follow what was wrong with my original code. Do you know why that didn't work?
Yes. As I said, csv.reader splits every row into columns based on the given delimiter (a comma by default). So row was actually something like ['this is a sentence'] (a list with one string, the whole line, since there were no commas). Then you turned it into a string (e.g. "['this is a sentence']") and split that on spaces, which is why some of your words contained [ and quote characters. Please read about csv.reader some more, and next time debug and check what you get in every iteration of the loop; it will save you some time. :)
I understand this & have learned from your explanation. Thank you.
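To see what the original loop was actually splitting, here is a small sketch that feeds one line to csv.reader and prints each intermediate value, which is the kind of per-iteration debugging the comment recommends. It is written for Python 3; the sample line is made up, and io.StringIO stands in for the real file:

```python
import csv
import io

# One line with no commas, so csv.reader returns a single-field row
f = io.StringIO("But soft what light\n")

for row in csv.reader(f):
    print(row)           # ['But soft what light']  (a list, not a string)
    line = str(row)      # the list's printed form, brackets and quotes included
    print(line)          # ['But soft what light']  (now a string)
    print(line.split())  # ["['But", 'soft', 'what', "light']"]
```

The first and last "words" carry the bracket and quote characters from the list's printed form, which matches what the question describes.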

Then don't read the text file as CSV. Simply replace all punctuation and other non-letter, non-space characters with spaces, like this:

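def replacePunct(text):
    # Keep only lowercase letters and spaces; everything else becomes a space
    alphabets = " abcdefghijklmnopqrstuvwxyz"
    text = text.lower()  # so capital letters are kept rather than replaced
    for s in set(text):
        if s not in alphabets:
            text = text.replace(s, " ")
    # split() with no argument drops empty strings and runs of spaces
    words = text.split()
    # unique words, plus the total number of words
    return set(words), len(words)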

1 Comment

Read the file as a normal text file and run this function on each line.
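A minimal sketch of that per-line approach. It is self-contained rather than calling replacePunct directly: the hypothetical helper `letters_only` applies the same rule (lowercase the line, turn anything other than a-z or a space into a space), and the loop merges each line's words into one set. Written for Python 3; the sample lines are made up.

```python
import string


def letters_only(line):
    """Hypothetical helper: lowercase the line and replace every
    character that is not a-z or a space with a space."""
    allowed = set(string.ascii_lowercase + " ")
    return "".join(c if c in allowed else " " for c in line.lower())


def unique_words_from_lines(lines):
    """Run letters_only on each line and collect the unique words."""
    words = set()
    for line in lines:
        words.update(letters_only(line).split())
    return sorted(words)


# Usage with made-up lines; with a real file, pass the open file object instead
sample = ["But, soft! What light", "It is the east, and Juliet is the sun."]
print(unique_words_from_lines(sample))
# prints ['and', 'but', 'east', 'is', 'it', 'juliet', 'light', 'soft', 'sun', 'the', 'what']
```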

You could use a set to hold your words. This would give you a list of unique words. Any non-alphabetic characters are converted to spaces. Each line is split into words, and each word is lowercased so that differently capitalized copies of the same word match.

import re

word_set = set()
re_nonalpha = re.compile('[^a-zA-Z ]+')

with open("romeo.txt", "r") as f_input:
    for line in f_input:
        line = re_nonalpha.sub(' ', line)  # Replace everything except letters and spaces with a space

        for word in line.split():
            word_set.add(word.lower())

word_list = sorted(word_set)
print word_list

This would give you a list holding the following words:

['already', 'and', 'arise', 'breaks', 'but', 'east', 'envious', 'fair', 'grief', 'is', 'it', 'juliet', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'who', 'window', 'with', 'yonder']

Updated to also remove any punctuation.

1 Comment

Make sure to account for extra spaces or hyphens.
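One way to act on this comment, assuming you want hyphenated words such as "well-known" kept as single words: allow a hyphen only between letters. Runs of extra spaces are already handled, because str.split() and re.findall() never produce empty words from them. The pattern and the name `extract_words` are illustrative choices, not part of the answer above; written for Python 3.

```python
import re

# A word is a run of letters, optionally joined to more letters by single
# hyphens, e.g. "well-known"; a stray "-" or "--" on its own is not matched.
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def extract_words(line):
    """Hypothetical helper: return the lowercased words in line,
    keeping internal hyphens and dropping all other punctuation."""
    return WORD_RE.findall(line.lower())


print(extract_words("A well-known  tale -- of star-crossed lovers."))
# prints ['a', 'well-known', 'tale', 'of', 'star-crossed', 'lovers']
```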
