0

I am trying to write a program that reads a text file and creates a list of lines, where each line is itself a list of words.

However, I am only able to append each whole line, not its individual words. Any help with this problem would be appreciated.

text = open("file.txt","r")

for line in text.readlines():
    sentence = line.strip()
    list.append(sentence)

    print list 
text.close()

Example text

I am here
to do something

and I want it appended like this:

[['I','am','here'],['to','do','something']]

Thanks in advance.

2
  • 2
    Are the lines in a particular format? For example, are they space-delimited, or could they use all sorts of punctuation? Are there punctuation marks that need to be removed? Basically, clearing up what an individual word means is important for this problem. Commented Oct 16, 2011 at 23:57
  • 2
    You shouldn't use list as a variable name. I think you mean text.close() instead of file.close() Commented Oct 17, 2011 at 0:38

6 Answers

1
>>> with open("file.txt","r") as f:
...     list(map(str.split, f))
... 
[['I', 'am', 'here'], ['to', 'do', 'something']]


1

Each line in the example is just a string, so something like,

...
    PUNCTUATION = ',.?!"\''
    words = [w.strip(PUNCTUATION) for w in line.split() if w.strip(PUNCTUATION)]
    list.append(words)
...

would probably be okay as a first approximation, although it may not cover every edge case the way you want (e.g. hyphenated words, words not separated by whitespace, or words with a trailing apostrophe).

The conditional is to avoid blank entries.
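
As a minimal sketch of how that fragment could sit in a complete loop (the `split_words` helper and the sample `lines` are illustrative names and data, not part of the snippet above):

```python
PUNCTUATION = ',.?!"\''

def split_words(line):
    """Split a line on whitespace and strip punctuation from each word."""
    stripped = (w.strip(PUNCTUATION) for w in line.split())
    # Drop entries that consisted of nothing but punctuation.
    return [w for w in stripped if w]

# Hypothetical sample input standing in for the lines of file.txt.
lines = ['I am here,', 'to do "something" !']
word_lists = [split_words(line) for line in lines]
print(word_lists)  # [['I', 'am', 'here'], ['to', 'do', 'something']]
```

The lone `!` in the second line strips down to an empty string, which is why the filter is needed.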


1

Where exactly are you getting the y variable?

In the most basic sense (because you have not quite specified what to do with punctuation) you can split each line into a list of words using line.split(' '), which splits on every space. If you have other delimiters you can substitute that in, instead of the space. Assign the above split to a var if need be and append it to your list.
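
One thing to watch with an explicit separator: split(' ') and split() with no argument behave differently on repeated spaces and on the trailing newline. A small illustration, using a made-up line:

```python
# Made-up line with a double space and a trailing newline, for illustration only.
line = "to  do something\n"

by_space = line.split(' ')   # splits on every single space character
by_whitespace = line.split() # splits on runs of any whitespace

print(by_space)       # ['to', '', 'do', 'something\n']
print(by_whitespace)  # ['to', 'do', 'something']
```

If the file may contain tabs or multiple spaces, split() without an argument is usually the safer choice; otherwise strip the line first.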

@Brendan has provided a good solution to strip basic punctuation. Alternatively, you could use a simple regex, re.findall(r'\w+', text), where text is the file's contents (or a single line) as a string, to find all the words in it.

As yet another option, you can take advantage of Python's string module, and string.punctuation in particular:
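
To get one list per line with the regex approach, it can be applied line by line. A rough sketch, with `sample_lines` as made-up input:

```python
import re

# Made-up input standing in for the lines of the file.
sample_lines = ["I am here.", "to do something, isn't it?"]

# \w+ matches runs of letters, digits and underscores, so punctuation is dropped.
word_lists = [re.findall(r"\w+", line) for line in sample_lines]
print(word_lists)
```

Note that \w does not match an apostrophe, so a contraction such as "isn't" comes out as two pieces, 'isn' and 't'.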

import string
# Drop every punctuation character from the line, then split on whitespace.
chars = list(line)
''.join([ ch for ch in chars if ch not in string.punctuation ]).split()

1 Comment

sorry, i fixed the y variable
1

Something like this would cover a large number of cases, and could be tailored to the symbols you use:

import re

somelist = []
text = open("file.txt","r")

for line in text.readlines():
    sentence = line.strip()
    # Replace everything except letters and apostrophes with a space, then split on whitespace.
    words = re.sub(" +"," ",re.sub("[^A-Za-z']"," ",sentence)).split()
    somelist.append(words)

print(somelist)
text.close()

This keeps only uppercase and lowercase letters and apostrophes (for the sake of contractions).
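
To make the effect of that character class concrete, here is a small check on a made-up sentence (the `keep_letters` helper name is only for illustration):

```python
import re

def keep_letters(sentence):
    """Replace anything that is not a letter or apostrophe with a space, then split."""
    return re.sub("[^A-Za-z']", " ", sentence).split()

# Made-up sentence containing a digit, a hyphen and punctuation.
words = keep_letters("It's 3 o'clock, well-known!")
print(words)  # ["It's", "o'clock", 'well', 'known']
```

Digits disappear entirely and hyphenated words are split in two, so the class would need extending (for example with 0-9 or a hyphen) if those matter in your text.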


0
text = open("file.txt","r")

word_groups = []

for line in text.readlines():
    words = line.strip().split(' ')
    word_groups.append(words)

print word_groups

text.close()


0

It looks like you're just missing a call to str.split(). Here's a simple one-line list comprehension that does what you have asked for:

>>> [line.split() for line in open('file.txt')]
[['I', 'am', 'here'], ['to', 'do', 'something']]
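
The one-liner leaves closing the file to the interpreter. A variant that closes it explicitly might look like the sketch below; the temporary file is only there to make the example self-contained and stands in for file.txt:

```python
import os
import tempfile

# Write the question's example text to a temporary file (stand-in for file.txt).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("I am here\nto do something\n")
    path = tmp.name

# The with block closes the file once the comprehension has consumed it.
with open(path) as f:
    word_lists = [line.split() for line in f]

os.remove(path)
print(word_lists)  # [['I', 'am', 'here'], ['to', 'do', 'something']]
```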

