4

I am trying to alphabetically sort the words from a file. However, my program sorts the lines according to their first words instead of sorting the individual words. Here it is:

fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    lst2 = line.strip()
    words = lst2.split()
    lst.append(words)
    lst.sort()
print lst

Here is my input file

But soft what light through yonder window breaks 
It is the east and Juliet is the sun 
Arise fair sun and kill the envious moon 
Who is already sick and pale with grief

And this is what I'm hoping to get

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder'] 
8
  • Can you post some data and the expected output? Commented Oct 29, 2015 at 16:29
  • The file I am using is: (But soft what light through yonder window breaks It is the east and Juliet is the sun Arise fair sun and kill the envious moon Who is already sick and pale with grief) and I am expecting :['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder'] Commented Oct 29, 2015 at 16:31
  • All you need to do is change lst.append to lst.extend Commented Oct 29, 2015 at 16:32
  • @Umer try my result. It definitely works. Commented Oct 29, 2015 at 16:42
  • @Henry I tried your result, it doesn't change the result but only sorts the words within the lines. Commented Oct 29, 2015 at 16:46

4 Answers 4

7

lst.append(words) appends the list words as a single element at the end of lst; it does not concatenate lst and words. You need to use lst.extend(words) or lst += words.

Also, you should not sort the list at each iteration but only at the end of your loop:

lst = []
for line in fh:
    lst2 = line.strip()
    words = lst2.split()
    lst.extend(words)
lst.sort()
print lst
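
For readers on Python 3, here is a minimal self-contained sketch of the same fix. The helper name sorted_words and the use of io.StringIO in place of a real file are assumptions made for illustration; any iterable of lines would do.

```python
import io

# The sample input from the question, held in memory instead of a file.
SAMPLE = (
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
    "Arise fair sun and kill the envious moon\n"
    "Who is already sick and pale with grief\n"
)

def sorted_words(lines):
    """Collect every word from an iterable of lines, then sort once."""
    words = []
    for line in lines:
        # split() with no arguments also discards the trailing newline
        words.extend(line.split())
    words.sort()  # sort once, after the loop
    return words

result = sorted_words(io.StringIO(SAMPLE))
print(result[:5])  # ['Arise', 'But', 'It', 'Juliet', 'Who']
```

Capitalized words come first because the default string comparison places uppercase letters before lowercase ones. Repeated words such as 'is' stay in the list.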

If you don't want repeated words, use a set:

st = set()
for line in fh:
    lst2 = line.strip()
    words = lst2.split()
    st.update(words)
lst = list(st)
lst.sort()
print lst
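
A Python 3 sketch of the set-based variant, assuming the four lines from the question as input. The function name unique_sorted_words is hypothetical. With duplicates removed, it appears to reproduce the list the question asks for.

```python
def unique_sorted_words(lines):
    """Return the distinct words from an iterable of lines, sorted."""
    unique = set()
    for line in lines:
        unique.update(line.split())  # add each word; duplicates are ignored
    return sorted(unique)  # sorted() accepts any iterable and returns a new list

sample_lines = [
    "But soft what light through yonder window breaks",
    "It is the east and Juliet is the sun",
    "Arise fair sun and kill the envious moon",
    "Who is already sick and pale with grief",
]

unique_words = unique_sorted_words(sample_lines)
print(unique_words)
```

If this loop runs over a file object that an earlier loop has already read to the end, it will see no lines; reopen the file or call fh.seek(0) first.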

1 Comment

extend doesn't change the result when I try it.
3

lst.append(words) is adding the list as a member to the outer list. For instance:

lst = []
lst.append(['another','list'])
lst ## [['another','list']]

So you're getting a nested list. Use .extend(...) instead:

fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    lst2 = line.strip()
    words = lst2.split()
    lst.extend(words)
lst.sort()
print lst
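
The following Python 3 sketch illustrates why the original program seemed to sort whole lines by their first word: lists are compared element by element, so sorting a list of lists orders the inner lists by their first items. The three short input lines are shortened from the question's data for brevity.

```python
lines = ["But soft what", "It is the", "Arise fair sun"]

# append: each line's word list becomes one element of the outer list
nested = []
for line in lines:
    nested.append(line.split())
nested.sort()  # compares the inner lists, starting with their first words
print(nested)
# [['Arise', 'fair', 'sun'], ['But', 'soft', 'what'], ['It', 'is', 'the']]

# extend: the words themselves become elements of a single flat list
flat = []
for line in lines:
    flat.extend(line.split())
flat.sort()
print(flat)
# ['Arise', 'But', 'It', 'fair', 'is', 'soft', 'sun', 'the', 'what']
```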

1 Comment

I'm afraid it doesn't have any effect on the outcome.
2

line.split() returns a list of strings. You want to join those words with the list of strings you've already accumulated from the previous lines. When you call lst.append(words), you're adding the list of words as a single element, so you end up with a list of lists. What you probably want is extend(), which adds all the elements of one list to the other.

So instead of doing lst.append(words), you would want lst.extend(words).
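
As a quick check (Python 3 syntax, with made-up starting values), extend, += and an explicit loop of append calls all produce the same flat list, while a single append nests the argument:

```python
words = ["It", "is", "the"]

a = ["But"]
a.extend(words)      # adds each element of words

b = ["But"]
b += words           # in-place concatenation, same effect as extend

c = ["But"]
for w in words:      # explicit loop, one append per word
    c.append(w)

d = ["But"]
d.append(words)      # adds the whole list as one element

print(a == b == c)   # True
print(d)             # ['But', ['It', 'is', 'the']]
```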

Comments

0

The problem is that words is a list of your words from the split. When you append words to lst, you are making a list of lists, and sorting it only orders those inner lists relative to each other.

I believe you want to do something like:

for x in words:
    lst.append(x)
lst.sort()

Edit: I have tested this with your text file, and the following code works for me:

inp = open('test.txt', 'r')
lst = list()
for line in inp:
    # Drop the trailing newline, then split on whitespace
    # (a plain line.split() would give the same tokens)
    tokens = line.split('\n')[0].split()
    for x in tokens:
        lst.append(x)
lst.sort()
lst
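
A small Python 3 check of the remark about newline handling: str.split() with no arguments splits on any run of whitespace, newline included, so removing the newline first does not change the tokens. The sample line is taken from the question's input.

```python
line = "But soft what light through yonder window breaks \n"

via_newline_split = line.split('\n')[0].split()  # remove newline, then split
via_plain_split = line.split()                   # split on all whitespace at once

print(via_newline_split == via_plain_split)  # True
```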

14 Comments

That's only going to sort the lists within the list, which does nothing.
Why? lst is a list of all the words as we have appended them on individually. I've just tested it myself and I'm correct. Please understand before you downvote people.
Agreed: the basic algorithm is correct. I didn't downvote, but I suspect it's because the for x in words bit is a slow way to write lst.extend(words).
That's not exactly clear in your answer. I would update to add more clarity.
It's trivially obvious in my answer. for x in words could not be more obvious to a wombat. It even uses the same variable words as OP and would plug straight into his code. @KirkStrauser Fair point, did not know of the existence of extend as I typically deal with arrays. Thanks for the pointer.