How to sort contents in a file in Python

Question

I'm trying to figure out a simple way to sort the words in a file, but the newline characters ("\n") always show up when I print the words. How could I improve this code to make it work properly? I'm using Python 2.7. Thanks in advance.

def sorting(self):
    filename = ("food.txt")
    file_handle = open(filename, "r")
    for word in file_handle:
        word = word.split()
        print sorted(file_handle)
    file_handle.close()

abarnert · Accepted Answer · 2013-12-19 21:37:29Z

You actually have two problems here.

The big one is that print sorted(file_handle) reads and sorts the whole rest of the file and prints that out. You're doing that once per line. So, what happens is that you read the first line, split it, ignore the result, sort and print all the lines after the first, and then you're done.
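
A minimal sketch of that behavior, assuming io.StringIO as a stand-in for the open file (the contents are made up, since food.txt itself isn't shown). It is written so it also runs on Python 3:

```python
import io

# Stand-in for the open file; the contents are invented for illustration.
file_handle = io.StringIO(u"two\none\nthree\n")

outputs = []
for line in file_handle:
    # sorted() consumes every line still left in the iterator,
    # so nothing remains for the next pass of the for loop.
    outputs.append(sorted(file_handle))

# The loop body ran exactly once, and the first line ("two") was skipped.
print(len(outputs))
print(outputs[0])
```

The lines in the result still carry their "\n" because they are whole lines, not the split words.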

What you want to do is accumulate all the words as you go along, then sort and print that. Like this:

def sorting(self):
    filename = ("food.txt")
    file_handle = open(filename, "r")
    words = []
    for line in file_handle:
        words += line.split()
    file_handle.close()
    print sorted(words)

Or, if you want to print the sorted list one line at a time, instead of as a giant list, change the last line to:

print '\n'.join(sorted(words))

For the second, more minor problem, the one you asked about, you just need to strip off the newlines. So, change the words += line.split() line to this:

words += line.strip().split()

However, if you had solved the first problem, you wouldn't even have noticed this one. If you have a line like "one two three\n", and you call split() on it, you will get back ["one", "two", "three"], with no \n to worry about. So, you don't actually even need to solve this one.
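
A quick check of that claim, as a sketch with a made-up line (the variable names are only for illustration):

```python
line = "one two three\n"

# split() with no arguments splits on runs of whitespace and ignores
# leading and trailing whitespace, the newline included.
plain = line.split()
stripped_first = line.strip().split()

print(plain)
print(plain == stripped_first)

# A blank line contributes no words at all.
blank = "\n".split()
print(blank)
```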

While we're at it, there are a few other improvements you could make here:

Use a with statement to close the file instead of doing it manually.
Make this function return the list of words (so you can do various different things with it, instead of just printing it and returning nothing).
Take the filename as a parameter instead of hardcoding it (for similar flexibility).
Maybe turn the loop into a comprehension—but that would require an extra "flattening" step, so I'm not sure it's worth it.
If you don't want duplicate words, use a set rather than a list.
Depending on the use case, you often want to use rstrip() or rstrip('\n') to remove just the trailing newline, while leaving, say, paragraph indentation tabs or spaces. If you're looking for individual words, however, you probably don't want that.
You might want to filter out and/or split on non-alphabetical characters, so you don't get "that." as a word. Doing even this basic kind of natural-language processing is non-trivial, so I won't show an example here. (For example, you probably want "John's" to be a word, you may or may not want "jack-o-lantern" to be one word instead of three; you almost certainly don't want "two-three" to be one word…)
The self parameter is only needed in methods of classes. This doesn't appear to be in any class. (If it is, it's not doing anything with self, so there's no visible reason for it to be in a class. You might have some reason which would be visible in your larger program, of course.)
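
Two of the points above, the comprehension with its "flattening" step and the set for dropping duplicates, might look roughly like this. The function name sorted_words and its unique parameter are hypothetical, and the sample data is made up:

```python
import io

def sorted_words(lines, unique=False):
    """Return the sorted words found in an iterable of lines."""
    # The nested comprehension is the flattening step: the outer loop
    # walks the lines, the inner loop walks the words of each line.
    words = [word for line in lines for word in line.split()]
    if unique:
        # A set drops duplicates; sorted() turns it back into an ordered list.
        words = set(words)
    return sorted(words)

sample_text = u"pear apple\napple fig\n"  # invented sample data
with_duplicates = sorted_words(io.StringIO(sample_text))
without_duplicates = sorted_words(io.StringIO(sample_text), unique=True)
print(with_duplicates)
print(without_duplicates)
```

Taking any iterable of lines rather than a filename is a design choice made for this sketch only; it lets the same function work on an open file or on a list of strings.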

So, anyway:

def sorting(filename):
    words = []
    with open(filename) as file_handle:
        for line in file_handle:
            words += line.split()
    return sorted(words)

print '\n'.join(sorting('food.txt'))
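
To try the function without a food.txt on disk, one option is a temporary file. This is only a sketch: the file contents are made up, and print is called with parentheses so the snippet also runs on Python 3.

```python
import os
import tempfile

def sorting(filename):
    words = []
    with open(filename) as file_handle:
        for line in file_handle:
            words += line.split()
    return sorted(words)

# Invented contents standing in for food.txt, including a blank first line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\nfive\nfour\none\nthree\ntwo\n")
    path = tmp.name

try:
    result = sorting(path)
    print("\n".join(result))
finally:
    os.remove(path)  # clean up the temporary file
```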

Jochen Ritzel · Accepted Answer · 2013-12-19 21:24:32Z

2

Basically all you have to do is strip that newline (and all other whitespace because you probably don't want it):

def sorting(self):
    filename = ("food.txt")
    file_handle = open(filename, "r")
    for line in file_handle:
        word = line.strip().split()
        print sorted(file_handle)
    file_handle.close()

Otherwise you can just remove the last character with line[:-1].split()

6 Comments

treddy Over a year ago

It's probably a bit more Pythonic to use the context manager with statement to handle the file as well.

abarnert Over a year ago

Depending on the use case, you often want to use rstrip() or rstrip('\n') to remove just the trailing newline, while leaving, say, paragraph indentation tabs or spaces. But it sounds like in the OP's use case there's no reason for that, and this is fine.
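
A small sketch of the difference, using a made-up indented line. Note that passing an explicit character set to strip() replaces the default whitespace set rather than adding to it:

```python
line = "\t  apple \n"

default_stripped = line.strip()       # all surrounding whitespace removed
newline_only = line.strip("\n")       # only newlines removed; tab and spaces stay
trailing_newline = line.rstrip("\n")  # only the trailing newline removed

print(repr(default_stripped))
print(repr(newline_only))
print(repr(trailing_newline))
```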

wombatp Over a year ago

I've tried all those possible changes, but it still returns the spaces: ['\n', 'five\n', 'four\n', 'one\n', 'three\n', 'two\n']

abarnert Over a year ago

@user205820: Your function doesn't return anything (except the default None that a function returns if it doesn't return anything else), so it can't be returning that. It prints that, for the reason I explained in my answer. You have two problems, and this answer only solves the one you asked about, not the other (more serious) one you didn't.

abarnert Over a year ago

@user205820: And actually, if you've solved the other problem, you wouldn't need to solve this one, while if you solve this one, that still won't help until you solve the other one too. So really, you asked the wrong question, which is why Jochen's correct answer to your question doesn't actually help you.

Cory Kramer · Accepted Answer · 2013-12-19 21:24:18Z

0

Use .strip(). It removes whitespace by default. You can also pass other characters (like "\n") to strip instead. This will leave just the words.

1 Comment

abarnert Over a year ago

"\n" is already included in the default whitespace used by strip(); you don't need to add it. (And if you do add it, you need to look up all the other default characters and add those in too, because you're no longer getting the defaults anymore.)

Artsiom Rudzenka · Accepted Answer · 2013-12-19 21:27:00Z

0

Try this:

def sorting(self):
    words = []
    with open("food.txt") as f:
        for line in f:
            words.extend(line.split())
    return sorted(words, key=lambda word: word.lower())
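
The key argument is what makes this sort case-insensitive. A sketch with made-up words shows how it differs from the default ordering, where uppercase letters sort before lowercase ones:

```python
words = ["banana", "Apple", "cherry", "Date"]  # invented sample data

# Default ordering compares character codes, so capitals come first.
default_order = sorted(words)

# Lowercasing each word for comparison ignores case; the original
# spelling of each word is kept in the result.
case_insensitive = sorted(words, key=lambda word: word.lower())

print(default_order)
print(case_insensitive)
```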

Federico Ponte · Accepted Answer · 2013-12-19 21:27:59Z

-1

To avoid printing the trailing newline, just put a comma at the end of the print statement:

print sorted(file_handle),

In your code, I don't see that you are sorting the whole file, just the line. Use a list to save all the words, and after you have read the file, sort them all.
