1

So for a Programming assignment we have to re write the sort function in python to sort a list of words. So far I have made it able to sort the words based of the first letter of each and now im trying to run recursion to still sort it if the first letters or any of the letters are the same. Im having problems with the "IndexError: string index out of range" error. What I have so far is

def insertion_sort(bookwords):
    for index in range(1,len(bookwords)):
        global word
        word=bookwords[index]
        i=index-1
        word_checker(bookwords, 0, i)

def word_checker(bookwords, num, i):
    while i>=0:
        wordleft=bookwords[i]
        if ord(word[num])<ord(wordleft[num]):
            bookwords[i+1]=bookwords[i]
            bookwords[i]=word
            i=i-1
        elif ord(word[num])==ord(wordleft[num]):
            num=num+1
            word_checker(bookwords, num, i)
        else:
            break


bookwords=["michael", "maddy", "michelle", "monstor", "money", "mountain", "miniscus", "mega"]

insertion_sort(bookwords)

print bookwords

Im guessing num is becoming larger than the words but its running through many times without stopping when the letters arn't the same so im abit confused why its doing that. Any help would be greatly appreciated

UPDATE

Ok now it works but when I put it into the supplied code to test how fast it is with about 700000 words it went for 30+ until I stopped it where as the sort function took 5 seconds. Here is the code with my part as well

import re
import pygame

# 159.172 assignment 2
# 
def mysort(words):
for index in range(1,len(words)):
    word=words[index]
    i=index-1
    word_checker(words, i, word)

def word_checker(words, i, word):
while i>=0:
    wordleft=words[i]
    if word==wordleft:
        break
    elif word<wordleft:
        words[i+1]=words[i]
        words[i]=word
        i=i-1
    else:
        return

# Do NOT change anything below here:
#
# Compare two lists
def compare(l1,l2):
    if len(l1) != len(l2):
        return False
    for a,b in zip(l1,l2):
        if a!=b:
            return False
    return True

# Open the book
book=open("allsherlock.txt", "rt")

# Make a list of all the words in the book
bookwords=[]
for line in book:
    for word in re.findall(r'\w+', line):
        bookwords.append(word.lower())

print "Loaded",len(bookwords),"words"
sortedbookwords=bookwords[:]
pygame.init()
# Use the sort function to sort the words for testing
sortedbookwords.sort()
starttime=pygame.time.get_ticks()
# Call our sort function
mysort(bookwords)
print "Sort took",pygame.time.get_ticks()-starttime,"ms"
print "Correct sort:",compare(bookwords,sortedbookwords)
7
  • 1
    Try printing out the values of num, word, and wordleft before the if statement in your while loop Commented Oct 10, 2013 at 23:51
  • I don't think you need to use ord; for example, "foo" < "bar" returns False. Commented Oct 10, 2013 at 23:52
  • As a side note, you don't need to do ord(ch1) < ord(ch2). This does exactly the same thing as ch1 < ch2 would have. (That's only true with str in Python 2.x/bytes in 3.x, not unicode in 2.x/str in 3.x. But in the latter case, ord isn't probably wrong.) Commented Oct 10, 2013 at 23:53
  • Try resetting num to 0 when you decrease i. And remove the global variable word, initialise it in the word_checker instead. Commented Oct 10, 2013 at 23:54
  • Meanwhile, why are you trying to compare character-by-character anyway? If you're sorting words, just compare the words. What you're doing is to compare character #2 of one word with character #2 of another word… then, if it fails, compare character #3 of the first word with character #3 of a different word. Commented Oct 10, 2013 at 23:55

2 Answers 2

3

you have to change this:

 elif ord(word[num])==ord(wordleft[num]):
     num=num+1
     word_checker(bookwords, num, i)
 else:

to:

 elif ord(word[num])==ord(wordleft[num]):
     num=num+1
 else:

then it will print: ['maddy', 'mega', 'money', 'michael', 'michelle', 'miniscus', 'monstor', 'mountain']

i don't see the point of doing recursion there anyway, i think insertion sort doesn't do recursion.

update

The algorithm was broken when comparing by character, but python can compare the strings for you, so this will give the right result:

def insertion_sort(bookwords):
    for index in range(1,len(bookwords)):
        global word
        word=bookwords[index]
        i=index-1
        word_checker(bookwords, i)

def word_checker(bookwords,  i):
    while i>=0:
        wordleft=bookwords[i]
        if word<wordleft:
            bookwords[i+1]=bookwords[i]
            bookwords[i]=word
        i=i-1

bookwords=["michael", "maddy", "michelle", "monstor", "money", "mountain", "miniscus", "mega"]
insertion_sort(bookwords)
print bookwords #prints ['maddy', 'mega', 'michael', 'michelle', 'miniscus', 'money', 'monstor', 'mountain']
Sign up to request clarification or add additional context in comments.

6 Comments

you are right, there must be something else broken on the algorithm, i think recursion is not needed there, let me check
do you have to compare by char?
the part where you compare by characters was broken, i update the algorithm to compare the whole string ( unless you want to do it by character for learning or any other reason) i think the reason that it was broken is because of string length. i assume since python will do the comparison theres no point of doing it
thnaks man it works now the last question I have is the assignment is testing how long it takes to sort the words (about 700000 of them). The inbuilt sort function takes about 5 seconds but this was going for 30 minutes and I stopped it then to have a look. Should this method be that slow or could there be a problem when putting my function into the supplied code to test it against sort function?
if i am right python sort is onlogn and insertion sort is on**2 theres a performance comparison here. this is of course average case and it depends on your input. with insertion sort if you have a word starting with aa in the last position of the list, you will have to compare it to all 700000 which is not very good.
|
1

Several things:

  • Python zero-indexes strings (so from 0 to len(string)-1). and
  • Consider just using "for" to go through each letter.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.