I'm trying to write a function that returns an array of all the elements that share the longest length. I'm not looking for the single longest element, but for all of the longest elements.

The approach I've taken is to create a dictionary of arrays where the key is the length and the value is an array of elements of the length indicated by the key.

This is the code I've come up with:

#initialise the dictionary
longest = {}
#this keeps track of the greatest length
longestNum = 0
for seq in proteinSeq:
    if len(seq) >= longestNum:
        longestNum = len(seq)
        #check to see if the dic key exists
        #if not initialise it
        try:
            longest[longestNum].append(seq)
        except NameError:
            longest[longestNum] = []
            longest[longestNum].append(seq)

return longest[longestNum]

It gives me a KeyError: 6 at the first longest[longestNum].append(seq)...

Can someone help me find what the problem here is?

3 Answers

2

Reading a key that doesn't exist raises a KeyError, not a NameError, and KeyError is exactly what your error message shows. So you're catching the wrong exception.

You could use

except KeyError:

but I might use

longest.setdefault(longestNum, []).append(seq)

instead, or make longest a collections.defaultdict(list), in which case it would simply be

longest[longestNum].append(seq)

See this article for a quick comparison of defaultdict vs setdefault.
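To make the setdefault suggestion concrete, here is a minimal sketch of the question's loop wrapped in a function. The function name longest_sequences, the snake_case variable names, and the sample data are my own assumptions for illustration; the logic is otherwise the asker's, with the try/except replaced by setdefault.

```python
def longest_sequences(protein_seq):
    """Return every sequence that shares the greatest length seen."""
    longest = {}      # maps a length to the sequences of that length
    longest_num = 0   # greatest length seen so far
    for seq in protein_seq:
        if len(seq) >= longest_num:
            longest_num = len(seq)
            # setdefault creates the empty list on first use of a key,
            # so no KeyError handling is needed.
            longest.setdefault(longest_num, []).append(seq)
    # .get avoids a KeyError when the input was empty.
    return longest.get(longest_num, [])


# Hypothetical sample data, not from the question.
print(longest_sequences(["MKV", "MKVLAA", "GH", "QWERTY"]))
# ['MKVLAA', 'QWERTY']
```

With a collections.defaultdict(list) in place of the plain dict, the setdefault call could presumably shrink to a plain longest[longest_num].append(seq), though the final lookup would then create an empty entry for empty input.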

1 Comment

Oh wow, that's cool... that really allows for some nice behaviour. I'm used to the over-powered arrays in PHP... so this is a nice find. (I can't accept your answer for another 5mins)
1

Change the NameError to KeyError, because if the key does not exist in your dictionary, a KeyError is raised, as you have seen in the traceback.

However, I'm not sure you need a dictionary in this case. What about something like:

longestwords = []
longestlength = 0

for word in all_words:
    if len(word) > longestlength:
        longestwords = [word]
        longestlength = len(word)
    elif len(word) == longestlength:
        longestwords.append(word)
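Since the question asks for a function, the loop above could be wrapped up roughly like this. The name longest_words and the sample input are assumptions of mine, so treat it as a sketch rather than the definitive version.

```python
def longest_words(all_words):
    """Single pass: keep only the words tied for the greatest length."""
    longestwords = []
    longestlength = 0
    for word in all_words:
        if len(word) > longestlength:
            # A strictly longer word invalidates everything kept so far.
            longestwords = [word]
            longestlength = len(word)
        elif len(word) == longestlength:
            # A tie with the current maximum is kept alongside the others.
            longestwords.append(word)
    return longestwords


# Hypothetical sample data.
print(longest_words(["a", "bbb", "cc", "ddd"]))
# ['bbb', 'ddd']
```

Because shorter words are thrown away as soon as a longer one appears, nothing is stored that won't be returned.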

1

Here's a shorter and more declarative version, assuming I've understood your question properly. It also has the advantage of not constructing an entire dictionary only to subsequently discard all the key-value pairs corresponding to sequences shorter than those you are interested in.

>>> from itertools import takewhile
>>> # sort the protein sequences by length and then reverse the new
>>> # list so that the longest sequences come first.    
>>> longest_first = sorted(proteinSeq, key=len, reverse=True) 
>>> longestNum = len(longest_first[0])
>>> # take only those sequences whose length is equal to longestNum
>>> seqs = list(takewhile(lambda x: len(x)==longestNum, longest_first))

1 Comment

... except that sorting is O(n log(n)), while the algorithms above are O(n).
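For what it's worth, a linear-time variant that keeps the declarative flavour might look like the sketch below: one pass with max to find the greatest length, then one pass to filter. The function name longest_by_max is hypothetical, and the code assumes the input is a list (or anything else that can be iterated twice).

```python
def longest_by_max(protein_seq):
    """Two O(n) passes: find the maximum length, then filter on it."""
    if not protein_seq:
        return []  # max() would raise ValueError on an empty input
    longest_num = max(len(seq) for seq in protein_seq)
    return [seq for seq in protein_seq if len(seq) == longest_num]


# Hypothetical sample data.
print(longest_by_max(["MKV", "MKVLAA", "GH", "QWERTY"]))
# ['MKVLAA', 'QWERTY']
```

Unlike the sorted version, this preserves the original order of the sequences without relying on sort stability.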
