2

I am kind of new to Python. I saw the following code and do not understand how a "list" can be used for sorting a string.

    lookup = defaultdict(list)
    ## Filling the lookup
    #  .....
    #  .....
    inputs = ['abc', 'acb', 'acb'] # a list of strings
    result = ''.join(sorted(inputs[0], key=lookup.get))

What I don't understand is the last line the key part. I know it does a lexicographical sort based on the values in the list. I appreciate it if someone can explain it or break this step down to a more readable solution.

for example, if the lookup table looks like this:

   {'a' : [-3, 0, 0], 'b': [0, -1, -2], 'c': [0, -2, -1]}

then the result will be this acb

3 Answers 3

7

The key argument to sorted means "Pretend the value is the result of this function instead of the actual value." So when you sort 'abc' with the lookup table you gave, it does this:

                # [1st, 2nd, 3rd] sort order
lookup.get('a') # [ -3,   0,   0]
lookup.get('b') # [  0,  -1,  -2]
lookup.get('c') # [  0,  -2,  -1]

Then it will figure out the sorted order of the above values. Lists are sorted lexicographically meaning the first element is compared first, just like in a dictionary ("aardvark" comes before "beaver" and also before "ant").

After looking at the first elements (-3, 0, 0) we know 'a' has the smallest value, but we don't know which of 'b' and 'c' is smaller. But as soon as we see the second elements (0, -1, -2), we know that 'c' is smaller, so the final order is 'acb' without ever consulting the third elements (0, -2, -1).

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks for the explanation. How does the "key" input in the "sorted" method knows it should go lexicographically? Is it taken care of internally by Python? Because key can be different data types such as integer, string, and nowhere is a list. And if it's a list then python automatically does the sort lexicographically? Very interesting and powerful. Thanks!
if you pass in a string , by default it will compare string, if you pass in an integer, it will use compare integer, this is taken care of internally by python, but you can as well define your you down. just add it to the compare logic data type implementation
@Alan The sorted function does not "know" to use lexicographical comparison, it just asks the elements to compare themselves. Lists compare themselves lexicographically (i.e. [1,2] < [3,4] does a lexicographical comparison).
The key function just converts from the original values in the container being sorted, into values to be compared. Lexicographic comparison occurs because the function returned a list, and lists compare lexicographically by their design.
1

so from your example, imagine you have the following

lookup = defaultdict(list)
lookup['a'] = [-3, 0, 0]
lookup['b'] = [0, -1, -2]
lookup['c'] = [0, -2, -1]
inputs = ['abc', 'acb', 'acb'] # a list of strings
# note that the key params of sort usually takes a function
result = ''.join(sorted(
            inputs[0], # this is the first value 'abc' of the input list 
            key=lookup.get # passing in lookup.get()
         ))

the sort function passing in each value of the string 'abc'

lookup.get(a) # first 
lookup.get(b) # next 
lookup.get(c) # next 

To understand the logic of comparison, it's internal for most data structure, you can implement yours for a custom class , __lt__ less than . __gt__ greater than

class my_int(int):
   def __lt__(a,b):
       return (a % b) % 2 != 0
   def __gt__(a,b):        
       return (a % b) % 2 == 0

9 Comments

Yes, I understand that part. The confusion (actually the surprise) was how does Python understand to do the lexicographical sort. Usually, when I do lexicographical sort in Python for the key I use different inputs separated by kama but here since the key will be a list (lookup.get() returns a list of lists) then python goes through the first elements of all returned lists, then the second element of all lists and so on.
no the first element of the list is integer datatype ... the lists are list of integers. If it was comparing the 'a', 'b' and 'c' then it should've returned 'abc' but it returns 'acb' because of the given values in the lists.
It's not lexicographical order of 'abc' it's lexicographical order based on the values of lists in the lookup table
so list comparison is internally just do [-3, 0, 0] > [0, -1, -2] like normal python operator
Thank Muyide for adding the description. What i was looking for is responded and detailed in the accepted solution.
|
1

Suppose you have a list of animals:

>>> animals=['aarvark','zebra','giraffe','bear','dog','cat','badger','ant']

Sorted lexicographically, or in alphabetical order, aardvark is sorted before ant and both before zebra:

>>> sorted(animals)
['aarvark', 'ant', 'badger', 'bear', 'cat', 'dog', 'giraffe', 'zebra']

Now suppose your 10 year old tells you I want all animals that start with 'b' sorted first, then 'z' then alphabetically.

With a key function, this is trivial to accomplish:

>>> lookup=['b','z']
>>> key_func=lambda s: (lookup.index(s[0]),s) if s[0] in lookup else (len(lookup),s)
>>> sorted(animals, key=key_func)
['badger', 'bear', 'zebra', 'aarvark', 'ant', 'cat', 'dog', 'giraffe']

Before the key function was added to Python sorting routines, the common approach to a problem like this was called Decorate, Sort, Undecorate and can be seen here:

>>> ts=sorted([(lookup.index(s[0]),s) if s[0] in lookup else (len(lookup), s) for s in animals])
>>> ts
[(0, 'badger'), (0, 'bear'), (1, 'zebra'), (2, 'aarvark'), (2, 'ant'), (2, 'cat'), (2, 'dog'), (2, 'giraffe')]
>>> [t[1] for t in ts]
['badger', 'bear', 'zebra', 'aarvark', 'ant', 'cat', 'dog', 'giraffe']

(BTW: This example is way easier and faster if you use a dict instead of a list:

>>> lookup={'b':0, 'z':1}
>>> sorted(animals, key=lambda s: (lookup.get(s[0], len(lookup)),s))
['badger', 'bear', 'zebra', 'aarvark', 'ant', 'cat', 'dog', 'giraffe']

That is the right way but your question involved list lookup...)


Key functions allow you to modify how the sort order is interpreted. For another example, consider if you wanted to sort by integers found in the sort strings and then alphabetically.

Here is the list:

>>> nl=['zebra65','ant101','bear5','no num', '2 num first', 's with 1 and 2']

If you just use the default, it comes out ASCIIbetically:

>>> sorted(nl)
['2 num first', 'ant101', 'bear5', 'no num', 's with 1 and 2', 'zebra65']

With a simple regex and key function, you can find all the numbers and form a tuple for sorting by number then the string:

import re

def find_n(s):
    ml=re.findall(r'(\d+)', s)
    if ml:
        return tuple(map(int, ml))+(s,)
    return (0,s)

>>> sorted(nl, key=find_n)
['no num', 's with 1 and 2', '2 num first', 'bear5', 'zebra65', 'ant101']

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.