Sort strings in Python list using another list

Question

Say I have the following lists:

List1=['Name1','Name3','Color1','Size2','Color3','Color2','Name2','Size1', 'ID']
List2=['ID','Color1','Color2','Size1','Size2','Name1','Name2']

Each list will have an element named "ID" and then 3 other categories (Name, Color, and Size), each containing an unpredetermined number of elements.

I want to sort these variables without knowing how many there will be in each category with the following 'sort list':

SortList=['ID','Name','Size','Color']

I can get the desired output (see below) although I imagine there is a better / more pythonic way of doing so.

>>> def SortMyList(MyList,SortList):       
...     SortedList=[]       
...     for SortItem in SortList:
...         SortItemList=[]
...         for Item in MyList:
...             ItemWithoutNum="".join([char for char in Item if char.isalpha()])  
...             if SortItem==ItemWithoutNum:
...                 SortItemList.append(Item)
...         if len(SortItemList)>1:
...             SortItemList=[SortItem+str(I) for I in range(1,len(SortItemList)+1)]
...         for SortedItem in SortItemList:
...             SortedList.append(SortedItem)
...     return SortedList
... 
>>> 
>>> SortMyList(List1, SortList)
['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
>>> SortMyList(List2, SortList)
['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']
>>>

Any suggestions as to how my methodology or my code can be improved?

I'm a bit unclear here ... Do the "Name" category items all start with the substring "Name", etc.?
– mgilson, Commented Jan 19, 2016 at 19:57
Could you have entries as high as Name11 which you would want after Name10 and before Name12?
– DSM, Commented Jan 19, 2016 at 20:01
This may belong over on Code Review. My initial suggestion would be to fix your casing: StudlyCase words are for class definitions, not functions or variables. pep8 and pyflakes are two linters that will help point out at least the style problems in your code, though you may want to take their advice with a grain of salt.
– Wayne Werner, Commented Jan 19, 2016 at 20:05
@mgilson - yes, all items within a category will start with the same substring. And yes, @DSM, entries can have more than 1 digit, and if so Name11 should go between Name10 and Name12.
– AJG519, Commented Jan 19, 2016 at 20:08

shx2 · Accepted Answer · 2016-01-19 20:07:47Z

5

You can sort the list using a custom key function, which returns a 2-tuple, for primary sorting and secondary sorting.

Primary sorting is by the order of your "tags" (ID first, then Name, etc.). Secondary sorting is by the numeric value following it.

tags = ['ID', 'Name', 'Size', 'Color']
sort_order = {tag: i for i, tag in enumerate(tags)}

def elem_key(x):
    # Primary key: position of the tag; secondary key: numeric suffix, if any.
    for tag in tags:
        if x.startswith(tag):
            suffix = x[len(tag):]
            return (sort_order[tag],
                    int(suffix) if suffix else None)
    raise ValueError("element %s is not prefixed by a known tag. order is not defined" % x)

List1.sort(key=elem_key)
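As a quick check of this approach, here is a self-contained sketch run against a list with two-digit suffixes. The list `items` is made up for illustration, and `-1` replaces `None` for an empty suffix; that substitution is my own assumption, chosen so that a bare tag and a numbered tag could still be compared under Python 3.

```python
tags = ['ID', 'Name', 'Size', 'Color']
sort_order = {tag: i for i, tag in enumerate(tags)}

def elem_key(x):
    # Primary key: position of the tag; secondary key: numeric suffix.
    for tag in tags:
        if x.startswith(tag):
            suffix = x[len(tag):]
            # -1 (an assumption, not in the answer) stands in for "no suffix".
            return (sort_order[tag], int(suffix) if suffix else -1)
    raise ValueError("element %s is not prefixed by a known tag" % x)

# Hypothetical input with two-digit suffixes.
items = ['Name10', 'Color2', 'Name2', 'ID', 'Size1', 'Name1', 'Color10']
result = sorted(items, key=elem_key)
print(result)
# ['ID', 'Name1', 'Name2', 'Name10', 'Size1', 'Color2', 'Color10']
```

Because the suffix is compared as an integer, Name10 lands after Name2 rather than before it.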

edited Jan 19, 2016 at 20:07

answered Jan 19, 2016 at 20:03

shx2

64.8k reputation · 17 gold badges · 139 silver badges · 166 bronze badges


4 Comments

8one6 Over a year ago

Can you explain the assert 0 line?

shx2 Over a year ago

@8one6, sure. see more details in the assert line.

8one6 Over a year ago

I.e. it makes sure that this custom key function actively blows chunks (rather than failing passively) if given an unexpected input?

shx2 Over a year ago

It would have failed anyway (trying to compare tuples to something else the key function would have returned, such as None), but this way the error condition is reported explicitly.

Garrett R · Accepted Answer · 2016-01-20 00:44:57Z

2

This works as long as you know that List2 only contains strings that start with one of the entries in sortList.

List2=['ID','Color4','Color2','Size1','Size2','Name2','Name1']
sortList=['ID','Name','Size','Color']
def sort_fun(x):
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            return (i, x[len(thing):])

print(sorted(List2, key=sort_fun))
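One caveat, sketched below on a made-up list: the second element of the key is a string, so suffixes are compared lexicographically and 'Color10' sorts before 'Color2'. The `sort_fun_numeric` variant is my own illustration, not part of the answer; it converts the suffix to an int and uses `-1` for an empty suffix, which is an arbitrary choice.

```python
sortList = ['ID', 'Name', 'Size', 'Color']

def sort_fun(x):
    # Key from the answer: tag position, then the suffix as a string.
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            return (i, x[len(thing):])

def sort_fun_numeric(x):
    # Hypothetical variant: compare the suffix as an integer instead.
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            suffix = x[len(thing):]
            return (i, int(suffix) if suffix else -1)

sample = ['Color2', 'Color10', 'ID', 'Name1']
as_strings = sorted(sample, key=sort_fun)
as_numbers = sorted(sample, key=sort_fun_numeric)
print(as_strings)  # ['ID', 'Name1', 'Color10', 'Color2']
print(as_numbers)  # ['ID', 'Name1', 'Color2', 'Color10']
```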

answered Jan 20, 2016 at 0:44

Garrett R

2,662 reputation · 13 silver badges · 15 bronze badges

Comments

B. M. · Accepted Answer · 2016-01-19 21:19:40Z

1

You can just provide a suitable key:

List1.sort( key = lambda x : ('INSC'.index(x[0]),x[-1]))
# ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']

The elements will be sorted by their first letter, then by their last digit if one exists. It works here because all the first letters are different and the numbers have at most one digit.

EDIT

For numbers with several digits, a more obfuscated solution:

import re
List1.sort(key=lambda x: ('INSC'.index(x[0]), int("0" + "".join(re.findall(r'\d+', x)))))
# ['ID', 'Name1', 'Name2', 'Name10', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
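To see the difference between the two keys, here is a small sketch on a made-up list containing a two-digit number. The function names are mine, introduced only to make the lambdas easier to compare.

```python
import re

def first_letter_last_char(x):
    # First version: first letter, then the last character only.
    return ('INSC'.index(x[0]), x[-1])

def first_letter_all_digits(x):
    # Edited version: first letter, then all digits as one integer (0 if none).
    return ('INSC'.index(x[0]), int("0" + "".join(re.findall(r'\d+', x))))

sample = ['Name10', 'Name2', 'ID', 'Name1']
by_last_char = sorted(sample, key=first_letter_last_char)
by_all_digits = sorted(sample, key=first_letter_all_digits)
print(by_last_char)   # ['ID', 'Name10', 'Name1', 'Name2']
print(by_all_digits)  # ['ID', 'Name1', 'Name2', 'Name10']
```

With the last-character key, Name10 is ranked by its trailing '0' and ends up ahead of Name1, which is the failure mode for numbers above 9.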

edited Jan 19, 2016 at 21:19

answered Jan 19, 2016 at 20:21

B. M.

18.7k reputation · 2 gold badges · 40 silver badges · 56 bronze badges

1 Comment

user2683246 Over a year ago

How about ['Color10', 'Color2']? The result may be wrong for numbers greater than 9.

GingerPlusPlus · Accepted Answer · 2016-01-20 13:30:41Z

Is there (in this case) an easier way to extract data from a string than simple regexes?

import re

def keygen(sort_list):
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem)
    )

Usage:

   SortList = ['ID', 'Name', 'Size', 'Color']
   List1 = ['Name1', 'Name3', 'Color1', 'Size2', 'Color3', 'Color2','Name2', 'Size1', 'ID']
   List2 = ['ID', 'Color1', 'Color2', 'Size1', 'Size2', 'Name1', 'Name2']
   sorted(List1, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
   sorted(List2, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']

Explanation:

^[a-zA-Z]+ matches the alphabetic part at the beginning, and \d+$ matches the numeric part at the end of the string.

keygen returns a lambda that takes a string and returns a two-item tuple:
the first item is the position of the alphabetic part in the list (no such item in the list = ValueError),
the second is a one-item list containing the numeric part at the end, or an empty list if the string doesn't end with a digit.
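To make the tuple shape concrete, this sketch calls the key directly on a few sample strings. 'Color12' is a made-up element, not one from the question.

```python
import re

def keygen(sort_list):
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem)
    )

key = keygen(['ID', 'Name', 'Size', 'Color'])
id_key = key('ID')
name_key = key('Name1')
color_key = key('Color12')   # hypothetical element with a two-digit suffix
print(id_key)     # (0, [])
print(name_key)   # (1, ['1'])
print(color_key)  # (3, ['12'])
```

An empty list compares as smaller than any non-empty list, so a bare 'ID' sorts ahead of any numbered element in the same category.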

Some possible improvements:

the sort_list.index call is O(n), and it will be called for each element in the list; it can be replaced with an O(1) dict lookup to speed sorting up (I didn't do that to keep things simple),
the numeric part can be converted into actual integers (1 < 2 < 10, but '1' < '10' < '2')

After applying those:

import re

def keygen(sort_list):
    index = {word: i for i, word in enumerate(sort_list)}
    return lambda elem: (
        index[re.findall(r'^[a-zA-Z]+', elem)[0]],
        [int(s) for s in re.findall(r'\d+$', elem)]
    )
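A quick check of the improved key, assuming the lookup is meant to be a dict mapping each word to its position. The `sample` list is made up so that it contains a two-digit suffix.

```python
import re

def keygen(sort_list):
    # Dict lookup: word -> position in sort_list (O(1) per element).
    index = {word: i for i, word in enumerate(sort_list)}
    return lambda elem: (
        index[re.findall(r'^[a-zA-Z]+', elem)[0]],
        [int(s) for s in re.findall(r'\d+$', elem)]
    )

# Hypothetical input; Name10 must come after Name2.
sample = ['Size2', 'Name10', 'ID', 'Name2', 'Color1', 'Size1']
result = sorted(sample, key=keygen(['ID', 'Name', 'Size', 'Color']))
print(result)
# ['ID', 'Name2', 'Name10', 'Size1', 'Size2', 'Color1']
```

Note that with a dict, an element whose prefix is not in the sort list now raises KeyError instead of ValueError.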
