6

Say I have the following lists:

List1=['Name1','Name3','Color1','Size2','Color3','Color2','Name2','Size1', 'ID']
List2=['ID','Color1','Color2','Size1','Size2','Name1','Name2']

Each list will have an element named "ID" and then three other categories (Name, Color, and Size), each containing an unknown number of elements.

I want to sort these elements, without knowing how many there will be in each category, according to the following 'sort list':

SortList=['ID','Name','Size','Color']

I can get the desired output (see below) although I imagine there is a better / more pythonic way of doing so.

>>> def SortMyList(MyList,SortList):       
...     SortedList=[]       
...     for SortItem in SortList:
...         SortItemList=[]
...         for Item in MyList:
...             ItemWithoutNum="".join([char for char in Item if char.isalpha()])  
...             if SortItem==ItemWithoutNum:
...                 SortItemList.append(Item)
...         if len(SortItemList)>1:
...             SortItemList=[SortItem+str(I) for I in range(1,len(SortItemList)+1)]
...         for SortedItem in SortItemList:
...             SortedList.append(SortedItem)
...     return SortedList
... 
>>> 
>>> SortMyList(List1, SortList)
['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
>>> SortMyList(List2, SortList)
['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']
>>> 

Any suggestions as to how my methodology or my code can be improved?

4
  • I'm a bit unclear here ... Do the "Name" category items all start with the substring "Name", etc? Commented Jan 19, 2016 at 19:57
  • Could you have entries as high as Name11 which you would want after Name10 and before Name12? Commented Jan 19, 2016 at 20:01
  • This may belong over on code review - my initial suggestion would be to fix your casing - StudlyCase words are for class definitions, not functions or variables. pep8 and pyflakes are two linters that will help point out at least the style problems in your code. Though you may want to take their advice with a grain of salt Commented Jan 19, 2016 at 20:05
  • @mgilson - yes, all items within a category will start with the same substring. And yes, @DSM, entries can have more than one digit, and if so Name11 should go between Name10 and Name12. Commented Jan 19, 2016 at 20:08

4 Answers

5

You can sort the list using a custom key function, which returns a 2-tuple, for primary sorting and secondary sorting.

Primary sorting is by the order of your "tags" (ID first, then Name, etc.). Secondary sorting is by the numeric value following it.

tags = ['ID','Name','Size','Color']
sort_order = { tag : i for i,tag in enumerate(tags) }

def elem_key(x):
    for tag in tags:
        if x.startswith(tag):
            suffix = x[len(tag) : ]
            return ( sort_order[tag],
                     int(suffix) if suffix else None )
    raise ValueError("element %s is not prefixed by a known tag. order is not defined" % x)

List1.sort(key = elem_key)
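As a quick check of the key function above, here is a minimal self-contained sketch for Python 3. The sample list (including `Name10`) is made up for illustration, and a missing numeric suffix is mapped to 0 instead of None; that substitution is an assumption made here so that keys within one tag always stay comparable under Python 3.

```python
tags = ['ID', 'Name', 'Size', 'Color']
sort_order = {tag: i for i, tag in enumerate(tags)}

def elem_key(x):
    # Primary key: position of the tag; secondary key: numeric suffix.
    for tag in tags:
        if x.startswith(tag):
            suffix = x[len(tag):]
            # Assumption: no suffix sorts as 0, so an int is never compared with None.
            return (sort_order[tag], int(suffix) if suffix else 0)
    raise ValueError("element %s is not prefixed by a known tag" % x)

sample = ['Name10', 'Color2', 'ID', 'Name2', 'Size1', 'Name1']  # hypothetical data
result = sorted(sample, key=elem_key)
print(result)  # ['ID', 'Name1', 'Name2', 'Name10', 'Size1', 'Color2']
```

Because the suffix is converted with int(), Name10 lands after Name2 rather than before it.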

4 Comments

Can you explain the assert 0 line?
@8one6, sure. see more details in the assert line.
I.e. it makes sure that this custom key function actively blows chunks (rather than failing passively) if given an unexpected input?
It would have failed anyway (trying to compare tuples to something else the key function would have returned, such as None), but this way the error condition is reported explicitly.
2

This works as long as you know that List2 only contains strings that start with one of the entries in sortList:

List2=['ID','Color4','Color2','Size1','Size2','Name2','Name1']
sortList=['ID','Name','Size','Color']
def sort_fun(x):
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            return (i, x[len(thing):])

print(sorted(List2, key=sort_fun))
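Note that the secondary key here is the suffix as a string, so multi-digit numbers compare lexicographically. The sketch below illustrates the difference under Python 3 with a made-up list; `sort_fun_numeric` is a hypothetical variant written for this comparison, not part of the answer above.

```python
sortList = ['ID', 'Name', 'Size', 'Color']

def sort_fun(x):
    # Key as in the answer: the suffix stays a string.
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            return (i, x[len(thing):])

def sort_fun_numeric(x):
    # Hypothetical variant: the suffix is converted to int (0 when absent).
    for i, thing in enumerate(sortList):
        if x.startswith(thing):
            suffix = x[len(thing):]
            return (i, int(suffix) if suffix else 0)

data = ['Color10', 'Color2', 'ID', 'Color1']  # made-up sample
as_strings = sorted(data, key=sort_fun)
as_numbers = sorted(data, key=sort_fun_numeric)
print(as_strings)  # ['ID', 'Color1', 'Color10', 'Color2']
print(as_numbers)  # ['ID', 'Color1', 'Color2', 'Color10']
```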


1

You can just provide an adequate key:

List1.sort( key = lambda x : ('INSC'.index(x[0]),x[-1]))
# ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']

The elements will be sorted by their first letter and then by their last character (the digit, if there is one). It works here because all the first letters are different and the numbers have at most one digit.
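To make the one-digit limitation concrete, here is a small sketch with a made-up list containing `Name10`. Because the key only looks at the last character, `Name10` is ranked by '0' and ends up first:

```python
# Same key as above: first letter, then last character.
key = lambda x: ('INSC'.index(x[0]), x[-1])

data = ['Name2', 'Name10', 'Name1']  # made-up sample
result = sorted(data, key=key)
print(result)  # ['Name10', 'Name1', 'Name2']
```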

EDIT

for many digits, a more obfuscated solution:

import re
List1.sort( key =lambda x : ('INSC'.index(x[0]),int("0"+"".join(re.findall(r'\d+',x)))))
 # ['ID', 'Name1', 'Name2', 'Name10', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']

1 Comment

how about ['Color10', 'Color2']. The result may be wrong for numbers gt 9
0

Is there (in this case) an easier way to extract data from a string than simple regexes?

import re

def keygen(sort_list):
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem)
    )

Usage:

   SortList = ['ID', 'Name', 'Size', 'Color']
   List1 = ['Name1', 'Name3', 'Color1', 'Size2', 'Color3', 'Color2','Name2', 'Size1', 'ID']
   List2 = ['ID', 'Color1', 'Color2', 'Size1', 'Size2', 'Name1', 'Name2']
   sorted(List1, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Name3', 'Size1', 'Size2', 'Color1', 'Color2', 'Color3']
   sorted(List2, key=keygen(SortList))
=> ['ID', 'Name1', 'Name2', 'Size1', 'Size2', 'Color1', 'Color2']

Explanation:

^[a-zA-Z]+ matches the alphabetic part at the beginning of the string, and \d+$ matches the numeric part at the end.

keygen returns a lambda that takes a string and returns a two-item tuple:
the first item is the position of the alphabetic part in the list (no such item in the list = ValueError),
the second is a one-item list containing the numeric part at the end, or an empty list if the string doesn't end with a digit.
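To illustrate the tuples described above, this short sketch repeats the keygen definition and prints the key for a few sample strings (`Color12` is a made-up element):

```python
import re

def keygen(sort_list):
    return lambda elem: (
        sort_list.index(re.findall(r'^[a-zA-Z]+', elem)[0]),
        re.findall(r'\d+$', elem)
    )

key = keygen(['ID', 'Name', 'Size', 'Color'])
print(key('ID'))       # (0, [])
print(key('Name1'))    # (1, ['1'])
print(key('Color12'))  # (3, ['12'])
```

An empty list compares as smaller than any non-empty list, which is why a bare 'ID' needs no special handling.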

Some possible improvements:

  • sort_list.index call is O(n), and it will be called for each element in list; can be replaced with O(1) dict lookup to speed sorting up (I didn't do that to keep things simple),
  • the numeric part can be converted into actual integers (1 < 2 < 10, but '1' < '10' < '2')

After applying those:

import re

def keygen(sort_list):
    index = {word: i for i, word in enumerate(sort_list)}
    return lambda elem: (
        index[re.findall(r'^[a-zA-Z]+', elem)[0]],
        [int(s) for s in re.findall(r'\d+$', elem)]
    )
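A minimal usage sketch of the dict-based version, with a made-up list that includes a two-digit entry. The lookup table is named `position` here purely for readability. One behavioural difference worth noting: with a dict lookup, an unknown category raises KeyError instead of ValueError.

```python
import re

def keygen(sort_list):
    # Map each category name to its position for O(1) lookup.
    position = {word: i for i, word in enumerate(sort_list)}
    return lambda elem: (
        position[re.findall(r'^[a-zA-Z]+', elem)[0]],
        [int(s) for s in re.findall(r'\d+$', elem)]
    )

data = ['Name10', 'Name2', 'ID', 'Size1', 'Name1']  # made-up sample
result = sorted(data, key=keygen(['ID', 'Name', 'Size', 'Color']))
print(result)  # ['ID', 'Name1', 'Name2', 'Name10', 'Size1']
```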

