
So I have a list of strings that looks roughly like this:

lst = ['file.t00Z.wrff02.grib2', 'file.t00Z.wrff03.grib2', 'file.t00Z.wrff00.grib2',
       'file.t00Z.wrff05.grib2', 'file.t00Z.wrff04.grib2', 'file.t00Z.wrff01.grib2',
       'file.t06Z.wrff01.grib2', 'file.t06Z.wrff00.grib2', 'file.t06Z.wrff02.grib2', ...]

I recently asked a question here wherein I learned how to sort my list of strings by substring using a lambda function:

lst.sort(key=lambda x: x[x.find('wrff'):])

But now I need to know if there's a way to sort by two different substrings, almost like a composite primary key in a database. I'd like to sort the files first by the two digits following "file.t", and then by the two digits following "wrff". Is there a way that both of these actions can be performed at once?

SOLUTION: I wound up using the two-tuple lambda function sort that user Moses Koledoye recommended below, but I ran into problems when trying to apply this sorting process to groups of filenames with different naming conventions.

In my script I have 3 Python objects which grab files from unique data directories and form a list (like the one above) containing the files. Each object grabs files with a different naming convention, and the number of digit groups in the names varies from one group of files to the next.

To handle this without adding complexity, I decided to use the natsort module that user Jared Gougen suggested, and it worked very nicely.

2 Answers


You can use re.findall to pick out the first two groups of digits in each name and then use them for sorting as a 2-tuple:

import re

lst = sorted(lst, key=lambda x: tuple(int(i) for i in re.findall(r'\d+', x)[:2]))
print(lst)
# ['file.t00Z.wrff00.grib2', 'file.t00Z.wrff01.grib2', 'file.t00Z.wrff02.grib2', 
#  'file.t00Z.wrff03.grib2', 'file.t00Z.wrff04.grib2', 'file.t00Z.wrff05.grib2', 
#  'file.t06Z.wrff00.grib2', 'file.t06Z.wrff01.grib2', 'file.t06Z.wrff02.grib2', ...]

This takes the first two runs of digits in each name, which here are the two digits after file.t and the two digits after wrff, and compares them as integers.
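For a self-contained illustration, the sketch below applies the same key to a shortened copy of the sample list from the question. The names `names` and `first_two_numbers` are made up for this example and are not part of the original answer.

```python
import re

# Shortened copy of the sample list from the question (illustrative data).
names = [
    'file.t06Z.wrff01.grib2', 'file.t00Z.wrff02.grib2',
    'file.t00Z.wrff00.grib2', 'file.t06Z.wrff00.grib2',
    'file.t00Z.wrff01.grib2',
]

def first_two_numbers(name):
    """Return the first two runs of digits in name as a tuple of ints."""
    return tuple(int(d) for d in re.findall(r'\d+', name)[:2])

ordered = sorted(names, key=first_two_numbers)
print(first_two_numbers('file.t06Z.wrff01.grib2'))  # (6, 1)
print(ordered)
```

Tuples compare element by element, so the digits after file.t decide the order first and the digits after wrff only break ties.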


3 Comments

@ChristianDean The answer addresses that. See [:2].
Ah, I see what you mean. Nice answer. +1 Sorry about that. Like I said, my brain was fried.
@ChristianDean does have a point though, this will capture stray 1-digit sequences and sequences not after the requested substrings (which may or may not be an issue).

It seems like this is approaching the area where regular expressions are useful. Here's one solution which captures the two subsequences of digits that you require.

import re

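# Capture the two digits after "file.t" and the two digits after "wrff".
# Both groups are zero-padded strings, so comparing them as text matches numeric order.
get_indices = lambda s: re.match(r'^.*?file\.t([0-9]{2}).*?wrff([0-9]{2}).*$', s).groups()
sorted(file_names, key=get_indices)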
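The one-liner above raises an AttributeError when a name does not match, because re.match returns None in that case. Here is a minimal sketch of a stricter key, assuming you would rather fail with a clear message and compare integers instead of strings. `PATTERN`, `t_and_wrff_digits` and the short `file_names` list are hypothetical names chosen for this example.

```python
import re

# Assumed layout: two digits after "file.t", then later two digits after "wrff".
PATTERN = re.compile(r'file\.t(\d{2}).*?wrff(\d{2})')

def t_and_wrff_digits(name):
    """Return (digits after 'file.t', digits after 'wrff') as ints."""
    match = PATTERN.search(name)
    if match is None:
        # Fail loudly instead of hitting AttributeError on None.
        raise ValueError(f'unexpected file name: {name!r}')
    return int(match.group(1)), int(match.group(2))

file_names = ['file.t06Z.wrff00.grib2', 'file.t00Z.wrff02.grib2',
              'file.t00Z.wrff01.grib2']
print(sorted(file_names, key=t_and_wrff_digits))
```

Converting the groups to int is not needed for zero-padded two-digit fields, but it keeps the order right if the padding ever changes.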

Or, in situations like these, I'm often trying to naturally sort file names. In those cases, I have the following set of functions in a library file.

import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def getchunks(string):
    return [tryint(c) for c in re.split('([0-9]+)', string)]

def sort_naturally(l):
    return sorted(l, key=getchunks)
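As a quick usage sketch, the snippet below repeats the two helpers so that it runs on its own, then compares a plain sort with the natural sort. The `names` list is made-up sample data.

```python
import re

def tryint(s):
    # Digit chunks become ints; everything else stays a string.
    try:
        return int(s)
    except ValueError:
        return s

def getchunks(string):
    # The capturing group makes re.split keep the digit runs in the result.
    return [tryint(c) for c in re.split('([0-9]+)', string)]

names = ['run10.txt', 'run2.txt', 'run1.txt']
print(getchunks('run10.txt'))        # ['run', 10, '.txt']
print(sorted(names))                 # ['run1.txt', 'run10.txt', 'run2.txt']
print(sorted(names, key=getchunks))  # ['run1.txt', 'run2.txt', 'run10.txt']
```

Because re.split with a capturing group alternates text chunks and digit chunks, strings line up with strings and ints with ints, so the list comparison does not mix types in Python 3.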

The library natsort was written to naturally sort on a more comprehensive level if you're looking for something more heavy duty.

2 Comments

Wow the natsort package is a really neat suggestion. Thanks!
The re.split feature is really handy. Using the tryint(s) defined here, and sorting a glob.glob("/usr/share/icons/"+theme+"/*/"+category+"/"+name+".*") I have my key function return [tryint(c) for c in re.split('([0-9]+)',x.split("/")[5])] which works.
