Sort list of strings by two substrings using lambda function

Question

So I have a list of strings that looks roughly like this:

lst = ['file.t00Z.wrff02.grib2', 'file.t00Z.wrff03.grib2', 'file.t00Z.wrff00.grib2',
       'file.t00Z.wrff05.grib2', 'file.t00Z.wrff04.grib2', 'file.t00Z.wrff01.grib2',
       'file.t06Z.wrff01.grib2', 'file.t06Z.wrff00.grib2', 'file.t06Z.wrff02.grib2', ...]

I recently asked a question here in which I learned how to sort my list of strings by a substring using a lambda function:

lst.sort(key=lambda x: x[x.find('wrff'):])

But now I need to know if there's a way to sort by two different substrings, almost like a composite primary key in a database. I'd like to sort the files first by the two digits following "file.t", and then by the two digits following "wrff". Is there a way that both of these actions can be performed at once?

SOLUTION: I wound up using the two-tuple lambda function sort that user Moses Koledoye recommended below, but I ran into problems when trying to apply this sorting process to groups of filenames with different naming conventions.

In my script I have 3 Python objects which grab files from unique data directories and form a list (like the one above) containing the files. Each of the objects grabs files with a different naming convention, and each group of files has a different number of digit groups in its names.

To handle this without adding complexity, I decided to use the natsort module that user Jared Goguen suggested, and it worked very nicely.

Moses Koledoye · Accepted Answer · 2017-09-14 21:04:38Z

5

You can use re.findall to pick out the first two runs of digits and then sort on them as a 2-tuple:

import re

lst = sorted(lst, key=lambda x: tuple(int(i) for i in re.findall(r'\d+', x)[:2]))
print(lst)
# ['file.t00Z.wrff00.grib2', 'file.t00Z.wrff01.grib2', 'file.t00Z.wrff02.grib2', 
#  'file.t00Z.wrff03.grib2', 'file.t00Z.wrff04.grib2', 'file.t00Z.wrff05.grib2', 
#  'file.t06Z.wrff00.grib2', 'file.t06Z.wrff01.grib2', 'file.t06Z.wrff02.grib2', ...]

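This sorts by the first run of digits (the one after file.t) and then by the second (the one after wrff). The [:2] slice discards any later digits, such as the 2 in grib2.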
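To make the key easier to follow, here is a small sketch of what the lambda computes for a single file name. The sample name is taken from the question; the variable names are only for illustration.

```python
import re

name = 'file.t06Z.wrff01.grib2'

# Every run of digits in the name, left to right.
digit_runs = re.findall(r'\d+', name)
print(digit_runs)  # ['06', '01', '2']

# Keep the first two runs (dropping the trailing 2 of "grib2") and
# convert them to integers so they compare numerically.
key = tuple(int(i) for i in digit_runs[:2])
print(key)  # (6, 1)
```

Because tuples compare element by element, the first number decides the order and the second only breaks ties.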

edited Sep 14, 2017 at 21:04

answered Sep 14, 2017 at 21:00

Moses Koledoye


3 Comments

Moses Koledoye Over a year ago

@ChristianDean The answer addresses that. See [:2].

Chris Over a year ago

Ah, I see what you mean. Nice answer. +1 Sorry about that. Like I said, my brain was fried.

Jared Goguen Over a year ago

@ChristianDean does have a point though, this will capture stray 1-digit sequences and sequences not after the requested substrings (which may or may not be an issue).

Jared Goguen · Accepted Answer · 2017-09-14 21:09:46Z

4

It seems like this is approaching the area where regular expressions are useful. Here's one solution which captures the two subsequences of digits that you require.

import re

# Capture the two digits after "file.t" and the two digits after "wrff".
get_indices = lambda s: re.match(r'^.*?file\.t([0-9]{2}).*?wrff([0-9]{2}).*$', s).groups()
sorted(file_names, key=get_indices)
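If some names might not follow the pattern, the lambda above fails with an AttributeError because re.match returns None. A minimal sketch of a more defensive variant follows; the helper name digit_key and the sample list are hypothetical, and converting the groups to integers is an assumption that numeric order is wanted.

```python
import re

# Same idea as the pattern above, compiled once. re.search is used, so the
# leading and trailing ".*" parts are not needed.
PATTERN = re.compile(r'file\.t([0-9]{2}).*?wrff([0-9]{2})')

def digit_key(name):
    """Return (digits after 'file.t', digits after 'wrff') as integers."""
    match = PATTERN.search(name)
    if match is None:
        # Fail loudly with the offending name instead of an AttributeError.
        raise ValueError(f'unexpected file name: {name!r}')
    return tuple(int(g) for g in match.groups())

# Hypothetical sample, using names in the question's format.
file_names = ['file.t06Z.wrff01.grib2', 'file.t00Z.wrff02.grib2',
              'file.t06Z.wrff00.grib2', 'file.t00Z.wrff00.grib2']
result = sorted(file_names, key=digit_key)
print(result)
# ['file.t00Z.wrff00.grib2', 'file.t00Z.wrff02.grib2',
#  'file.t06Z.wrff00.grib2', 'file.t06Z.wrff01.grib2']
```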

Or, in situations like these, I'm often trying to naturally sort file names. In those cases, I have the following set of functions in a library file.

import re

def tryint(s):
    # Return s as an int if it is a digit string, otherwise unchanged.
    try:
        return int(s)
    except ValueError:
        return s

def getchunks(string):
    return [tryint(c) for c in re.split('([0-9]+)', string)]

def sort_naturally(l):
    return sorted(l, key=getchunks)
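As a rough illustration of how these helpers behave, the sketch below repeats the two key functions so it runs on its own, then shows the chunks for one name and a small sort. The names with unpadded numbers (wrff9, wrff10) are made up for the example and are not from the question.

```python
import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def getchunks(string):
    # The capturing group makes re.split keep the digit runs, so text and
    # number chunks alternate.
    return [tryint(c) for c in re.split('([0-9]+)', string)]

chunks = getchunks('file.t06Z.wrff01.grib2')
print(chunks)
# ['file.t', 6, 'Z.wrff', 1, '.grib', 2, '']

# Hypothetical names: a plain string sort would put wrff10 before wrff9.
names = ['file.t06Z.wrff10.grib2', 'file.t06Z.wrff9.grib2', 'file.t00Z.wrff02.grib2']
ordered = sorted(names, key=getchunks)
print(ordered)
# ['file.t00Z.wrff02.grib2', 'file.t06Z.wrff9.grib2', 'file.t06Z.wrff10.grib2']
```

Since every digit run becomes an integer, this key does not need to know how many digit groups a naming convention has.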

The library natsort was written to naturally sort on a more comprehensive level if you're looking for something more heavy duty.

edited Sep 14, 2017 at 21:09

answered Sep 14, 2017 at 21:04

Jared Goguen


2 Comments

nat5142 Over a year ago

Wow the natsort package is a really neat suggestion. Thanks!

bgStack15 Over a year ago

The re.split feature is really handy. Using the tryint(s) defined here, and sorting a glob.glob("/usr/share/icons/"+theme+"/*/"+category+"/"+name+".*") I have my key function return [tryint(c) for c in re.split('([0-9]+)',x.split("/")[5])] which works.
