I'm having trouble sorting a list of strings that contain embedded numbers, some of which are negative or decimal. This is what I have so far:

import re

format_ids = ["synopsys_SS_2v_-40c_SS.lib",
              "synopsys_SS_1v_-40c_SS.lib",
              "synopsys_SS_1.2v_-40c_SS.lib", 
              "synopsys_SS_1.4v_-40c_SS.lib",
              "synopsys_SS_2v_-40c_TT.lib",
              "synopsys_FF_3v_25c_FF.lib",
              "synopsys_TT_4v_125c_TT.lib",
              "synopsys_TT_1v_85c_TT.lib",
              "synopsys_TT_10v_85c_TT.lib",
              "synopsys_FF_3v_-40c_SS.lib",
              "synopsys_FF_3v_-40c_TT.lib"]

selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
#key = [2,1,3]
key = 2
produce_groups = False

if isinstance(key, int):
    key = [key]

convert = lambda text: float(text) if text.isdigit() else text
alphanum_key = lambda k: [convert(c) for c in re.split('([-.\d]+)', k)]
split_list = lambda name: tuple(alphanum_key(re.findall(selector,name)[0][i]) for i in key)
format_ids.sort(key=split_list)

print "\n".join(format_ids)

I'm expecting the following output (sorting by the 3rd key):

synopsys_SS_2v_-40c_SS.lib
synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib

But I'm getting the following (all the negative numbers are listed last):

synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib

Now, for the decimals in the 2nd key (setting key = 1), I get:

synopsys_SS_1v_-40c_SS.lib
synopsys_TT_1v_85c_TT.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib

Expecting:

synopsys_SS_1v_-40c_SS.lib
synopsys_TT_1v_85c_TT.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_TT_4v_125c_TT.lib
synopsys_TT_10v_85c_TT.lib

Any suggestions are greatly appreciated.

Edit: I ended up using the simpler method described by @StephenRauch:

import re
def sort_names(format_ids, selector, key=1):

    if isinstance(key, int):
        key = [key]

    SELECTOR_RE = re.compile(selector)

    def convert(x):
        try:
            return float(x[:-1])
        except ValueError:
            return x

    def sort_keys(key):
        def split_fid(x):
            x = SELECTOR_RE.split(x)
            return tuple([convert(x[i]) for i in key])
        return split_fid

    format_ids.sort(key=sort_keys(key))

format_ids = ["synopsys_SS_2v_-40c_SS.lib",
              "synopsys_SS_1v_-40c_SS.lib",
              "synopsys_SS_1.2v_-40c_SS.lib",
              "synopsys_SS_1.4v_-40c_SS.lib",
              "synopsys_SS_2v_-40c_TT.lib",
              "synopsys_FF_3v_25c_FF.lib",
              "synopsys_TT_4v_125c_TT.lib",
              "synopsys_TT_1v_85c_TT.lib",
              "synopsys_TT_10v_85c_TT.lib",
              "synopsys_FF_3v_-40c_SS.lib",
              "synopsys_FF_3v_-40c_TT.lib"]

selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
key = [2,1,3]

sort_names(format_ids,selector,key)

2 Answers

The test for numbers needs to be done a bit differently, and re.split() returns a leading '' that was throwing off the convert routine.
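
Both issues can be seen in isolation with a small sketch. The helper name `to_number` is only illustrative and is not part of the original code:

```python
import re

# Splitting on a capturing group keeps the matched number in the result.
# Because the match sits at the very start of "-40c", the first element
# of the result is an empty string.
parts = re.split(r'([-.\d]+)', "-40c")
print(parts)  # ['', '-40', 'c']


def to_number(text):
    """Return text as a float when float() can parse it, else unchanged."""
    try:
        return float(text)
    except ValueError:
        return text


# Dropping the empty strings and converting leaves the number first.
cleaned = [to_number(p) for p in parts if p != '']
print(cleaned)  # [-40.0, 'c']
```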

Fixed Code:

key = [2,1,3]

def convert(x):
    try:
        return float(x)
    except ValueError:
        return x

alphanum_keys = lambda k: (convert(c) for c in re.split('([-.\d]+)', k))
alphanum_key = lambda k: [i for i in alphanum_keys(k) if i != ''][0]
split_list = lambda name: [
    alphanum_key(re.findall(selector, name)[0][i]) for i in key]
format_ids.sort(key=split_list)

Alternate (simpler) solution:

But... all of those lambdas and regexes are way more complicated than you need for this problem. How about just:

def sort_key(keys):

    def convert(x):
        try:
            return float(x[:-1])
        except ValueError:
            return x

    def f(x):
        x = x.split('_')
        return tuple([convert(x[i]) for i in keys])
    return f

format_ids.sort(key=sort_key([3, 2, 4]))

How?

sort_key() returns a function f(). This is a function of one parameter that is passed to sort() as its key, and sort() calls it once per element to compute the value to sort by. f() uses the value of keys that was passed to sort_key(), because that value is captured when f() is defined. This is called a closure.
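
As a minimal sketch of the closure idea on its own, here is a stripped-down version without the float conversion. The names `make_key`, `key_func` and `by_temp_then_corner` are made up for illustration:

```python
def make_key(indices):
    # 'indices' is captured by key_func when make_key() is called.
    def key_func(name):
        fields = name.split('_')
        return tuple(fields[i] for i in indices)
    return key_func


# make_key() runs once and hands back key_func, which remembers [3, 1].
by_temp_then_corner = make_key([3, 1])

# Calling the returned function by hand shows what sort() would see
# for this element.
print(by_temp_then_corner("synopsys_SS_2v_-40c_SS.lib"))  # ('-40c', 'SS')
```

Passing `key=make_key([3, 1])` to sort() works the same way: sort() receives the inner function and calls it for each list element.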

Results:

synopsys_SS_1v_-40c_SS.lib
synopsys_SS_1.2v_-40c_SS.lib
synopsys_SS_1.4v_-40c_SS.lib
synopsys_SS_2v_-40c_SS.lib
synopsys_SS_2v_-40c_TT.lib
synopsys_FF_3v_-40c_SS.lib
synopsys_FF_3v_-40c_TT.lib
synopsys_FF_3v_25c_FF.lib
synopsys_TT_1v_85c_TT.lib
synopsys_TT_10v_85c_TT.lib
synopsys_TT_4v_125c_TT.lib

8 Comments

@Steven Rauch, thanks for the alternate solution. I was trying to implement it, but it's not working for me. In my case I have to split by a user-defined regex, so I replaced x.split('_') with x.split(selector). Given all this, where is f(x) called from?
@Kidneys, I added an explanation of closures. Let me know if that doesn't answer where it is being called. Also be sure to pass selector to sort_keys if it will be used in f().
@Steven Rauch, I completely missed that one, thanks for clarifying. I just updated the code above and I'm getting a 'list index out of range' error. Where am I going wrong?
@Kidneys, you are not doing a regex split; you are still doing a simple string split.
@StevenRauch, I just made a minor change to your edit and all is good now. Thanks!!
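
The difference discussed in the comments above can be reproduced with a short sketch. It reuses the selector from the question; the variable names `literal_parts` and `regex_parts` are illustrative:

```python
import re

selector = r'.*(FF|TT|SS)_([-\.\d]+v)_([-\.\d]+c)_(FF|TT|SS).*'
name = "synopsys_SS_1.2v_-40c_SS.lib"

# str.split() looks for the selector as a literal substring. That text
# never occurs in the name, so the result is a one-element list and any
# index above 0 raises "list index out of range".
literal_parts = name.split(selector)
print(literal_parts)  # ['synopsys_SS_1.2v_-40c_SS.lib']

# re.split() treats the selector as a pattern. The captured groups are
# returned between the (empty) text before and after the match.
regex_parts = re.compile(selector).split(name)
print(regex_parts)  # ['', 'SS', '1.2v', '-40c', 'SS', '']
```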

A big part of your problem is that str.isdigit() only considers actual digit characters to be digits, not dashes or periods. In your code, "-40".isdigit() and "1.4".isdigit() are both False, so those values stay as text rather than being converted to floats.
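
A quick way to check this, along with one possible alternative test, is sketched below. The helper `is_number` is hypothetical and not from the question:

```python
samples = ["40", "-40", "1.4"]

# str.isdigit() is True only when every character is a digit, so the
# minus sign and the decimal point both make it return False.
print([s.isdigit() for s in samples])  # [True, False, False]


def is_number(text):
    """Hypothetical helper: True if float() can parse text."""
    try:
        float(text)
        return True
    except ValueError:
        return False


print([is_number(s) for s in samples])  # [True, True, True]
```

Note that float() also accepts strings such as "1e5" or "inf", which may or may not be what you want for file names like these.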
