2

I am trying to sort a list using Python's sorted function. In Python 3 the cmp keyword argument was removed. Unfortunately it seems that I cannot implement my algorithm using the key keyword argument since I need both objects in order to compare the data.

Example Sorted Data

59 59
59 3 1 1
59 4 3 3
61 1
61 10
61 237
61 1 1 1

Compare Function

NUM_RE = re.compile("[\d]+")
def compare(x, y):
    # Aggregate our data into integer arrays
    x_result = [int(x) for x in NUM_RE.findall(x)]
    y_result = [int(y) for y in NUM_RE.findall(y)]

    # Return if there is a non-zero difference in the first element
    statement_diff = x_result[0] - y_result[0]
    if statement_diff != 0:
        return statement_diff

    # Return if there is a non-zero difference between the lengths
    length_diff = len(x_result) - len(y_result)
    if length_diff != 0:
        return length_diff

    # len(x_result) == len(y_result)
    # Iterate over each item and return if there is a difference
    for i in range(1, len(x_result)):
        result = x_result[i] - y_result[i]
        if result != 0:
            return result

    # Results are the same
    return 0

What is the best method for sorting this data? Should I create a "wrapper object" that implements the __eq___, __gt__, __lt__, etc functions so I can use the default sorting function? Or is there another function included in the standard Python API that accomplishes the original behavior of sorted?

4
  • 1
    what are you trying to do exactly? I have a feeling it can be simplified quite a bit, also how does min(x_result.groups(), y_result.groups()) work with range? Commented Jun 12, 2015 at 20:25
  • Wouldn't it also be NUM_RE.findall and have no groups? Commented Jun 12, 2015 at 20:32
  • The comparison function you provide isn't correct. There's no find_all method, and findall returns a list -- not a match object, as you seem to expect here. Could you provide a working comparison function that shows what you're actually looking for? If the comparison function actually produces a consistent order, then it should be possible to rewrite it as a key function. Commented Jun 12, 2015 at 20:32
  • @senderle sorry I have fixed the compare function Commented Jun 12, 2015 at 20:51

2 Answers 2

5

Python already has the wrapper you describe, it's called functools.cmp_to_key

Sign up to request clarification or add additional context in comments.

Comments

3

As a matter of fact, this can easily be implemented as a key function. The comparison function behaves just as if the strings had been converted into lists. So it's as simple as converting the string to a list of numbers:

NUM_RE = re.compile("[\d]+")
def seq_key(x):
    return [int(y) for y in NUM_RE.findall(x)]

In case you feel uncertain about that, try this test:

cmp_key = functools.cmp_to_key(compare)

def gen_rnd(n):
    seq = [[random.randrange(1, 100)
           for _ in xrange(random.randrange(2, 6))]
           for _ in xrange(n)]
    return [' '.join(map(str, x)) for x in seq]

def test(n):
    a = gen_rnd(n)
    return all(sorted(x, key=seq_key) == sorted(x, key=cmp_key)
               for x in a)

test(1000000)

This test doesn't check corner cases, but I'm pretty sure these are semantically identical.

You might ask "why bother taking the trouble to figure out the right key function?" Here's why:

>>> a = gen_rnd(10000)
>>> %timeit sorted(a, key=cmp_key)
1 loops, best of 3: 705 ms per loop
>>> %timeit sorted(a, key=seq_key)
10 loops, best of 3: 47.9 ms per loop

The key function is more than an order of magnitude faster! And the effect is more noticeable the larger the list gets. This is partially because list comparison is a fast built-in feature. But it's also because there's genuinely less work to do. The number of comparisons performed is O(n log n), but the number of key conversions performed is only O(n). So if you can shift some of your comparison work into a key function, you can get a good speedup.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.