0

short of renaming/fixing the logging module on the webservers... when i do a list.sort(), the list entries get placed in the following order:

2011-09-21 19:15:54,731 DEBUG __main__ 44: running www.site.com-110731.log.0.gz
2011-09-21 19:15:54,731 DEBUG __main__ 44: running www.site.com-110731.log.1.gz
2011-09-21 19:15:54,731 DEBUG __main__ 44: running www.site.com-110731.log.2.gz
2011-09-21 19:15:54,732 DEBUG __main__ 44: running www.site.com-110731.log.3.gz
2011-09-21 19:15:54,732 DEBUG __main__ 44: running www.site.com-110731.log.gz

how would i sort a list, to get (ie the entry eithout a digit to be first):

2011-09-21 19:15:54,732 DEBUG __main__ 44: running www.site.com-110731.log.gz
2011-09-21 19:15:54,731 DEBUG __main__ 44: running www.site.com-110731.log.0.gz
2011-09-21 19:15:54,731 DEBUG __main__ 44: running www.site.com-110731.log.1.gz
2011-09-21 19:15:54,731 DEBUG __main__ 44: running www.site.com-110731.log.2.gz
2011-09-21 19:15:54,732 DEBUG __main__ 44: running www.site.com-110731.log.3.gz

THANKS!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

1 Answer 1

4

You would probably want to write a custom comparator to pass to sort; in fact, you probably need to anyway, because you're likely getting a lexicographical sort order instead of the intended (I presume) numerical order.

For instance, if you know that the filenames will only differ in those digits, you'd write a comparator that extracts those digits, converts them to int, and then compares based on that value.

Taking your examples as canonical, your comparator might look something like this:

import re
def extract(s):
    r = re.compile(r'\.(\d+)\.log\.((\d*)\.)?gz')
    m = r.search(s)
    file = int(m.group(1))
    if not m.group(2):
        return (file, -1)
    index = int(m.group(3))
    return (file, index)

def comparator(s1, s2): return cmp(extract(s1), extract(s2))

This prefers to sort based on the "file" number (the first one), and then by the "index" number (the second one). Note that it takes advantage of the fact that using cmp on tuples works as we require.

Sign up to request clarification or add additional context in comments.

7 Comments

well... i know the 110701.log.gz always would be first, then the .log.1.gz .log.2.gz , etc
This is a bit off the cuff, but seems to work as you seem to intend. You should compile the regular expression outside of extract if this is "serious business".
would l.sort(key=extract) work, as sort's cmp= is deprecated?
Yes, it should work exactly the same; I didn't realize it was deprecated in Python 3, so that's good to know.
Guys, i must admit, this is a bit over my head... trying to learn :) should i pass the list to the comparator function? should i split up the logfile names myself, then pass them?
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.