I have files with names like this:

'aaaa 4b 123b.txt'
'aaaa 4b 124b.txt'
'aaaa 4b 125b.txt'
'aaaa 4b 126b.txt'
'aaaa 4b 127b.txt'
'aaaa 4b 128b.txt'
'aaaa 4b 129 (123c)b.txt'
'aaaa 4b 129 (124c)b.txt'
'aaaa 4b 129 (125c)b.txt'
'aaaa 4b 129 (126c)b.txt'
'aaaa 4b 129 (127c)b.txt'
'aaaa 4b 129b.txt'
'aaaa 4b 130b.txt'
'aaaa 4b 131b.txt'
'aaaa 4b 132b.txt'

The file names are stored in a list obtained from os.listdir(path).

The above is how the files should be sorted, but my files are, in fact, not sorted at all. The parentheses inside the file names in the 129 series complicate the issue. If none of the file names had parentheses, I could sort using:

List.sort(key = lambda x: int(re.search('([0-9]+)(b.txt)', x).group(1)))

But how can I make an exception for the files that have parentheses, and sort everything at once?

Edit: Original (unsorted list)

['aaaa 4b 128b.txt', 'aaaa 4b 127b.txt', 'aaaa 4b 129 (127c)b.txt', 'aaaa 4b 131b.txt', 'aaaa 4b 123b.txt', 'aaaa 4b 129 (125c)b.txt' ...]

How I want it to be:

['aaaa 4b 123b.txt', 'aaaa 4b 124b.txt', 'aaaa 4b 125b.txt' ... 'aaaa 4b 128b.txt', 'aaaa 4b 129 (124c)b.txt', 'aaaa 4b 129 (125c)b.txt', ... 'aaaa 4b 131b.txt', 'aaaa 4b 132b.txt']
  • Can you please add some examples of input filenames and how you want them sorted? Commented Mar 30, 2018 at 5:57

2 Answers 2

You don't need a regular expression. Just use str.split():

filenames.sort(key = str.split)

str.split() converts each filename into a list of words, and sort() then compares those lists lexicographically, element by element (also known as "phone-book order").

Consider the first two filenames in the list in your question:

sort() wants to compare 'aaaa 4b 123b.txt' against 'aaaa 4b 124b.txt'. First it applies str.split() to each string, so the resulting comparison is between ['aaaa', '4b', '123b.txt'] and ['aaaa', '4b', '124b.txt']. List comparisons are done in lexicographic order: the elements are compared pairwise, in turn, until a pair differs:

'aaaa' == 'aaaa'
'4b' == '4b'
'123b.txt' < '124b.txt'

So the first filename is evaluated as less than the second one.

Similarly for 'aaaa 4b 129 (127c)b.txt' and 'aaaa 4b 129b.txt',

'aaaa' == 'aaaa'
'4b' == '4b'
'129' < '129b.txt'

So the filename with parentheses correctly sorts before 'aaaa 4b 129b.txt'.
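
As a minimal sketch of that comparison, the snippet below prints the keys that sort() actually compares and then sorts on them. The five names are a subset taken from the question; the variable names are only for illustration.

```python
# A subset of the filenames from the question, deliberately out of order.
filenames = [
    'aaaa 4b 128b.txt',
    'aaaa 4b 129 (127c)b.txt',
    'aaaa 4b 129b.txt',
    'aaaa 4b 123b.txt',
    'aaaa 4b 129 (125c)b.txt',
]

# The key for each filename is the list of whitespace-separated words.
keys = [name.split() for name in filenames]
for name, key in zip(filenames, keys):
    print(repr(name), '->', key)

# sort() compares those lists element by element.
filenames.sort(key=str.split)
print('\n'.join(filenames))
```

Note that the words themselves are still compared as strings, so this relies on the numbers in the names all having the same number of digits.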


1 Comment

wow this is so simple. But I don't understand the logic behind this. Can you explain how this code works?

What's the issue with the regular string sorting algorithm?

A = [
'aaaa 4b 127b.txt',
'aaaa 4b 129 (125c)b.txt',
'aaaa 4b 128b.txt',
'aaaa 4b 129 (123c)b.txt',
'aaaa 4b 129b.txt',
'aaaa 4b 129 (127c)b.txt',
'aaaa 4b 129 (124c)b.txt',
'aaaa 4b 129 (126c)b.txt',
]

A.sort()
print('\n'.join(A))

prints

aaaa 4b 127b.txt
aaaa 4b 128b.txt
aaaa 4b 129 (123c)b.txt
aaaa 4b 129 (124c)b.txt
aaaa 4b 129 (125c)b.txt
aaaa 4b 129 (126c)b.txt
aaaa 4b 129 (127c)b.txt
aaaa 4b 129b.txt

This is because the regular sorting algorithm uses the string's __lt__() (less than) method to compare the elements of the list, which results in the file names being sorted in lexicographical order (aka dictionary order).

The space ' ' that follows 129 in one string is less than the 'b' that follows 129 in the other.

Among the strings with parentheses, since they all have ' (12' following 129, the next character (5, 3, 7, 4, or 6) decides the comparison.
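
To see which character decides a comparison, here is a small sketch. The helper first_difference is a hypothetical name introduced for illustration, not part of the standard library.

```python
def first_difference(a, b):
    """Return (index, char_a, char_b) at the first position where a and b differ."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ca, cb
    return None  # one string is a prefix of the other, or they are equal


with_parens = 'aaaa 4b 129 (127c)b.txt'
without_parens = 'aaaa 4b 129b.txt'

index, ca, cb = first_difference(with_parens, without_parens)

# The characters are compared by code point: ord(' ') is 32, ord('b') is 98.
print(index, repr(ca), ord(ca), repr(cb), ord(cb))
print(with_parens < without_parens)
```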

6 Comments

facepalm - Of course you are right. A.sort() is the obvious answer. If my answer weren't already accepted, I'd delete it.
@Robᵩ haha, I think the question was bad. I was confused what OP was trying to do too.
@EricNa, I wanted to know how A.sort() maintains the order of the int values. If I give it aaaa 3b 129 (124c)b(1).txt and aaaa 4b 129 (124c)b(2).txt, it sorts first by 3b, then by 129, and then by b(1). How? Can you explain that in your answer?
@AyodhyankitPaul They're not int values. They are characters, even though they represent numbers. Your two strings both start with the same aaaa , so the next character '3' from one string and the next character '4' from the other string get compared. '3' < '4', so aaaa 3b 129 (124c)b(1).txt comes first, regardless of what follows after '3' and '4'.
@EricNa So it means it first compares on 'aaaa', then '3', then '129', then '124', and finally b(1), right?
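
Because the comparison is character by character, the plain sort matches numeric order here only because every number in these names has the same number of digits. A minimal sketch, assuming names whose numbers differ in width (the 99/100 names and the natural_key helper are hypothetical, not from the question):

```python
import re


def natural_key(name):
    """Split name into text and digit runs; digit runs become ints.

    re.split with a capturing group alternates text, digits, text, ...
    so the odd positions are always the digit runs.
    """
    parts = re.split(r'([0-9]+)', name)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


names = ['aaaa 4b 100b.txt', 'aaaa 4b 99b.txt', 'aaaa 4b 99 (7c)b.txt']

plain = sorted(names)                     # '1' < '9', so 100 lands before 99
natural = sorted(names, key=natural_key)  # 99 < 100 as integers

print(plain)
print(natural)
```

For the filenames in the question both approaches give the same order; the key function only matters once the digit counts differ.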