17

Okay, now this is more a rant about Linux than a question, but maybe someone knows how to do what I want. I know this can be achieved using the sort command, but I want a better solution because getting that to work is about as easy as writing a C program to do the same thing.

I have files. For argument's sake, let's say I have these (my real files follow the same pattern, I just have many more):

  • file-10.xml
  • file-20.xml
  • file-100.xml
  • file-k10.xml
  • file-k20.xml
  • file-k100.xml
  • file-M10.xml
  • file-M20.xml
  • file-M100.xml

Now this turns out to be the order I want them sorted in. Incidentally, this is also the order Windows sorts them into by default. That's nice. Windows groups consecutive numerical characters into one effective character, which sorts alphabetically before letters.

If I type ls at the linux command line, I get the following garbage. Notice the 20 is displaced. This is a bigger deal when I have hundreds of these files that I want to view in a report, in order.

  • file-100.xml
  • file-10.xml
  • file-20.xml
  • file-k100.xml
  • file-k10.xml
  • file-k20.xml
  • file-M100.xml
  • file-M10.xml
  • file-M20.xml

I can use ls -1 | sort -n -k 1.6 to get the ones without 'k' or 'M' correct...

  • file-k100.xml
  • file-k10.xml
  • file-k20.xml
  • file-M100.xml
  • file-M10.xml
  • file-M20.xml
  • file-10.xml
  • file-20.xml
  • file-100.xml

I can use ls -1 | sort -n -k 1.7 to get none of it correct

  • file-100.xml
  • file-10.xml
  • file-20.xml
  • file-k10.xml
  • file-M10.xml
  • file-k20.xml
  • file-M20.xml
  • file-k100.xml
  • file-M100.xml

Okay, fine. Let's really get it right. ls -1 | grep "file-[0-9]*\.xml" | sort -n -k1.6 && ls -1 file-k*.xml | sort -n -k1.7 && ls -1 file-M*.xml | sort -n -k1.7

  • file-10.xml
  • file-20.xml
  • file-100.xml
  • file-k10.xml
  • file-k20.xml
  • file-k100.xml
  • file-M10.xml
  • file-M20.xml
  • file-M100.xml

Whew! Boy glad the "power of the linux command line" saved me there. (This isn't practical for my situation, because instead of ls -1 I have a command that is another line or two long)

Now, the Windows behavior is simple, elegant, and does what you want it to do 99% of the time. Why can't I have that in Linux? Why oh why does sort not have an "automagic sort numbers in a way that doesn't make me bang head into wall" switch?

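Here's roughly what I mean, in C++:

#include <cctype>
#include <string>
using std::string;

// Returns true when a sorts before b. Each run of digits is compared as one number.
bool compare_two_strings_to_avoid_head_injury(const string& a, const string& b)
{
    auto digit = [](char c) { return std::isdigit((unsigned char)c) != 0; };
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size())
    {
        if (digit(a[i]) && digit(b[j]))
        {
            // Gobble up both numbers, skipping leading zeros first.
            while (i < a.size() && a[i] == '0') i++;
            while (j < b.size() && b[j] == '0') j++;
            size_t ei = i, ej = j;
            while (ei < a.size() && digit(a[ei])) ei++;
            while (ej < b.size() && digit(b[ej])) ej++;
            // More digits means a bigger number; equal lengths compare as text.
            if (ei - i != ej - j) return ei - i < ej - j;
            int c = a.compare(i, ei - i, b, j, ej - j);
            if (c != 0) return c < 0;
            i = ei; j = ej;
            continue;
        }
        // Ordinary characters: ignore case so that 'k' sorts before 'M'.
        int ca = std::tolower((unsigned char)a[i++]), cb = std::tolower((unsigned char)b[j++]);
        if (ca != cb) return ca < cb;
    }
    return i == a.size() && j < b.size(); // a is a proper prefix of b
}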

Was that so hard? Can someone put this in sort and send me a copy? Please?

7
  • 9
    You could have saved yourself a bit of pain by padding the numeric fields to the same length with leading zeroes, instead of relying on platform-specific quirks to get the sort order you want. Just sayin'.... Commented Jul 24, 2010 at 3:16
  • 2
    I will point out that maybe the Windows behavior does what you want it to do 99% of the time, but it's not fair to say that it does what everyone wants 99% of the time. As a matter of fact I could just as well make the same complaint about Windows' sorting that you've made about Linux's sorting. (It would be nice to have this as an option to sort though) Commented Jul 24, 2010 at 3:38
  • 3
    What is the programming question here? If you just want to sort filenames, somebody at superuser.com might be able to help. Commented Jul 24, 2010 at 3:41
  • Windows did not always sort this way. See support.microsoft.com/kb/319827 Commented Jul 24, 2010 at 4:11
  • 2
    @Scott: yes you did use a platform-specific quirk, namely the fact that dir groups consecutive numbers into an "effective character" whereas ls doesn't. Although technically it's a quirk of the dir program, not of Windows. Similarly, what you call a problem with Linux is actually a "problem" with one particular program, sort. (And besides, it's not a problem in the same way that a legitimate bug is a problem, it's just a design decision that happens to not match your requirements. That happens from time to time on every platform.) Commented Aug 3, 2010 at 6:22

3 Answers

37

Try sort --version-sort -f

  • file-10.xml
  • file-20.xml
  • file-100.xml
  • file-k10.xml
  • file-k20.xml
  • file-k100.xml
  • file-M10.xml
  • file-M20.xml
  • file-M100.xml

The -f option ignores case (otherwise, it would put the k's and M's in the wrong order in this example). However, I don't think sort is interpreting the letters k and M as thousands and millions, if that was your goal. It's just alphabetical order.
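
Since the question mentions that the names come from a longer command rather than from ls, here is a minimal sketch of the same idea as a pipeline stage. It assumes GNU coreutils sort (--version-sort is not part of POSIX), and natural_sort is a made-up helper name. LC_ALL=C is only there so the result does not depend on the user's locale.

```shell
#!/bin/sh
# natural_sort: hypothetical helper name, not part of any tool.
# Reads names on stdin and prints them in version order, ignoring case.
# Assumes GNU coreutils sort; -V is the short form of --version-sort.
natural_sort() {
    LC_ALL=C sort -V -f
}

# printf stands in for the longer command that produces the file names.
printf '%s\n' file-100.xml file-10.xml file-20.xml \
              file-k100.xml file-k10.xml file-k20.xml \
              file-M100.xml file-M10.xml file-M20.xml | natural_sort
```

Wrapping it in a function keeps the long producer command readable: whatever generates the names just gets `| natural_sort` appended.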

1 Comment

Much better solution than the selected answer...maybe less portable I guess. -V is the short flag for --version-sort, for reference.
16

ls -1v will get you pretty close. It just sorts all capital letters before lower case.
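
To try this on the sample names without touching real files, a small sketch follows. It assumes GNU ls, where -v means "natural sort of numbers within text" (other ls implementations may give -v a different meaning), and it works in a throwaway directory from mktemp. list_natural is a made-up helper name.

```shell
#!/bin/sh
# list_natural: hypothetical helper; lists one directory with GNU ls -v.
# LC_ALL=C keeps the listing independent of the user's locale settings.
list_natural() {
    ( cd "$1" && LC_ALL=C ls -1v )
}

# Create empty files with the names from the question in a scratch directory.
demo=$(mktemp -d)
for n in 10 20 100 k10 k20 k100 M10 M20 M100; do
    : > "$demo/file-$n.xml"
done

list_natural "$demo"
rm -rf "$demo"
```

The numbers come out in numeric order within each group, with the M group listed ahead of the k group because of the capital letter.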

1 Comment

This also works for numbers with different digit counts: 1 2 3 ... 12 13 14 ... 123 124 125 ... 1123 1124 1125 ...
2

This would be my first thought:

ls -1 | sed 's/\-\([kM]\)\?\([0-9]\{2\}\)\./-\10\2./' | sort | sed 's/0\([0-9]\{2\}\)/\1/'

Basically I just use sed to pad the number with zeros and then use it again afterwards to strip off the leading zero.

I don't know if it might be quicker in Perl.

1 Comment

This is what I ended up doing, based on your suggestion. I needed up to 4 digits, so I have this:

for f in `ls -1 $1*.xml | sed -r 's/-([kM]?)([0-9]{4})\./-\10\2./; s/-([kM]?)([0-9]{3})\./-\100\2./; s/-([kM]?)([0-9]{2})\./-\1000\2./; s/-([kM]?)([0-9]{1})\./-\10000\2./' | sort | sed -r 's/0+([1-9])/\1/'`; do

which I find to be thoroughly ridiculous for such a simple task. It's a large failing of sort IMO.
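
The padding idea can be generalised so that the digit count is not hard-coded into one substitution per length. The sketch below is one possible way, not what the commenter used: it builds a separate sort key for each name (lower-cased, every run of digits left-padded with zeros to 10 characters), sorts on that key, and then throws the key away so the original names are never modified. It uses only POSIX awk, sort and cut. pad_sort is a made-up name, and it assumes names contain no tab characters and no number longer than 10 digits.

```shell
#!/bin/sh
# pad_sort: hypothetical helper. Decorate each name with a padded key,
# sort by that key, then strip the key again (decorate-sort-undecorate).
# Assumptions: no tabs in the names, numbers of at most 10 digits.
pad_sort() {
    awk '{
        key = ""
        rest = tolower($0)            # fold case so k sorts before M
        while (match(rest, /[0-9]+/)) {
            # text before the number, zero padding, then the number itself
            key = key substr(rest, 1, RSTART - 1) \
                      substr("0000000000", 1, 10 - RLENGTH) \
                      substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print key rest "\t" $0        # key, a tab, then the original name
    }' | LC_ALL=C sort | cut -f2-
}

printf '%s\n' file-100.xml file-10.xml file-20.xml \
              file-k100.xml file-k10.xml file-k20.xml \
              file-M100.xml file-M10.xml file-M20.xml | pad_sort
```

Because the original name travels alongside the key, there is no second sed pass that has to guess which zeros were added.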
