Say I have this file.

$ cat a.txt
c 1002 4
f 1001 1
d 1003 1
a 1001 3
e 1004 2
b 1001 2

I want to sort it by the second column and then by the third column. Column two contains numbers, while column three can be treated as a string. I know the following command works well.

$ sort -k2,2n -k3,3 a.txt
f 1001 1
b 1001 2
a 1001 3
c 1002 4
d 1003 1
e 1004 2

However, I think sort -k2n a.txt should also work, but it does not.

$ sort -k2n a.txt
a 1001 3
b 1001 2
f 1001 1
c 1002 4
d 1003 1
e 1004 2

It seems to sort by column two, and then by column one instead of column three. Why is this happening? Is it a bug or not? I ask because sort -k2 a.txt works fine with the data above, since those numbers all have the same width.

My sort version is sort (GNU coreutils) 8.15 on Cygwin.

  • Interesting. sort -k2 a.txt will work in this case. -k2 tells it to sort using a key that starts at field 2 and continues to the end of line. -k2n tells it to sort field 2 in numeric order; that might mean the sort key ends on encountering whitespace between fields 2 and 3. It might be a good idea to paste the version of your sort into the question somewhere. Commented Jun 8, 2013 at 10:56
  • Using sort (GNU coreutils) 8.5 I am able to reproduce the described behaviour on Debian stable. Commented Jun 8, 2013 at 11:16
  • @MikeSherrill'Catcall' When you attempt to sort a non-numeric value numerically, sort(1) falls back to string sorting. The keys selected by -k2n ("1001 3" etc.) are not numeric. Commented Jun 8, 2013 at 12:58
  • I ran across this while trying to solve a similar problem: sort -k2 -u and sort -k2n -u yield different results on your file. I eventually figured out why (a 1001 3 and b 1001 2 are both numerically identical to 1001, but not equal as strings), but, still, argh! Commented May 16, 2015 at 4:00

1 Answer

I found this caution in the GNU sort docs.

Sort numerically on the second field and resolve ties by sorting alphabetically on the third and fourth characters of field five. Use ‘:’ as the field delimiter.

      sort -t : -k 2,2n -k 5.3,5.4

Note that if you had written -k 2n instead of -k 2,2n sort would have used all characters beginning in the second field and extending to the end of the line as the primary numeric key. For the large majority of applications, treating keys spanning more than one field as numeric will not do what you expect.

I'm not sure what it ends up with when it evaluates '1001 3' as a numeric key, but "will not do what you expect" is accurate. It seems clear that the Right Thing to do is to specify each key independently.
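To make the difference between the two key forms concrete, here is a minimal sketch using the command from the docs. The three colon-delimited lines are hypothetical sample data made up for illustration, and LC_ALL=C is set only so that the comparison of ties does not depend on the locale.

```shell
# Hypothetical colon-delimited sample data, invented for this illustration.
data=$(mktemp)
printf '%s\n' 'x:10:a:b:zzab' 'y:9:a:b:zzcd' 'z:10:a:b:zzaa' > "$data"

# Key 1 is field 2 only, compared numerically.
# Key 2 is characters 3-4 of field 5, compared as a string.
explicit=$(LC_ALL=C sort -t : -k 2,2n -k 5.3,5.4 "$data")

# Here the single key runs from field 2 to the end of the line.
# The two lines starting with 10 tie on that key, and field 5 is
# never consulted as a key of its own.
open_ended=$(LC_ALL=C sort -t : -k 2n "$data")

printf 'explicit keys:\n%s\n\nopen-ended key:\n%s\n' "$explicit" "$open_ended"
rm -f "$data"
```

With the explicit keys, the z line comes before the x line because "aa" sorts before "ab". With the open-ended key, the x line comes first, because the tie is left to the comparison of whole lines.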

The same web page says this about resolving "ties".

Finally, as a last resort when all keys compare equal, sort compares entire lines as if no ordering options other than --reverse (-r) were specified.

I'll confess I'm a little mystified about how to interpret that.

4 Comments

The last paragraph most certainly means that, the values for all specified keys considered equal, sort(1) uses simple string comparison on the lines and observes only --reverse (or -r) if it is specified. For example, if there are lines foo:42:bar:baz:blabla and foo:42:baz:bar:blabla, the former is sorted before the latter with these key options because of "bar" < "baz", and vice-versa if you use -r.
Thanks for the effort, @Mike. I think the sort docs explain part of it. We should just be careful about treating keys that span more than one field as numeric.
@PointedEars: That would explain the behavior, I think. Sort by the key first, then by the whole line. The whole line, of course, starts with the first field.
The behaviour specified in the last paragraph can be disabled with -s/--stable, which preserves the original order in the event of a tie. This can be handy if for some reason you need to do multiple sorts with different sorting programs.
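The last-resort comparison discussed in these comments can be observed on the file from the question. This is a rough sketch assuming GNU sort; it recreates a.txt in a temporary file and again pins LC_ALL=C so the whole-line comparison is plain byte order.

```shell
# Recreate the file from the question in a temporary location.
f=$(mktemp)
printf '%s\n' 'c 1002 4' 'f 1001 1' 'd 1003 1' 'a 1001 3' 'e 1004 2' 'b 1001 2' > "$f"

# Default: the three 1001 lines tie on the numeric key, so sort falls
# back to comparing whole lines, which start with the first field.
default_order=$(LC_ALL=C sort -k2n "$f")

# With -s the last-resort comparison is skipped and tied lines keep
# their input order (f, a, b).
stable_order=$(LC_ALL=C sort -s -k2n "$f")

printf 'default:\n%s\n\nwith -s:\n%s\n' "$default_order" "$stable_order"
rm -f "$f"
```

The default run reproduces the a, b, f order from the question, while the -s run leaves the 1001 lines as f, a, b. That is consistent with the explanation that all three lines compare equal on the key and only the fallback decides their order.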
