129

I have a CSV-like file, and I would like to sort it by column priority, like "ORDER BY" in SQL. For example, given the following rows,

3;1;2
1;3;2
1;2;3
2;3;1
2;1;3
3;2;1

If "ORDER BY" were column2, column1, column3, the result would be:

2;1;3
3;1;2
1;2;3
3;2;1
1;3;2
2;3;1

I'd like to know how to get this same result using the sort command on Unix.

  • By the way, that's an SSV file (semicolon-separated values) :P Commented May 31, 2016 at 10:38
  • Sadly, sort is an unreliable foundation in the real world, where quoted fields exist. Commented Jan 26, 2022 at 17:14
  • I’m voting to close this question because it belongs on UNIX.SE and is a dupe of unix.stackexchange.com/questions/52762/… Commented Apr 24, 2022 at 20:46

4 Answers

204

You need to use two options for the sort command:

  • --field-separator (or -t), to set the character that separates fields, here ';'
  • --key=<start,end> (or -k), to specify the sort key, i.e. which range of columns (start through end index) to sort by. Since you want to sort on 3 columns, you'll need to specify -k 3 times, for columns 2,2, 1,1, and 3,3.

To put it all together,

sort -t ';' -k 2,2 -k 1,1 -k 3,3
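
As a quick check, the command above can be run against the rows from the question. This is a minimal sketch: the variable names are illustrative, and LC_ALL=C is an addition of mine to pin byte-wise comparison so the result does not depend on the locale.

```shell
# Rows from the question, one record per line.
rows=$(printf '%s\n' '3;1;2' '1;3;2' '1;2;3' '2;3;1' '2;1;3' '3;2;1')

# Sort by column 2, then column 1, then column 3.
# LC_ALL=C (an assumption, not part of the original command) makes
# the comparison byte-wise and independent of the current locale.
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -k 2,2 -k 1,1 -k 3,3)

printf '%s\n' "$sorted"
```

The printed order should match the expected result given in the question, starting with 2;1;3 and ending with 2;3;1.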

Note that sort can't handle the situation in which fields contain the separator, even if it's escaped or quoted.
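
A small sketch of that limitation, using made-up comma-separated rows in which one quoted field contains the separator. cut is used only to make visible what a naive split on every comma yields; the variable names are illustrative and LC_ALL=C is my addition for reproducible ordering.

```shell
# Two records; the first has a quoted field that contains a comma.
rows=$(printf '%s\n' '"a,z",1' 'b,2')

# A naive split on every comma, quoted or not, is what sort sees.
# cut splits the same way, so it shows what "field 2" becomes.
naive_field2=$(printf '%s\n' "$rows" | cut -d ',' -f 2)
printf '%s\n' "$naive_field2"

# Sorting on "field 2" therefore compares z" with 2, not 1 with 2,
# and the record whose real second field is 1 ends up last.
naive_sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ',' -k 2,2)
printf '%s\n' "$naive_sorted"
```

A CSV-aware tool would see the second fields as 1 and 2 and keep the quoted record first.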

Also note: this is an old question, which belongs on UNIX.SE, and was also asked there a year later.


Old answer: depending on your system's version of sort, the following might also work:

sort --field-separator=';' --key=2,1,3

Or, you might get "stray character in field spec".

According to the sort manual, if you don't specify the end column of the sort key, it defaults to the end of the line.


16 Comments

If the values are numeric, then you probably want to consider using the -n option, which will "compare according to string numerical value", or the -g option, which will "compare according to general numerical value". A string comparison of numeric values orders the numbers like 1, 10, 2, 20. At least those are the options available in my version of sort on CentOS. You should check the man page for the correct options in your version of sort.
I get sort: stray character in field spec: invalid field specification ‘2,1,3’
However, sort --field-separator=',' -r -k3 -k1 -k2 source.csv > target.csv worked for me.
@MartinThoma it's been a long time, but I ran into your problem and found that sort --field-separator=';' --key={2,1,3} works. This worked in GNU coreutils 8.4 from April 2016
@mrbolichi the notation --key={2,1,3} relies on bash brace expansion: the shell expands it to --key=2 --key=1 --key=3 before sort runs
38

Suppose you have another row 3;10;3 in your unsorted.csv file. Then I guess you expect a numerically sorted result:

2;1;3
3;1;2
1;2;3
3;2;1
1;3;2
2;3;1
3;10;3

and not an alphabetically sorted one:

2;1;3
3;1;2
3;10;3
1;2;3
3;2;1
1;3;2
2;3;1

To get that, you have to use -n:

sort --field-separator=';' -n -k 2,2 -k 1,1 -k 3,3 unsorted.csv
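
The same request can also be spelled with an n flag on each key instead of the global -n, which helps when only some columns are numeric. The sketch below feeds the rows inline rather than from unsorted.csv; the variable names are illustrative and LC_ALL=C is my addition to keep the comparison locale-independent.

```shell
# The question's rows plus the extra 3;10;3 row.
rows=$(printf '%s\n' '3;1;2' '1;3;2' '1;2;3' '2;3;1' '2;1;3' '3;2;1' '3;10;3')

# Global -n: keys without their own ordering flags compare numerically.
global_n=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -n -k 2,2 -k 1,1 -k 3,3)

# Per-key n: the numeric flag is attached to each key individually.
per_key_n=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -k 2,2n -k 1,1n -k 3,3n)

printf '%s\n' "$per_key_n"
```

Both spellings should place 3;10;3 last, after every row whose second column is 1, 2 or 3.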

It is worth mentioning that 2,2 has to be used. If only 2 is used, the sort key runs from the beginning of field 2 to the end of the line. 2,2 makes sure that only field 2 is used.
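
The effect is easiest to see with plain string comparison (no -n), where everything after field 2 takes part in the first key. A minimal sketch with two illustrative rows; LC_ALL=C is my addition for reproducible byte-wise ordering.

```shell
# Both rows have 1 in column 2, so column 1 should break the tie.
rows=$(printf '%s\n' '2;1;3' '3;1;2')

# -k 2: the first key is "1;3" vs "1;2" (field 2 to end of line),
# so column 3 decides the order before -k 1,1 is ever consulted.
open_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -k 2 -k 1,1)

# -k 2,2: the first key is just "1" for both rows, so the tie
# falls through to column 1 as intended.
closed_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -k 2,2 -k 1,1)

printf 'open:\n%s\nclosed:\n%s\n' "$open_key" "$closed_key"
```

With the open-ended key, 3;1;2 comes first. With the closed key, 2;1;3 comes first.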

3 Comments

The pointer as to the difference between -k 2, and -k 2,2 is significant! I had overlooked this on my first reading of the man page. Thanks.
I added a few extra rows, 3;10;3 , 3:10:5 , 3:10;2, 3;10;3 in that order in the source file, and when using just -k 2,2 it appears to sort on column 2 and 3. The man page says "The -k option may be specified multiple times, in which case subsequent keys are compared when earlier keys compare equal.". In my case the earlier key (value=10) did compare equal, however, I didn't specify -k multiple times. I'm not sure if this is reliable behaviour, or related to my system (mac). Ultimately it doesn't matter though, as long as the primary sorting is correct.
Oh, I see there is also -s (stable sort), which skips the final whole-line comparison when the keys compare equal; according to the man page that is apparently faster too.
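
The tie-breaking described in these comments can be reproduced with a short sketch. It assumes the behaviour GNU sort documents: when all keys compare equal, whole lines are compared as a last resort unless -s is given. The rows and variable names are illustrative, and LC_ALL=C is my addition.

```shell
# Three rows with the same value in column 2, deliberately unordered.
rows=$(printf '%s\n' '3;10;3' '3;10;5' '3;10;2')

# Only column 2 is a key. The ties are broken by comparing whole
# lines, which looks as if column 3 had been sorted as well.
default_ties=$(printf '%s\n' "$rows" | LC_ALL=C sort -t ';' -n -k 2,2)

# -s disables the last-resort comparison, so tied rows keep
# their input order.
stable_ties=$(printf '%s\n' "$rows" | LC_ALL=C sort -s -t ';' -n -k 2,2)

printf 'default:\n%s\nstable:\n%s\n' "$default_ties" "$stable_ties"
```
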
27

Charlie's answer above didn't work for me on Cygwin (sort version 2.0, GNU textutils), but the following did:

sort -t"," -k2 -k1 -k1

2 Comments

Cygwin has an older version of sort. As always, the man page is your friend.
I agree with @CharlieMartin, you should check the man page on your system. On CentOS I used sort --field-separator=';' -k2 -k1 -k3 test.csv
1

Using some tools that actually parse the CSV

The original question asked about using sort specifically, but since sort cannot parse CSV in general, here are some options that do parse CSV correctly, to help other Googlers.

I'll test them with these particularly hard-to-parse CSV files, each containing a quoted field with both a comma and a newline:

cat >noheader.csv <<EOF
12,"xx,
yy",4
12,aa,3
2,bb,1
2,cc,2
EOF

and:

cat >header.csv <<EOF
first,second,third
12,"xx,
yy",4
12,aa,3
2,bb,1
2,cc,2
EOF

and I'll sort the CSV by the first and third columns, treating the values as integers, which should give:

2,bb,1
2,cc,2
12,aa,3
12,"xx,
yy",4

Miller (Go CLI tool)

Install Miller 6.12.0 on Ubuntu 24.10:

sudo apt install miller

With headers we can use:

mlr --csv sort -n first -n third header.csv

or as an equivalent shortcut:

mlr --csv sort -n first,third header.csv

both of which output:

first,second,third
2,bb,1
2,cc,2
12,aa,3
12,"xx,
yy",4

-n sorts the column numerically rather than lexicographically.

If the CSV does not have headers, we can generate headers named 1, 2, ... with --implicit-csv-header:

mlr --implicit-csv-header --csv sort -n 1,3 noheader.csv

which outputs:

1,2,3
2,bb,1
2,cc,2
12,aa,3
12,"xx,
yy",4

You might also want to remove the header from output with --headerless-csv-output:

mlr --headerless-csv-output --implicit-csv-header --csv sort -n 1,3 noheader.csv

and then it outputs just:

2,bb,1
2,cc,2
12,aa,3
12,"xx,
yy",4

Source code at: https://github.com/johnkerl/miller

Sweet CLI tool with all that we need!

csvsort from csvkit (Python)

It doesn't have a proper numerical sort: https://github.com/wireservice/csvkit/issues/151 so I'm not even going to bother for now.
