5

GNU sort on Linux is not giving the expected results on my CSV file. Can you please help me resolve the issue?

Input file

[nscruser]$ cat cemp1.txt
10,30
50,900
20,1050

Objective: I need to do a numeric sort on the first field of the above file.

[nscruser]$  sort -t',' -k1 -n cemp1.txt
10,30
50,900
20,1050

Expected output: I expected the output below, since I am doing a numeric sort on the first column.

10,30
20,1050
50,900

Can someone please explain the discrepancy?

2
  • 1
    Are you on Linux? Is this GNU sort? Commented Jul 25, 2022 at 10:36
  • Yes, [nscruser]$ uname prints Linux. Commented Jul 25, 2022 at 10:44

2 Answers 2

6

Looking at the man page of sort (from GNU coreutils 8.32),

-k, --key=KEYDEF sort via a key; KEYDEF gives location and type

...

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options [bdfgiMhnRrV], which override global ordering options for that key. If no key is given, use the entire line as the key. Use --debug to diagnose incorrect key usage.

First, you can use --debug as suggested,

$ sort -t',' -k1 -n --debug cemp1.txt
sort: text ordering performed using ‘en_IE.UTF-8’ sorting rules
sort: key 1 is numeric and spans multiple fields
10,30
_____
_____
50,900
______
______
20,1050
_______
_______

That gives us a clue: "key 1 is numeric and spans multiple fields".

As the man page says, "the stop position defaults to the line's end". So you need to add a stop position:

$ sort -t',' -k1,1 -n cemp1.txt
10,30
20,1050
50,900
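
As a minimal, self-contained sketch of the fix above, the following recreates the sample data in a temporary file and restricts the numeric key to field 1. The temporary file and the variable names are illustrative assumptions, not part of the original post. The compact spelling -k1,1n attaches the n option to the key itself, which the KEYDEF syntax quoted above allows.

```shell
# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
printf '10,30\n50,900\n20,1050\n' > "$tmp"

# -k1,1 starts and stops the key at field 1; the trailing n makes that key numeric.
sorted=$(sort -t',' -k1,1n "$tmp")
printf '%s\n' "$sorted"

# Clean up the temporary file.
rm -f "$tmp"
```

Because the key no longer reaches past the first comma, the second field cannot influence the numeric comparison.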
8
  • 3
    This seems to be specific to GNU sort. The native sort on BSD systems does not seem to have this quirk. Commented Jul 25, 2022 at 10:29
  • 1
    @Dario Thanks a lot for the detailed explanation. Thank you all for the help Commented Jul 25, 2022 at 10:51
  • 2
    I get correct sorting with the command in the original question, and I have GNU sort: the output of $ sort --version is sort (GNU coreutils) 8.28 ...; @user1626902, which version are you running? Or maybe the sorting rules of en_IE.UTF-8 are causing the problem. What happens if you prefix the command line with LANG=C? Commented Jul 25, 2022 at 10:57
  • 1
    @sudodus I'm guessing you are using the C or POSIX locale. Using LC_ALL=C sort ... would be the other solution to the user's issue. Commented Jul 25, 2022 at 11:07
  • 3
    ... see also unix sort -n -t"," gives unexpected result Commented Jul 25, 2022 at 12:48
2

You could try prefixing your command with the LANG or LC_ALL locale variable:

LANG=C sort -t',' -k1,1 -n cemp1.txt

or

LC_ALL=C sort -t',' -k1,1 -n cemp1.txt

Which variable takes effect depends on the command and/or the OS version. For example, on HP-UX 11.31 (a UNIX System V system), an extract from man 3C locale lists these variables:

LANG                LC_MESSAGES
LC_ALL              LC_MONETARY
LC_COLLATE          LC_NUMERIC
LC_CTYPE            LC_TIME
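
One likely reason the locale matters: GNU sort -n accepts the locale's thousands separator inside a number, and in locales such as en_IE.UTF-8 that separator is a comma. A key that runs to the end of the line can then be read as a single number (10,30 as 1030, 20,1050 as 201050). The C locale has no thousands separator, so parsing stops at the first comma. The sketch below runs the original -k1 command under LC_ALL=C; the temporary file and the variable names are illustrative assumptions.

```shell
# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
printf '10,30\n50,900\n20,1050\n' > "$tmp"

# Same key as in the question (-k1, running to the end of the line),
# but evaluated in the C locale, where ',' is not a thousands separator.
c_sorted=$(LC_ALL=C sort -t',' -k1 -n "$tmp")
printf '%s\n' "$c_sorted"

# Clean up the temporary file.
rm -f "$tmp"
```

LC_ALL overrides LANG and the individual LC_* variables, so it is the more reliable of the two prefixes when other locale variables are already set.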
