Choice of field separator affects sort's ordering

Question

Suppose we have a script named test_sort in our $PATH with the following contents:

#!/bin/bash

function echo_text () {
    printf -- "%s\n" "$fc$oc$fs$lc"
    printf -- "%s\n" "$fc$fs$oc$lc"
}

# first character
fc="$1"
# last character
lc="$2"
# other character
oc="$3"
# field separator
fs="$4"

echo_text | LC_ALL sort -t "$fs"

It prints two lines that share the same first and last characters, but the positions of the field separator and the "other" character are swapped between them.

If we run test_sort a b y x, we get the following output:

axyb
ayxb

The fourth argument, x, is used as the field separator. If we swap it for a character with a higher character code, such as z, we would run test_sort a b y z, giving us:

ayzb
azyb

It makes no sense. If the output of each invocation were passed through sed "s/[xz]/_/g", the lines would be ordered differently. After normalizing the field separators, test_sort a b y x and test_sort a b y z should produce identical outputs. Instead, the order changes based on whether the field separator's character code is higher or lower than the other character's. ay should always come after a, because that's how alphabetical sorting works when one word differs from another only by having additional letters appended to it: Tim comes before Timothy. The field separator is being included in the comparison!

Is GNU find supposed to be acting like this? Is there any way to make it act the way I expect it should?

What does find (mentioned at the end) have to do with anything? Also, your invocation of sort seems to be prefixed by LC_ALL in a way that makes LC_ALL the command and sort an argument to that command.
– Kusalananda ♦, Commented 23 hours ago
Since you are not telling sort what fields to use (e.g. with -k 1,1 -k 2,2), it's using the whole line as the sort key. That would not avoid using the field delimiter as a sorting character. You get the same behaviour with the default field delimiter. I'm sure this has been asked before...
– Kusalananda ♦, Commented 22 hours ago
Related to @Kusalananda's comment above, did you intend to have ... LC_ALL=_something_ sort ...?
– Andy Dalton, Commented 14 hours ago

terdon · Accepted Answer · 2025-11-17 15:52:23Z

Using a field separator with sort only makes sense if you sort on specific fields. If you don't, the whole line is the sort key and the field separator is treated like any other character. That's why you get different sorting. You're not seeing the field separator affect the sorting order; you are seeing the different letters in a string affect the sorting order. In the first case, you sort these two strings:

axyb
ayxb

Since x comes before y, axyb is sorted before ayxb. Next, you sort

ayzb
azyb

and here, since y comes before z, ayzb is sorted first. Now, if you change your script to actually do something with the field separator, for instance if you only sort on the first field, then you will see what I am guessing you were expecting:

#!/bin/bash

function echo_text () {
  printf -- "%s\n" "$fc$oc$fs$lc"
  printf -- "%s\n" "$fc$fs$oc$lc"
}

# first character
fc="$1"
# last character
lc="$2"
# other character
oc="$3"
# field separator
fs="$4"

echo_text | LC_ALL=C sort -k1,1 -t "$fs"

Running on the two examples you show returns the same ordering:

$ foo.sh a b y x
axyb
ayxb
$ foo.sh a b y z
azyb
ayzb

This is because you are now sorting a vs ay in the first example and a vs ay in the second. Since the keys are identical in both cases, the lines are sorted the same way.
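
To check both behaviours side by side, here is a minimal sketch. The helper names two_lines, whole_line, first_field and normalize are hypothetical and not part of the original script, and the first and last characters are hard-coded to a and b for brevity. It normalizes the separator to _ (with tr rather than the sed command from the question) so the orderings can be compared directly.

```shell
#!/bin/bash

# Hypothetical helper: print the two test lines.
# $1 is the "other" character, $2 is the field separator.
two_lines () {
    printf -- '%s\n' "a${1}${2}b" "a${2}${1}b"
}

# -t without -k: the whole line is the key, so the separator
# takes part in the comparison like any other character.
whole_line () {
    two_lines "$1" "$2" | LC_ALL=C sort -t "$2"
}

# -t with -k1,1: only the text before the first separator is the key.
first_field () {
    two_lines "$1" "$2" | LC_ALL=C sort -t "$2" -k1,1
}

# Replace either separator with "_" so the orderings are comparable.
normalize () {
    tr 'xz' '__'
}

echo "whole line, separator x:";  whole_line y x  | normalize
echo "whole line, separator z:";  whole_line y z  | normalize
echo "first field, separator x:"; first_field y x | normalize
echo "first field, separator z:"; first_field y z | normalize
```

Under these assumptions, the two whole-line runs should list the normalized lines in opposite orders, while the two first-field runs should agree.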
