Sort by string key and then by integer (bash)

Question

Thanks in advance for any advice.

I'm trying to sort a tab-delimited file (sample below). The fields that matter are fields 1 and 2:

the ID, which should sort as a string, and then the starting position in a larger string, which should sort numerically.

KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
**KI270036.1    546     1448    -1**
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270041.1  5703    6308    1

This is an example of the table I'm trying to sort. Notice that the first field is in the desired order, but the bolded record is out of order according to my specifications.

The command I entered was:

sort -g -t '        ' -k 1,2 my_file.txt

How can I alter this so that the records are grouped by ID and then sorted numerically by the second field?

The output I'm looking for in this example is:

**KI270036.1    546     1448    -1**
KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270041.1  5703    6308    1

By 'field 1 and field 2' do you mean the column with KI270036.1 etc. as field 1 and the column starting with 5137 as field 2?
– dawg, Commented Apr 7, 2016 at 0:11

karakfa · Accepted Answer · 2016-04-07 00:15:57Z


You can define multiple keys. Since the first field has a fixed-width format, lexical sorting is fine and no special flag is required; for the second field, specify numeric sorting.

$ sort -k1,1 -k2n file

After removing the asterisks from the bolded line, this is what you'll get:

KI270036.1  546     1448    -1
KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270041.1  5703    6308    1
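
As a rough, self-contained sketch of why the two-key form behaves differently from the single-key attempt in the question, the script below runs both on a few of the sample records in shuffled order. The variable names and the reduced data set are illustrative assumptions, and LC_ALL=C is set only to make the text comparison predictable.

```shell
#!/bin/sh
# Illustrative data: four tab-delimited records from the question, shuffled.
tab=$(printf '\t')
data=$(printf 'KI270041.1\t3114\t3423\t1\nKI270036.1\t5137\t5523\t-1\nKI270036.1\t546\t1448\t-1\nKI270041.1\t1957\t2343\t1\n')

# The question's attempt: ONE key spanning fields 1-2, compared as a general
# number. The key starts with letters, so no key converts to a number, all
# keys compare equal, and sort falls back to comparing whole lines as text.
# As text, "5137" sorts before "546".
single_key=$(printf '%s\n' "$data" | LC_ALL=C sort -g -t "$tab" -k 1,2)

# The answer's form: TWO keys, field 1 as text, then field 2 as a number.
two_keys=$(printf '%s\n' "$data" | LC_ALL=C sort -k1,1 -k2n)

printf 'single key (-k 1,2):\n%s\n\n' "$single_key"
printf 'two keys (-k1,1 -k2n):\n%s\n' "$two_keys"
```

The second listing should show 546 ahead of 5137 within KI270036.1, while the first keeps the text order seen in the question.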



3 Comments

libby Over a year ago

This helped me, but it should have been -k2g instead of -k2n, or maybe it doesn't make a difference... I can't tell now. But it did help. Thanks.

karakfa Over a year ago

For integers, n is fine.

Etan Reisner Over a year ago

You want -k2,2n, assuming you don't want the second-field sort to take the rest of the line into account. (It doesn't come into play here, but if you duplicate the 546 line and make the third field 2448 in the first copy of the line, you'll see the difference.)
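
On the -n versus -g question raised in the comments above, a small sketch may help. It assumes a sort that supports -g (GNU sort does; -g is not part of POSIX), and the three input values are made up for illustration. For plain integers the two flags order values the same way; they diverge only on input such as scientific notation.

```shell
#!/bin/sh
# Hypothetical values: two plain integers and one in scientific notation.
values=$(printf '5137\n1e3\n546\n')

# -n reads only the leading digits, so "1e3" is compared as 1.
with_n=$(printf '%s\n' "$values" | LC_ALL=C sort -n)

# -g converts using floating-point rules, so "1e3" is compared as 1000.
with_g=$(printf '%s\n' "$values" | LC_ALL=C sort -g)

printf 'with -n:\n%s\n\nwith -g:\n%s\n' "$with_n" "$with_g"
```

With -n the 1e3 entry lands first; with -g it lands between 546 and 5137. The integer-only data in the question is unaffected either way.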

dawg · Accepted Answer · 2016-04-07 00:16:04Z

Do two passes through sort with the second a stable sort:

$ sort my_file.txt | sort -g -s -k 2
**KI270036.1    546     1448    -1**
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
KI270041.1  5703    6308    1
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1

Or,

$ sort -g -s -k 2  my_file.txt | sort -s -k1,1
**KI270036.1    546     1448    -1**
KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270041.1  5703    6308    1

Depending on the primary sort key (you don't have an example output and it is ambiguous...)
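
A minimal sketch of the two-pass idea with ID as the primary key, using a few shuffled records from the question as illustrative data. The second pass here is restricted to field 1 with -k1,1; a keyless second pass would compare whole lines as text and could reorder the numeric field. It assumes a sort that supports -s (stable), as GNU sort does.

```shell
#!/bin/sh
# Illustrative data: four tab-delimited records from the question, shuffled.
data=$(printf 'KI270041.1\t3114\t3423\t1\nKI270036.1\t5137\t5523\t-1\nKI270036.1\t546\t1448\t-1\nKI270041.1\t1957\t2343\t1\n')

# Pass 1: order every record by field 2 as a number.
# Pass 2: stable sort (-s) on field 1 only, so records sharing an ID keep
#         the numeric order established by pass 1.
grouped=$(printf '%s\n' "$data" \
  | LC_ALL=C sort -s -k2,2n \
  | LC_ALL=C sort -s -k1,1)

printf '%s\n' "$grouped"
```

The result should match what a single invocation with two keys (-k1,1 -k2,2n) produces; the two-pass form mainly illustrates how stability lets the less significant key be sorted first.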
