Thanks in advance for any advice.

I'm trying to sort a tab-delimited file (shown below) on two fields: field 1, an ID that should be sorted as a string, and field 2, the starting position within a larger string, which should be sorted numerically.

KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
**KI270036.1    546     1448    -1**
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270041.1  5703    6308    1

This is an example of the table I'm trying to sort. Notice that the first field is already in the desired order, but the bolded record is out of order according to my specifications.

The command I entered was:

sort -g -t '        ' -k 1,2 my_file.txt

How can I alter this so that the records are grouped by ID and then sorted numerically by the second field?

The output I'm looking for in this example is:

**KI270036.1    546     1448    -1**
KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270041.1  5703    6308    1
  • By 'field 1 and field 2' do you mean the column with KI270036.1 etc as field 1 and the column starting with 5137 as field 2? Commented Apr 7, 2016 at 0:11
  • Yes, I'm going with the convention used by cut and awk. Commented Apr 7, 2016 at 0:23

2 Answers

You can define multiple keys. Since the first field has a fixed-width format, no special flag is required for it (lexical sorting is fine); for the second key, specify a numeric sort.

$ sort -k1,1 -k2n file

After removing the asterisks from the input, this is what you'll get:

KI270036.1  546     1448    -1
KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270041.1  5703    6308    1
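
The same idea can be sketched with the tab separator and both key boundaries spelled out. This is only an illustration under a few assumptions: sample.tsv is a made-up file name, the three records are a shortened copy of the question's data, and LC_ALL=C is set so that the ID comparison is plain byte order.

```shell
# Build a small tab-delimited sample (sample.tsv is a hypothetical name).
printf 'KI270041.1\t1957\t2343\t1\n'  > sample.tsv
printf 'KI270036.1\t5137\t5523\t-1\n' >> sample.tsv
printf 'KI270036.1\t546\t1448\t-1\n'  >> sample.tsv

# A literal tab for -t, produced portably without relying on bash's $'\t'.
tab=$(printf '\t')

# Key 1: field 1 only, compared as text.
# Key 2: field 2 only, compared numerically.
sorted=$(LC_ALL=C sort -t "$tab" -k1,1 -k2,2n sample.tsv)
printf '%s\n' "$sorted"
```

Passing -t explicitly makes sort split on tabs only, which matches the cut/awk field convention mentioned in the question's comments.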
3 Comments

This helped me, but it should have been -k2g instead of -k2n, or maybe it doesn't make a difference... I can't tell now. But it did help. Thanks.
For integers, -n is fine.
You want -k2,2n, assuming you don't want the second-field sort to take the rest of the line into account. (It doesn't come into play here, but if you duplicate the 546 line and change the third field to 2448 in the first copy, you'll see the difference.)
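
On the -n versus -g point raised in these comments, a small sketch may help. It assumes a sort that supports -g (GNU sort does; the option is not in POSIX), and the two input values are invented for the illustration. For plain integers like the positions in the question, both flags give the same order; they differ once a value uses scientific notation.

```shell
# "1e3" is 1000 in scientific notation; "20" is a plain integer.
# -n stops parsing at the "e", so it sees 1 and puts 1e3 first.
n_first=$(printf '1e3\n20\n' | sort -n | head -n 1)

# -g parses the whole floating-point literal, so it sees 1000 and puts 20 first.
g_first=$(printf '1e3\n20\n' | sort -g | head -n 1)

printf 'first with -n: %s\nfirst with -g: %s\n' "$n_first" "$g_first"
```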
Do two passes through sort with the second a stable sort:

$ sort my_file.txt | sort -g -s -k 2
**KI270036.1    546     1448    -1**
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
KI270041.1  5703    6308    1
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1

Or,

$ sort -g -s -k 2  my_file.txt | sort -s -k 1,1
**KI270036.1    546     1448    -1**
KI270036.1  5137    5523    -1
KI270036.1  5215    5636    -1
KI270036.1  6364    7425    -1
KI270036.1  8687    9529    -1
KI270041.1  1957    2343    1
KI270041.1  3114    3423    1
KI270041.1  4792    5439    1
KI270041.1  5703    6308    1

Which of the two you want depends on the primary sort key (you don't have an example output, so it is ambiguous).
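
The two-pass approach can be sketched as follows, for the case where the ID is the primary key. The assumptions: two_pass.tsv is a hypothetical file name, the records are a shortened copy of the question's data, and the second pass is restricted to field 1 so that the stable option has ties to preserve.

```shell
# Hypothetical sample file; fields are tab-separated.
printf 'KI270041.1\t1957\t2343\t1\n'  > two_pass.tsv
printf 'KI270036.1\t5137\t5523\t-1\n' >> two_pass.tsv
printf 'KI270036.1\t546\t1448\t-1\n'  >> two_pass.tsv
tab=$(printf '\t')

# Pass 1 orders by the secondary key (field 2, numeric).
# Pass 2 orders by the primary key (field 1 only); -s keeps the
# order from pass 1 among records that share the same ID.
by_id=$(sort -t "$tab" -s -k2,2n two_pass.tsv | LC_ALL=C sort -t "$tab" -s -k1,1)
printf '%s\n' "$by_id"
```

Without a key restriction on the second pass, sort compares whole lines as text, and -s has nothing to preserve because only identical lines compare equal.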
