I have what I think should be a common problem, but I haven't found a good solution for it yet.
I have a file where each line has a chromosome number, a starting position on the chromosome and some related values, as shown below:
1 1.07299851019 1 1.07299851019 HQ chrY 2845223 + 0.251366120219 46
1 1.06860686763 1 1.06860686763 HQ chr10 88595309 + 0.256830601093 47
1 1.04688316093 3 3.14064948278 HQ chr6 49126474 + 0.295081967213 54
1 1.1563829915 1 1.1563829915 HQ chrX 16428176 + 0.185792349727 34
I want to sort this file with the Unix sort command on both chromosome (column 6) and starting position (column 7). After searching around I came up with this, which got me fairly close:
nohup sort -t $'\t' -k 6.4,6.5n -k 7,7n
The remaining problem that I can't solve is that, while the numbered chromosomes are sorted correctly, chromosome X and chromosome Y are interleaved and sorted together by starting position, like this:
1 0.978579587641 9 8.80721628876 HQ chrX 2861057 - 0.431693989071 79
1 0.979500536702 1 0.979500536702 HQ chrY 2861314 - 0.420765027322 77
1 0.969979601694 9 8.72981641525 HQ chrX 2861649 - 0.469945355191 86
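A likely cause of the interleaving above, shown as a small sketch: with the `n` modifier, sort converts each key to a number, and a key with no leading digits (such as `X` or `Y`) counts as 0. Both chromosomes then tie on the first key, so the position key alone decides their order. The three-line input and the variable name `tied` are made up for illustration.

```shell
# Hypothetical two-column input: chromosome letter, then position.
# Under -k1,1n both X and Y evaluate to 0, so they tie and -k2,2n decides.
tied=$(printf 'X 300\nY 200\nX 100\n' | LC_ALL=C sort -k1,1n -k2,2n)
printf '%s\n' "$tied"
```

The same tie at 0 would also explain why X and Y end up ahead of chromosomes 1 to 22.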
I know this could be solved by, for example, replacing chrX and chrY with numbers, or by writing a program, but it would be super nice to be able to use a simple command, especially since the files are often huge and I do this repeatedly.
It would also be nice if the chromosomes were ordered 1 to 22, then X, then Y. My command puts chromosomes X and Y first, followed by chromosomes 1 to 22.
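One possible single-command approach, sketched under the assumption that GNU sort is available: its `V` (version sort) key modifier is a GNU extension, not part of POSIX, that compares digit runs numerically and places letters after digits. Sorting the whole of column 6 with `V` should therefore give chr1 to chr22, then chrX, then chrY, with column 7 breaking ties numerically. The sample rows and the names `TAB`, `sample`, and `sorted` are made up for illustration.

```shell
# Assumes GNU sort; the V key modifier is not available in every sort.
TAB=$(printf '\t')

# Made-up tab-separated rows; only columns 6 and 7 vary.
# printf reuses the format for each (chromosome, position) pair.
sample=$(printf '1\t1.0\t1\t1.0\tHQ\t%s\t%s\t+\t0.25\t46\n' \
    chrY 2845223 chr10 88595309 chr6 49126474 chrX 16428176 chr6 1000)

# Key 1: column 6 in version order. Key 2: column 7 in numeric order.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -t "$TAB" -k6,6V -k7,7n)

# Show just the two sort columns to check the order.
printf '%s\n' "$sorted" | cut -f6,7
```

On a real file the same keys would be applied directly, for example `sort -t $'\t' -k6,6V -k7,7n` followed by the file name.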