I have the following tab-separated file:
A1 A1 0 0 1 1 0 0 0 0 2 2 1 2
A2 A2 0 0 1 1 1 1 1 1 0 0 1 2
A3 A3 0 0 1 2 1 1 1 1 0 0 2 2
A4 A4 0 0 1 1 1 1 0 0 0 0 1 2
The idea is to modify the data from column 7 (inclusive) to the end of each row so that, for every row, if columns 7 and 8:
equal “0 0”: don’t modify
equal “1 1”: don’t modify
equal “1 2” or “2 1”: change to “2 2”
equal “2 2”: don’t modify
And the same for each following pair of columns (9 and 10, then 11 and 12, 13 and 14, and so on).
I started by extracting the columns I want to work on with this command:
awk '{for (i = 7; i <= NF; i++) printf "%s ", $i; print ""}' test.ped > tmp_test.txt
Then I was thinking of using a for loop with if statements, in this general format:
for i from 7 to the end, in steps of 2 (for (i = 7; i < NF; i += 2)):
    if columns i and i+1 == "1 2":
        replace with "2 2"
    elif columns i and i+1 == "2 1":
        replace with "2 2"
    else:
        pass
    (i increases by 2 so that the next pair of columns is handled the same way)
But I am stuck here. Is this general approach logical, or is there a faster way to do the same thing? Am I going in the right direction?
The expected output (after merging the first 6 columns from the initial file and the ones that I subsetted and modified) is:
A1 A1 0 0 1 1 0 0 0 0 2 2 2 2
A2 A2 0 0 1 1 1 1 1 1 0 0 2 2
A3 A3 0 0 1 2 1 1 1 1 0 0 2 2
A4 A4 0 0 1 1 1 1 0 0 0 0 2 2
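For reference, the loop described above could probably be written directly in awk on the full file, so that the first 6 columns never need to be split off and merged back. This is only a minimal sketch under assumptions: the function name fix_pairs is made up for illustration, fields are split on any whitespace (tabs or blanks), every value is its own field, and the output is written tab-separated.

```shell
# fix_pairs is a hypothetical helper name, not an existing tool.
# It reads rows from the files given as arguments, or from stdin.
fix_pairs() {
    awk '
    BEGIN { OFS = "\t" }    # assumption: tab-separated output is wanted
    {
        # Walk the pairs (7,8), (9,10), ... up to the end of the row.
        for (i = 7; i < NF; i += 2) {
            # Only a mixed pair, "1 2" or "2 1", is rewritten to "2 2".
            if (($i == 1 && $(i + 1) == 2) || ($i == 2 && $(i + 1) == 1)) {
                $i = 2
                $(i + 1) = 2
            }
        }
        $1 = $1             # rebuild the row so every line is joined with OFS
        print
    }' "$@"
}
```

It could then be called as: fix_pairs test.ped > test_fixed.ped. On the four sample rows this should give the expected output listed above; columns 5 and 6 of A3 stay "1 2" because the loop only starts at column 7.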
Thank you for your help!
2 2. Could you edit your question to explain the condition and the columns to be modified more clearly?
2 2 for every line. You already asked a similar question and the answers you got were resoundingly, and correctly, not bash so why are you back to asking for a bash solution now?
A1\tA1\t0\t0\t1\t1\t0 0\t0 0\t2 2\t1 2 with blank chars instead of tabs between the pairs of digits that start at field 7 of your tab-separated file so field 7 is actually 0 0?