I have the following tab-separated file:

A1    A1    0       0       1       1       0 0     0 0     2 2     1 2
A2    A2    0       0       1       1       1 1     1 1     0 0     1 2
A3    A3    0       0       1       2       1 1     1 1     0 0     2 2
A4    A4    0       0       1       1       1 1     0 0     0 0     1 2

The idea is to modify the data from column 7 (inclusive) to the end of each row so that, for every row, if columns 7 and 8:

  • equal “0 0”: don’t modify

  • equal “1 1”: don’t modify

  • equal “1 2” or “2 1”: change to “2 2”

  • equal “2 2”: don’t modify

The same applies to the following pairs of columns (9 and 10, then 11 and 12, 13 and 14, and so on).

I started to extract the columns I want to work on using the command:

awk '{for (i = 7; i <= NF; i++) printf "%s ", $i; print ""}' test.ped > tmp_test.txt

Then I was thinking of using a for loop with if statements, in this general form:

for i between 7 and the end (for (i = 7; i <= NF)):
    if i and i+1 == “1 2”:
        replace by “2 2”
    elif i and i+1 == “2 1”:
        replace by “2 2”
    else
        pass
    i=i+2 (increase i to do the same for the next double columns)

But I am stuck here. Is the general format logical or is there a faster way to do the same? Am I going in the right direction?

The expected output (after merging the first 6 columns from the initial file and the ones that I subsetted and modified) is:

A1    A1    0       0       1       1       0 0     0 0     2 2     2 2
A2    A2    0       0       1       1       1 1     1 1     0 0     2 2
A3    A3    0       0       1       2       1 1     1 1     0 0     2 2
A4    A4    0       0       1       1       1 1     0 0     0 0     2 2

Thank you for your help!

  • Hmm, it looks like the only difference between your input and output is that each line has been changed to end in 2 2. Could you edit your question to explain the condition and the columns to be modified more clearly? Commented Aug 30, 2016 at 8:57
  • When providing sample input and expected output you should cover all your use cases to demonstrate what you are trying to explain in your text. Right now your sample output makes it look like your problem could be solved by just changing the last 2 fields to 2 2 for every line. You already asked a similar question and the answers you got were resoundingly, and correctly, not bash so why are you back to asking for a bash solution now? Commented Aug 30, 2016 at 15:27
  • Am I right in guessing that your input file isn't ALL tab-separated and your first line, for example, is actually A1\tA1\t0\t0\t1\t1\t0 0\t0 0\t2 2\t1 2 with blank chars instead of tabs between the pairs of digits that start at field 7 of your tab-separated file so field 7 is actually 0 0? Commented Aug 30, 2016 at 15:43
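
One way to settle the separator question raised in the last comment is to count the fields per line under each splitting rule. This is a minimal sketch; the helper name count_fields is made up for illustration, and the file name test.ped is taken from the question:

```shell
# count_fields SEP: print the distinct per-line field counts seen on stdin
# when lines are split on SEP (hypothetical helper, not from the thread).
count_fields() {
    awk -F"$1" '{ print NF }' | sort -un
}

# Usage, assuming the file from the question:
#   count_fields '\t' < test.ped   # split on tabs only
#   count_fields ' '  < test.ped   # a single space selects awk's default whitespace splitting
```

If the pairs are single tab-separated fields such as "1 2", the first call should report fewer fields per line than the second; if both report the same number, every value is its own field.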

4 Answers

$ awk '{$1=$1;for(i=7;i<=NF;i+=2) if($i""$(i+1)=="1""2" || $i""$(i+1)=="2""1") {$i=2;$(i+1)=2} print}' test
A1 A1 0 0 1 1 0 0 0 0 2 2 2 2
A2 A2 0 0 1 1 1 1 1 1 0 0 2 2
A3 A3 0 0 1 2 1 1 1 1 0 0 2 2
A4 A4 0 0 1 1 1 1 0 0 0 0 2 2

{
    $1=$1                 # rebuild the record so every line is output single-space separated
    for(i=7;i<=NF;i+=2)   # step through the fields two at a time
        if($i""$(i+1)=="1""2" || $i""$(i+1)=="2""1") {
            $i=2;$(i+1)=2 # set both columns to 2 if the pair is 1,2 or 2,1
        }
    print                 # print the record, changed or not
}
1 Comment

That will replace all the tabs in the file with blank characters. The OP doesn't tell us this but I THINK his input format is A1\tA1\t0\t0\t1\t1\t0 0\t0 0\t2 2\t1 2

Awk is your friend.

awk -v FS='\t' -v OFS='\t' '{for(i=7;i<=NF;i++) \
 {if($i ~ /^[ 2]*[1]{1}[ 2]*$/){$i="2 2"}}}1'  file

should do it.
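
A variant of the same idea compares each pair literally instead of using a regular expression, which sidesteps the {1} interval syntax that some older awk implementations do not accept. This is only a sketch and assumes the layout guessed at in the comments on the question (tabs between fields, a single space inside each pair from field 7 on); the function name fix_pairs is made up for illustration:

```shell
# fix_pairs: read the genotype file on stdin, write the fixed file to stdout.
# Assumes tab-separated fields where fields 7..NF each hold a pair such as "1 2".
fix_pairs() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        for (i = 7; i <= NF; i++)
            if ($i == "1 2" || $i == "2 1")   # mixed pair, in either order
                $i = "2 2"
        print                                 # tabs are kept because OFS is a tab
    }'
}
```

It could be run as fix_pairs < test.ped > fixed.ped. The first six columns are never examined, so a 1 followed by a 2 in columns 5 and 6 is left alone.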

It sounds like all you need is:

$ awk '{gsub(/1 2|2 1/,"2 2")}1' file
A1      A1      0       0       1       1       0 0     0 0     2 2     2 2
A2      A2      0       0       1       1       1 1     1 1     0 0     2 2
A3      A3      0       0       1       2       1 1     1 1     0 0     2 2
A4      A4      0       0       1       1       1 1     0 0     0 0     2 2

but your sample input/output REALLY doesn't help demonstrate what your text describes and I don't think your fields are REALLY all tab-separated like you say they are so it's a guess.

From your question, it looks like the following pairs of columns are space-separated: (7th and 8th), (9th and 10th), (11th and 12th), (13th and 14th), while the others are tab-separated. If that is the case, you can do it without loops.

awk '{gsub("1 2","2 2",$0);gsub("2 1","2 2",$0); print;}' <filename>
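
One detail to keep in mind with this approach: sub() replaces only the first match in a record, whereas gsub() replaces every match. A quick check on a row with two mixed pairs, assuming tabs between fields and a space inside each pair (the sample row is invented for the demonstration):

```shell
# A made-up row with two "1 2" pairs after the six leading columns.
row=$(printf 'A5\tA5\t0\t0\t1\t1\t1 2\t0 0\t1 2')

# sub() stops after the first match of each pattern in the record.
with_sub=$(printf '%s\n' "$row" | awk '{ sub(/1 2/, "2 2"); sub(/2 1/, "2 2"); print }')

# gsub() rewrites every match in the record.
with_gsub=$(printf '%s\n' "$row" | awk '{ gsub(/1 2|2 1/, "2 2"); print }')

printf '%s\n' "$with_sub"    # the last pair is still "1 2"
printf '%s\n' "$with_gsub"   # both pairs are now "2 2"
```

Rows in the sample input only ever contain one mixed pair, so the difference does not show up there.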
