BASH - Change information in columns 2 by 2 using for loop and If statements

Question

I have the following tab-separated file:

A1    A1    0       0       1       1       0 0     0 0     2 2     1 2
A2    A2    0       0       1       1       1 1     1 1     0 0     1 2
A3    A3    0       0       1       2       1 1     1 1     0 0     2 2
A4    A4    0       0       1       1       1 1     0 0     0 0     1 2

The idea is to modify the data from column 7 (inclusive) to the end of the line so that, for every row, if columns 7 and 8:

equal “0 0”: don’t modify
equal “1 1”: don’t modify
equal “1 2” or “2 1”: change to “2 2”
equal “2 2”: don’t modify

The same applies to each following pair of columns (9 and 10, then 11 and 12, 13 and 14, and so on).

I started by extracting the columns I want to work on with this command:

awk '{for (i = 7; i <= NF; i++) printf "%s ", $i; print ""}' test.ped > tmp_test.txt

Then I was thinking of using a for loop with if statements, along these lines:

for i between 7 and the end (for (i = 7; i <= NF)):
    if i and i+1 == “1 2”:
        replace by “2 2”
    elif i and i+1 == “2 1”:
        replace by “2 2”
    else
        pass
    i=i+2 (increase i to do the same for the next double columns)

But I am stuck here. Is the general format logical or is there a faster way to do the same? Am I going in the right direction?

The expected output (after merging the first 6 columns from the initial file and the ones that I subsetted and modified) is:

A1    A1    0       0       1       1       0 0     0 0     2 2     2 2
A2    A2    0       0       1       1       1 1     1 1     0 0     2 2
A3    A3    0       0       1       2       1 1     1 1     0 0     2 2
A4    A4    0       0       1       1       1 1     0 0     0 0     2 2

Thank you for your help!

Hmm, it looks like the only difference between your input and output is that each line has been changed to end in 2 2. Could you edit your question to explain the condition and the columns to be modified more clearly?
– Tom Fenech, Commented Aug 30, 2016 at 8:57
When providing sample input and expected output you should cover all your use cases to demonstrate what you are trying to explain in your text. Right now your sample output makes it look like your problem could be solved by just changing the last 2 fields to 2 2 for every line. You already asked a similar question and the answers you got were resoundingly, and correctly, not bash, so why are you back to asking for a bash solution now?
– Ed Morton, Commented Aug 30, 2016 at 15:27
Am I right in guessing that your input file isn't ALL tab-separated and your first line, for example, is actually A1\tA1\t0\t0\t1\t1\t0 0\t0 0\t2 2\t1 2, with blank chars instead of tabs between the pairs of digits that start at field 7 of your tab-separated file, so field 7 is actually 0 0?
– Ed Morton, Commented Aug 30, 2016 at 15:43

James Brown · Accepted Answer · 2016-08-30 08:55:39Z

1

$ awk '{$1=$1;for(i=7;i<=NF;i+=2) if($i""$(i+1)=="1""2" || $i""$(i+1)=="2""1") {$i=2;$(i+1)=2} print}' test
A1 A1 0 0 1 1 0 0 0 0 2 2 2 2
A2 A2 0 0 1 1 1 1 1 1 0 0 2 2
A3 A3 0 0 1 2 1 1 1 1 0 0 2 2
A4 A4 0 0 1 1 1 1 0 0 0 0 2 2

.

{
    $1=$1                 # rebuild the record so every line prints with single-space separators
    for(i=7;i<=NF;i+=2)   # step through the columns two at a time
        if($i""$(i+1)=="1""2" || $i""$(i+1)=="2""1") {
            $i=2;$(i+1)=2 # set both columns to 2 if the pair is 1,2 OR 2,1
        }
    print                 # print the record, changed or not
}
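To see the loop act on a pair that is not the last one on the line (a case the sample in the question does not cover), it can be run on a made-up line. This is only a sketch: the input line is hypothetical, and it assumes every value is separated by whitespace so that each number is its own field.

```shell
# Hypothetical input: from column 7 on, the pairs are (0 0) (2 1) (1 1) (1 2).
line='A1 A1 0 0 1 1 0 0 2 1 1 1 1 2'

result=$(printf '%s\n' "$line" | awk '{
    for (i = 7; i < NF; i += 2)           # i < NF guarantees $(i+1) exists
        if (($i == 1 && $(i+1) == 2) || ($i == 2 && $(i+1) == 1)) {
            $i = 2; $(i+1) = 2            # a mixed pair becomes 2 2
        }
    print
}')

printf '%s\n' "$result"   # prints: A1 A1 0 0 1 1 0 0 2 2 1 1 2 2
```

Both mixed pairs are rewritten, while columns 5 and 6 are left alone because the loop starts at column 7.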

answered Aug 30, 2016 at 8:55

James Brown

1 Comment

Ed Morton Over a year ago

That will replace all the tabs in the file with blank characters. The OP doesn't tell us this but I THINK his input format is A1\tA1\t0\t0\t1\t1\t0 0\t0 0\t2 2\t1 2
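
One way to check that guess is to count fields under both separators. A sketch, building one line in the format the comment describes (real tabs between fields, a blank inside each pair); the variable names are only for illustration:

```shell
# One line in the guessed format: tabs between fields,
# a single blank inside each pair from field 7 onward.
line=$(printf 'A1\tA1\t0\t0\t1\t1\t0 0\t0 0\t2 2\t1 2')

tab_fields=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')   # split on tabs only
ws_fields=$(printf '%s\n' "$line" | awk '{ print NF }')           # split on any whitespace

echo "tab-separated fields: $tab_fields"          # 10
echo "whitespace-separated fields: $ws_fields"    # 14
```

If the two counts differ on the real file, each pair is a single tab-separated field, and any solution that rebuilds the record with the default output separator will turn the tabs into blanks.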

sjsam · Accepted Answer · 2016-08-30 10:46:01Z

1

Awk is your friend.

awk -v FS='\t' -v OFS='\t' '{for(i=7;i<=NF;i++) \
 {if($i ~ /^[ 2]*[1]{1}[ 2]*$/){$i="2 2"}}}1'  file

should do it.
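
The interval expression `{1}` in the regex is not accepted by every awk, so a plain string comparison may be a more portable way to express the same test. A sketch under the same assumption (tabs between fields, a blank inside each pair), run on a made-up two-line sample:

```shell
# Hypothetical sample: tabs between fields, a blank inside each pair.
sample=$(printf 'A1\tA1\t0\t0\t1\t2\t2 1\t1 1\t1 2\nA2\tA2\t0\t0\t1\t1\t0 0\t2 2\t1 1')

result=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = OFS = "\t" }
{
    for (i = 7; i <= NF; i++)             # each pair is one tab-separated field
        if ($i == "1 2" || $i == "2 1")
            $i = "2 2"
    print
}')

printf '%s\n' "$result"
```

On the first line the pairs `2 1` and `1 2` become `2 2`, columns 5 and 6 (`1` and `2`) stay as they are, and the tabs are preserved because OFS is also a tab.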

answered Aug 30, 2016 at 10:46

sjsam

Ed Morton · Accepted Answer · 2016-08-30 18:07:45Z

1

It sounds like all you need is:

$ awk '{gsub(/1 2|2 1/,"2 2")}1' file
A1      A1      0       0       1       1       0 0     0 0     2 2     2 2
A2      A2      0       0       1       1       1 1     1 1     0 0     2 2
A3      A3      0       0       1       2       1 1     1 1     0 0     2 2
A4      A4      0       0       1       1       1 1     0 0     0 0     2 2

but your sample input/output REALLY doesn't help demonstrate what your text describes, and I don't think your fields are REALLY all tab-separated like you say they are, so it's a guess.
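
Why the separator matters for this approach can be shown with a small experiment. The two input lines below are made up; both hold the pairs (1 1) and (2 2), which should not change.

```shell
# Blanks everywhere: the regex can match across the boundary between two pairs.
all_blanks=$(printf '1 1 2 2\n' | awk '{ gsub(/1 2|2 1/, "2 2") } 1')

# A tab between the pairs: no "1 2" or "2 1" substring exists, so nothing changes.
tab_between=$(printf '1 1\t2 2\n' | awk '{ gsub(/1 2|2 1/, "2 2") } 1')

printf '%s\n' "$all_blanks"    # prints: 1 2 2 2
printf '%s\n' "$tab_between"   # prints the line unchanged, tab included
```

With blanks everywhere, the second value of one pair and the first value of the next form a spurious `1 2`, so the whole-line `gsub` is only safe if the pairs really are separated from each other by tabs.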

edited Aug 30, 2016 at 18:07

answered Aug 30, 2016 at 15:31

Ed Morton

rakinhaider · Accepted Answer · 2016-08-30 11:19:58Z

0

From your question it looks like the following pairs of columns are space-separated: (7th and 8th), (9th and 10th), (11th and 12th), (13th and 14th), and the others are tab-separated. If that is the case, you can do it without loops.

awk '{gsub("1 2","2 2",$0);gsub("2 1","2 2",$0); print;}' file

answered Aug 30, 2016 at 11:19

rakinhaider
