How to replace variables across multiple columns using awk?

Question

I have a file with 2060 lines, including a header row of column names at the top. It looks like this:

FID     IID     late_telangiectasia_G1  late_atrophy_G1 late_atrophy_G2 late_nipple_retraction_G1       late_nipple_retraction_G2       late_oedema_G1  late_oedema_G2  late_induration_tumour_G1       late_induration_outside_G1      late_induration_G2      late_arm_lympho_G1 late_hyper_G1
1       470502  1       0       0       0       0       0       0       0       0      0       0       0
2       470514  0       0       0       0       0       0       0       0       0       0       0       0
3       470422  0       0       0       0       0       0       0       0       0       0       0       1
4       470510  0       0       0       0       0       1       0       1       1       1       0       1
5       470506  0       0       0       0       0       0       0       0       0       0       0       0
6       471948  0       0       0       0       0       0       0       1       0       0       0       0
7       469922  -9      -9      -9      -9      -9      -9      -9      -9      -9      -9      -9      -9
8       471220  0       1       1       -9      -9      0       0       1       1       1       0       0
9       470498  0       1       0       0       0       0       0       0       0       0       0       0
10      471993  0       1       1       0       0       0       0       0       0       0       0       0
11      470414  0       1       0       0       0       0       0       0       1       0       0       0
12      470522  0       0       0       0       0       0       0       0       0       0       0       0
13      470345  0       0       0       0       0       0       0       0       0       0       0       0
14      471275  0       1       0       -9      0       0       0       1       0       0       0       0
15      471283  0       1       0       0       0       0       0       1       1       0       0       0
16      472577  0       1       0       0       0       0       0       1       0       0       0       0
17      470492  0       1       0       0       0       0       0       0       0       0       0       0
18      472889  0       0       0       -9      0       0       0       0       0       0       0       0
19      470500  0       1       0       1       0       0       0       0       1       0       0       0
20      470493  0       0       0       0       0       0       0       1       1       0       0       0

I want to replace every 0 with 1 and every 1 with 2 in columns 3 to 12, leaving the -9 values unchanged. I know that for a single column the command would be:

awk '
{
  if ($3 == 1) $3 = 2
  if ($3 == 0) $3 = 1
}
1' file

For multiple columns, is there an easier way to specify a range than typing every column number manually? Something along these lines (not valid awk):

awk '
{
if($3,$4,$5,$6,$7,$8,$9,$10,$11,$12==1)$3,$4,$5,$6,$7,$8,$9,$10,$11,$12=2
if($3,$4,$5,$6,$7,$8,$9,$10,$11,$12==0)$3,$4,$5,$6,$7,$8,$9,$10,$11,$12=1
}
1' file

Thanks in advance

The fourth bird · Accepted Answer · 2022-01-10 08:40:09Z

You could loop over the columns and change each value, accessing the current field with $i:

awk '
{
  for(i=3; i<=12; i++) {
    if ($i==1 || $i==0) $i++
  }
}1
' file | column -t
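
The column range in the loop above does not have to be hard-coded. Here is a minimal sketch that passes it in with -v and also skips the header row explicitly. The variable names first and last and the five-column sample data are made up for illustration and are not from the original post:

```shell
# Tiny stand-in for the real file (hypothetical data, 5 columns).
sample=$(mktemp)
cat > "$sample" <<'EOF'
FID IID A B C
1 100 0 1 0
2 101 -9 0 1
EOF

# first/last are illustrative names for the column range to recode.
# NR > 1 skips the header; only exact 0 or 1 values are incremented,
# so -9 and anything outside the range is left alone.
recoded=$(awk -v first=3 -v last=4 '
NR > 1 {
  for (i = first; i <= last; i++)
    if ($i == 0 || $i == 1) $i++
}
1' "$sample")

printf '%s\n' "$recoded"
rm -f "$sample"
```

Assigning to a field makes awk rebuild the line with single spaces (the default OFS), which is why the answer pipes the result through column -t.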

answered Jan 9, 2022 at 13:33

Ed Morton · Accepted Answer · 2022-01-09 15:21:41Z

One possibility if you want to change almost all of your fields (as in your case) is to just save the ones you don't want to change and then change everything else:

$ awk 'NR>1{hd=$1 FS $2; tl=$13 FS $14; $1=$2=$13=$14=""; gsub(1,2); gsub(0,1); $0=hd $0 tl} 1' file
FID     IID     late_telangiectasia_G1  late_atrophy_G1 late_atrophy_G2 late_nipple_retraction_G1       late_nipple_retraction_G2       late_oedema_G1  late_oedema_G2  late_induration_tumour_G1       late_induration_outside_G1      late_induration_G2      late_arm_lympho_G1 late_hyper_G1
1 470502  2 1 1 1 1 1 1 1 1 1  0 0
2 470514  1 1 1 1 1 1 1 1 1 1  0 0
3 470422  1 1 1 1 1 1 1 1 1 1  0 1
4 470510  1 1 1 1 1 2 1 2 2 2  0 1
5 470506  1 1 1 1 1 1 1 1 1 1  0 0
6 471948  1 1 1 1 1 1 1 2 1 1  0 0
7 469922  -9 -9 -9 -9 -9 -9 -9 -9 -9 -9  -9 -9
8 471220  1 2 2 -9 -9 1 1 2 2 2  0 0
9 470498  1 2 1 1 1 1 1 1 1 1  0 0
10 471993  1 2 2 1 1 1 1 1 1 1  0 0
11 470414  1 2 1 1 1 1 1 1 2 1  0 0
12 470522  1 1 1 1 1 1 1 1 1 1  0 0
13 470345  1 1 1 1 1 1 1 1 1 1  0 0
14 471275  1 2 1 -9 1 1 1 2 1 1  0 0
15 471283  1 2 1 1 1 1 1 2 2 1  0 0
16 472577  1 2 1 1 1 1 1 2 1 1  0 0
17 470492  1 2 1 1 1 1 1 1 1 1  0 0
18 472889  1 1 1 -9 1 1 1 1 1 1  0 0
19 470500  1 2 1 2 1 1 1 1 2 1  0 0
20 470493  1 1 1 1 1 1 1 2 2 1  0 0

Pipe the output to column -t for alignment if you like.
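
Note that the order of the two gsub() calls matters: 1 has to become 2 before 0 becomes 1, otherwise the freshly written 1s are converted a second time. A small sketch on a made-up line (the variable names are illustrative only):

```shell
# Hypothetical input line containing only the values 0, 1 and -9.
line="0 1 -9 1 0"

# Order used in the answer: 1 -> 2 first, then 0 -> 1.
right=$(printf '%s\n' "$line" | awk '{ gsub(1,2); gsub(0,1) } 1')

# Reversed order: every 0 becomes 1 and is then turned into 2 as well.
wrong=$(printf '%s\n' "$line" | awk '{ gsub(0,1); gsub(1,2) } 1')

printf 'right: %s\n' "$right"   # right: 1 2 -9 2 1
printf 'wrong: %s\n' "$wrong"   # wrong: 2 2 -9 2 2
```

Because gsub() replaces characters rather than whole fields, this approach also assumes the columns hold nothing but 0, 1 and -9. A value such as 10 would have both of its digits rewritten.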

Or, using GNU awk for the 3rd arg to match() and retaining the original white space:

$ awk 'NR>1{ match($0,/((\S+\s+){2})((\S+\s+){9}\S+)(.*)/,a); gsub(1,2,a[3]); gsub(0,1,a[3]); $0=a[1] a[3] a[5] } 1' file
FID     IID     late_telangiectasia_G1  late_atrophy_G1 late_atrophy_G2 late_nipple_retraction_G1       late_nipple_retraction_G2       late_oedema_G1  late_oedema_G2  late_induration_tumour_G1       late_induration_outside_G1      late_induration_G2      late_arm_lympho_G1 late_hyper_G1
1       470502  2       1       1       1       1       1       1       1       1      1       0       0
2       470514  1       1       1       1       1       1       1       1       1       1       0       0
3       470422  1       1       1       1       1       1       1       1       1       1       0       1
4       470510  1       1       1       1       1       2       1       2       2       2       0       1
5       470506  1       1       1       1       1       1       1       1       1       1       0       0
6       471948  1       1       1       1       1       1       1       2       1       1       0       0
7       469922  -9      -9      -9      -9      -9      -9      -9      -9      -9      -9      -9      -9
8       471220  1       2       2       -9      -9      1       1       2       2       2       0       0
9       470498  1       2       1       1       1       1       1       1       1       1       0       0
10      471993  1       2       2       1       1       1       1       1       1       1       0       0
11      470414  1       2       1       1       1       1       1       1       2       1       0       0
12      470522  1       1       1       1       1       1       1       1       1       1       0       0
13      470345  1       1       1       1       1       1       1       1       1       1       0       0
14      471275  1       2       1       -9      1       1       1       2       1       1       0       0
15      471283  1       2       1       1       1       1       1       2       2       1       0       0
16      472577  1       2       1       1       1       1       1       2       1       1       0       0
17      470492  1       2       1       1       1       1       1       1       1       1       0       0
18      472889  1       1       1       -9      1       1       1       1       1       1       0       0
19      470500  1       2       1       2       1       1       1       1       2       1       0       0
20      470493  1       1       1       1       1       1       1       2       2       1       0       0

dawg · Accepted Answer · 2022-01-09 20:09:14Z

It is hard to tell whether that file is space delimited or tab delimited.

Here is a Ruby script that handles either space or tab delimited fields and converts the result to tab delimited.

Note: Ruby arrays are zero based, so fields 1-2 are [0..1] and fields 3-12 are [2..11].

ruby -r csv -e 'options={:col_sep=>"\t", :converters=>:all, :headers=>true}
data=CSV.parse($<.read.gsub(/[[:blank:]]+/,"\t"), **options)
data.each_with_index{
  |r,i| data[i]=r[0..1]+r[2..11].map{|e| (e==1 || e==0) ? e+1 : e}+r[12..]}
puts data.to_csv(**options)
' file

Prints:

FID IID late_telangiectasia_G1  late_atrophy_G1 late_atrophy_G2 late_nipple_retraction_G1   late_nipple_retraction_G2   late_oedema_G1  late_oedema_G2  late_induration_tumour_G1   late_induration_outside_G1  late_induration_G2  late_arm_lympho_G1  late_hyper_G1
1   470502  2   1   1   1   1   1   1   1   1   1   0   0
2   470514  1   1   1   1   1   1   1   1   1   1   0   0
3   470422  1   1   1   1   1   1   1   1   1   1   0   1
4   470510  1   1   1   1   1   2   1   2   2   2   0   1
5   470506  1   1   1   1   1   1   1   1   1   1   0   0
6   471948  1   1   1   1   1   1   1   2   1   1   0   0
7   469922  -9  -9  -9  -9  -9  -9  -9  -9  -9  -9  -9  -9
8   471220  1   2   2   -9  -9  1   1   2   2   2   0   0
9   470498  1   2   1   1   1   1   1   1   1   1   0   0
10  471993  1   2   2   1   1   1   1   1   1   1   0   0
11  470414  1   2   1   1   1   1   1   1   2   1   0   0
12  470522  1   1   1   1   1   1   1   1   1   1   0   0
13  470345  1   1   1   1   1   1   1   1   1   1   0   0
14  471275  1   2   1   -9  1   1   1   2   1   1   0   0
15  471283  1   2   1   1   1   1   1   2   2   1   0   0
16  472577  1   2   1   1   1   1   1   2   1   1   0   0
17  470492  1   2   1   1   1   1   1   1   1   1   0   0
18  472889  1   1   1   -9  1   1   1   1   1   1   0   0
19  470500  1   2   1   2   1   1   1   1   2   1   0   0
20  470493  1   1   1   1   1   1   1   2   2   1   0   0

With awk you can do:

awk -v OFS="\t" 'FNR>1{for(i=3;i<=12;i++)if ($i~"^[10]$")$i=$i+1} {$1=$1} 1' file

# same output
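
If the recoding ever needs to be something other than "add one", a lookup table is an option. This is only a sketch under assumptions: the array name map and the five-column sample data are made up for illustration, and the range is shortened to columns 3-4 to fit the sample:

```shell
# Tiny stand-in for the real file (hypothetical data, 5 columns).
sample=$(mktemp)
cat > "$sample" <<'EOF'
FID IID A B C
1 100 0 1 0
2 101 -9 0 1
EOF

# map is an illustrative lookup table: old value -> new value.
# Values that are not keys of map (such as -9) are left unchanged.
mapped=$(awk -v OFS='\t' '
BEGIN { map["0"] = 1; map["1"] = 2 }
FNR > 1 {
  for (i = 3; i <= 4; i++)
    if ($i in map) $i = map[$i]
}
{ $1 = $1 }   # force every line, header included, to be rebuilt with OFS
1' "$sample")

printf '%s\n' "$mapped"
rm -f "$sample"
```

Each field is looked up once against its original value, so the order of the entries in the table does not matter.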

konsolebox · Accepted Answer · 2022-01-09 15:27:13Z

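Using GNU awk with every whitespace-separated token as its own record (RT holds the separator that was matched, so the original spacing is preserved):

gawk -v RS='[[:space:]]+' '++c > 2 && c <= 12 && /^(0|1)$/ { ++$0 }
        { printf "%s", $0 RT } RT ~ /\n/ { c = 0 }' file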

answered Jan 9, 2022 at 14:36
