2

I have a file with 2060 lines and a header row (column names) at the top. It looks like this:

FID     IID     late_telangiectasia_G1  late_atrophy_G1 late_atrophy_G2 late_nipple_retraction_G1       late_nipple_retraction_G2       late_oedema_G1  late_oedema_G2  late_induration_tumour_G1       late_induration_outside_G1      late_induration_G2      late_arm_lympho_G1 late_hyper_G1
1       470502  1       0       0       0       0       0       0       0       0      0       0       0
2       470514  0       0       0       0       0       0       0       0       0       0       0       0
3       470422  0       0       0       0       0       0       0       0       0       0       0       1
4       470510  0       0       0       0       0       1       0       1       1       1       0       1
5       470506  0       0       0       0       0       0       0       0       0       0       0       0
6       471948  0       0       0       0       0       0       0       1       0       0       0       0
7       469922  -9      -9      -9      -9      -9      -9      -9      -9      -9      -9      -9      -9
8       471220  0       1       1       -9      -9      0       0       1       1       1       0       0
9       470498  0       1       0       0       0       0       0       0       0       0       0       0
10      471993  0       1       1       0       0       0       0       0       0       0       0       0
11      470414  0       1       0       0       0       0       0       0       1       0       0       0
12      470522  0       0       0       0       0       0       0       0       0       0       0       0
13      470345  0       0       0       0       0       0       0       0       0       0       0       0
14      471275  0       1       0       -9      0       0       0       1       0       0       0       0
15      471283  0       1       0       0       0       0       0       1       1       0       0       0
16      472577  0       1       0       0       0       0       0       1       0       0       0       0
17      470492  0       1       0       0       0       0       0       0       0       0       0       0
18      472889  0       0       0       -9      0       0       0       0       0       0       0       0
19      470500  0       1       0       1       0       0       0       0       1       0       0       0
20      470493  0       0       0       0       0       0       0       1       1       0       0       0

I want to replace every 0 with 1 and every 1 with 2 in columns 3 to 12, leaving the -9 values untouched. I know that for a single column the command would be:

awk '
{
if($3==1)$3=2
if($3==0)$3=1
}
1' file

For multiple columns, is there an easier way to specify a range rather than typing every column number by hand? What I have in mind is something like this pseudocode (not valid awk):

awk '
{
if($3,$4,$5,$6,$7,$8,$9,$10,$11,$12==1)$3,$4,$5,$6,$7,$8,$9,$10,$11,$12=2
if($3,$4,$5,$6,$7,$8,$9,$10,$11,$12==0)$3,$4,$5,$6,$7,$8,$9,$10,$11,$12=1
}
1' file

Thanks in advance

4 Answers

4

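You could loop over the field numbers and change each value through $i, which refers to field number i. Assigning to a field makes awk rebuild the line with single spaces between fields, hence the column -t at the end to realign the output: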

awk '
{
  for(i=3; i<=12; i++) {
    if ($i==1 || $i==0) $i++
  }
}1
' file | column -t
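
A variation on the loop above, as a minimal sketch: it assumes the file is tab-separated (the question does not say), skips the header with NR > 1, and takes the column range from two variables. The names first and last, the file names sample.txt and recoded.txt, and the shortened five-column sample are made up for illustration.

```shell
# Small stand-in for the real file (assumed tab-separated, header on line 1).
printf 'FID\tIID\tA\tB\tC\n1\t470502\t1\t0\t0\n7\t469922\t-9\t-9\t1\n' > sample.txt

# first/last are hypothetical variable names for the column range to recode.
awk -v first=3 -v last=4 'BEGIN { FS = OFS = "\t" }
NR > 1 {                                # leave the header line alone
    for (i = first; i <= last; i++)
        if ($i == 0 || $i == 1) $i++    # 0 -> 1, 1 -> 2; -9 is not touched
}
1' sample.txt > recoded.txt             # the trailing 1 prints every line

cat recoded.txt
```

With the question's file this would be run with first=3 and last=12. Setting OFS to a tab keeps the output tab-separated, so piping through column -t is only needed for display.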

1

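One possibility if you want to change almost all of your fields (as in your case) is to save the ones you don't want to change and then change everything else. Note that gsub() replaces every matching character, so this relies on the remaining fields containing only 0, 1 and -9 (a value such as 10 would become 21):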

$ awk 'NR>1{hd=$1 FS $2; tl=$13 FS $14; $1=$2=$13=$14=""; gsub(1,2); gsub(0,1); $0=hd $0 tl} 1' file
FID     IID     late_telangiectasia_G1  late_atrophy_G1 late_atrophy_G2 late_nipple_retraction_G1       late_nipple_retraction_G2       late_oedema_G1  late_oedema_G2  late_induration_tumour_G1       late_induration_outside_G1      late_induration_G2      late_arm_lympho_G1 late_hyper_G1
1 470502  2 1 1 1 1 1 1 1 1 1  0 0
2 470514  1 1 1 1 1 1 1 1 1 1  0 0
3 470422  1 1 1 1 1 1 1 1 1 1  0 1
4 470510  1 1 1 1 1 2 1 2 2 2  0 1
5 470506  1 1 1 1 1 1 1 1 1 1  0 0
6 471948  1 1 1 1 1 1 1 2 1 1  0 0
7 469922  -9 -9 -9 -9 -9 -9 -9 -9 -9 -9  -9 -9
8 471220  1 2 2 -9 -9 1 1 2 2 2  0 0
9 470498  1 2 1 1 1 1 1 1 1 1  0 0
10 471993  1 2 2 1 1 1 1 1 1 1  0 0
11 470414  1 2 1 1 1 1 1 1 2 1  0 0
12 470522  1 1 1 1 1 1 1 1 1 1  0 0
13 470345  1 1 1 1 1 1 1 1 1 1  0 0
14 471275  1 2 1 -9 1 1 1 2 1 1  0 0
15 471283  1 2 1 1 1 1 1 2 2 1  0 0
16 472577  1 2 1 1 1 1 1 2 1 1  0 0
17 470492  1 2 1 1 1 1 1 1 1 1  0 0
18 472889  1 1 1 -9 1 1 1 1 1 1  0 0
19 470500  1 2 1 2 1 1 1 1 2 1  0 0
20 470493  1 1 1 1 1 1 1 2 2 1  0 0

Pipe it to column -t for alignment if you like.

Or, using GNU awk for the 3rd argument to match(), which retains the original white space:

$ awk 'NR>1{ match($0,/((\S+\s+){2})((\S+\s+){9}\S+)(.*)/,a); gsub(1,2,a[3]); gsub(0,1,a[3]); $0=a[1] a[3] a[5] } 1' file
FID     IID     late_telangiectasia_G1  late_atrophy_G1 late_atrophy_G2 late_nipple_retraction_G1       late_nipple_retraction_G2       late_oedema_G1  late_oedema_G2  late_induration_tumour_G1       late_induration_outside_G1      late_induration_G2      late_arm_lympho_G1 late_hyper_G1
1       470502  2       1       1       1       1       1       1       1       1      1       0       0
2       470514  1       1       1       1       1       1       1       1       1       1       0       0
3       470422  1       1       1       1       1       1       1       1       1       1       0       1
4       470510  1       1       1       1       1       2       1       2       2       2       0       1
5       470506  1       1       1       1       1       1       1       1       1       1       0       0
6       471948  1       1       1       1       1       1       1       2       1       1       0       0
7       469922  -9      -9      -9      -9      -9      -9      -9      -9      -9      -9      -9      -9
8       471220  1       2       2       -9      -9      1       1       2       2       2       0       0
9       470498  1       2       1       1       1       1       1       1       1       1       0       0
10      471993  1       2       2       1       1       1       1       1       1       1       0       0
11      470414  1       2       1       1       1       1       1       1       2       1       0       0
12      470522  1       1       1       1       1       1       1       1       1       1       0       0
13      470345  1       1       1       1       1       1       1       1       1       1       0       0
14      471275  1       2       1       -9      1       1       1       2       1       1       0       0
15      471283  1       2       1       1       1       1       1       2       2       1       0       0
16      472577  1       2       1       1       1       1       1       2       1       1       0       0
17      470492  1       2       1       1       1       1       1       1       1       1       0       0
18      472889  1       1       1       -9      1       1       1       1       1       1       0       0
19      470500  1       2       1       2       1       1       1       1       2       1       0       0
20      470493  1       1       1       1       1       1       1       2       2       1       0       0


1

It is hard to tell whether the file is space-delimited or tab-delimited.

Here is a Ruby script that handles either space- or tab-delimited fields and writes the result as tab-delimited.

Note: Ruby arrays are zero-based, so fields 1-2 are [0..1] and fields 3-12 are [2..11].

ruby -r csv -e 'options={:col_sep=>"\t", :converters=>:all, :headers=>true}
data=CSV.parse($<.read.gsub(/[[:blank:]]+/,"\t"), **options)
data.each_with_index{
  |r,i| data[i]=r[0..1]+r[2..11].map{|e| (e==1 || e==0) ? e+1 : e}+r[12..]}
puts data.to_csv(**options)
' file

Prints:

FID IID late_telangiectasia_G1  late_atrophy_G1 late_atrophy_G2 late_nipple_retraction_G1   late_nipple_retraction_G2   late_oedema_G1  late_oedema_G2  late_induration_tumour_G1   late_induration_outside_G1  late_induration_G2  late_arm_lympho_G1  late_hyper_G1
1   470502  2   1   1   1   1   1   1   1   1   1   0   0
2   470514  1   1   1   1   1   1   1   1   1   1   0   0
3   470422  1   1   1   1   1   1   1   1   1   1   0   1
4   470510  1   1   1   1   1   2   1   2   2   2   0   1
5   470506  1   1   1   1   1   1   1   1   1   1   0   0
6   471948  1   1   1   1   1   1   1   2   1   1   0   0
7   469922  -9  -9  -9  -9  -9  -9  -9  -9  -9  -9  -9  -9
8   471220  1   2   2   -9  -9  1   1   2   2   2   0   0
9   470498  1   2   1   1   1   1   1   1   1   1   0   0
10  471993  1   2   2   1   1   1   1   1   1   1   0   0
11  470414  1   2   1   1   1   1   1   1   2   1   0   0
12  470522  1   1   1   1   1   1   1   1   1   1   0   0
13  470345  1   1   1   1   1   1   1   1   1   1   0   0
14  471275  1   2   1   -9  1   1   1   2   1   1   0   0
15  471283  1   2   1   1   1   1   1   2   2   1   0   0
16  472577  1   2   1   1   1   1   1   2   1   1   0   0
17  470492  1   2   1   1   1   1   1   1   1   1   0   0
18  472889  1   1   1   -9  1   1   1   1   1   1   0   0
19  470500  1   2   1   2   1   1   1   1   2   1   0   0
20  470493  1   1   1   1   1   1   1   2   2   1   0   0

With awk you can do:

awk -v OFS="\t" 'FNR>1{for(i=3;i<=12;i++)if ($i~"^[10]$")$i=$i+1} {$1=$1} 1' file

# same output
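
If the recoding were ever something other than "add one", the same loop could look each value up in a table instead. This is only a sketch under assumptions: the array name map, the file names, and the shortened space-separated sample (with a 10 added to show that unlisted values pass through) are all made up for illustration.

```shell
# Small stand-in for the real file (space-separated, header on line 1).
printf 'FID IID x1 x2 x3\n1 470502 1 0 -9\n2 470514 0 10 1\n' > sample_map.txt

awk 'BEGIN {
    # Recode table: old value -> new value. map is a hypothetical name.
    map["0"] = 1
    map["1"] = 2
}
NR > 1 {                                  # skip the header line
    for (i = 3; i <= 5; i++)              # would be 3..12 for the real file
        if ($i in map) $i = map[$i]       # values not in the table stay as-is
}
1' sample_map.txt > recoded_map.txt

cat recoded_map.txt
```

Because the lookup is an exact match on the field's text, -9 and 10 are left untouched without needing a regex.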


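0

With GNU awk you can treat every whitespace-separated token as its own record and print it back with its original separator (RT), which keeps the file's spacing. The counter c is the column number within the current line, limited here to columns 3 to 12:

gawk -v RS='[[:space:]]+' '++c > 2 && c <= 12 && /^(0|1)$/ { ++$0 }
        { printf "%s", $0 RT } RT ~ /\n/ { c = 0 }' file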

