I have tried several solutions posted here, but none of them solved my problem. Sorry if this is similar to another question; I just cannot make it work. I have a file like this one:

chr17   7579366 COSM45509;COSM11448;COSM45040   G   A,C,T   13.2    PASS    AF=0.0216216,0,0;AO=4,0,0;DP=185;FAO=4,0,0;FDP=185;FDVR=5,5,5;FR=.,.,.,REALIGNEDx0.03243;FRO=181;FSAF=3,0,0;FSAR=1,0,0;FSRF=136;FSRR=45;FUNC=[{'origPos':'7579366','origRef':'G','normalizedRef':'G','gene':'TP53','normalizedPos':'7579366','normalizedAlt':'A','gt':'pos','codon':'TAT','coding':'c.321C>T','transcript':'NM_000546.5','function':'synonymous','protein':'p.(=)','location':'exonic','origAlt':'A','exon':'4','CLNACC1':'RCV000220860','CLNSIG1':'Likely_benign','CLNREVSTAT1':'single','CLNID1':'rs770776262'}];FWDB=0.0168253,-0.0508378,0.0146373;FXX=0;HRUN=1,1,1;HS;HS_ONLY=0;LEN=1,1,1;MLLD=80.6954,128.354,137.413;OALT=A,C,T;OID=COSM45509,COSM11448,COSM45040;OMAPALT=A,C,T;OPOS=7579366,7579366,7579366;OREF=G,G,G;PB=.,.,.;PBP=.,.,.;QD=0.285514;RBI=0.0283383,0.0582477,0.0324614;REFB=0.000208231,-0.000390718,-0.000240087;REVB=0.0228028,-0.0284309,-0.028974;RO=181;SAF=3,0,0;SAR=1,0,0;SRF=136;SRR=45;SSEN=0,0,0;SSEP=0,0,0;SSSB=-0.00109306,0,0;STB=0.501796,0.5,0.5;STBP=0.98,1,1;TYPE=snp,snp,snp;VARB=-0.00120684,0,0;AF_gnomAD=0;cosmic_ids=COSM2745025,COSM5055813,COSM2745026,COSM4272039,COSM5055814,COSM5055812,COSM11448,COSM45509,COSM2745027,COSM45040,COSM4487692,COSM4487691,COSM213589,COSM5055811,COSM213590 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:13:185:185:181:181:4:4:0.0216215997934341:1:3:136:45:1:3:136:45
chr18   48575168    COSM14216;COSM25274 T   TA  41.6    PASS    AF=0.0178891;AO=92;DP=3963;FAO=70;FDP=3913;FDVR=0;FR=.;FRO=3843;FSAF=43;FSAR=27;FSRF=2160;FSRR=1683;FUNC=[{'origPos':'48575168','origRef':'T','normalizedRef':'T','gene':'SMAD4','normalizedPos':'48575168','normalizedAlt':'TA','gt':'pos','codon':'ATG','coding':'c.366_367insA','transcript':'NM_005359.5','function':'frameshiftInsertion','protein':'p.Cys123fs','location':'exonic','origAlt':'TA','exon':'3'}];FWDB=0.0820535;FXX=0.0098684;HRUN=4;HS;HS_ONLY=0;LEN=1;MLLD=27.4213;OALT=A,A;OID=COSM14216,COSM25274;OMAPALT=TA,TA;OPOS=48575170,48575173;OREF=-,-;PB=.;PBP=.;QD=0.0425701;RBI=0.0857718;REFB=-0.00250641;REVB=0.0249804;RO=3838;SAF=50;SAR=42;SRF=2167;SRR=1671;SSEN=0;SSEP=0;SSSB=-0.0142007;STB=0.552796;STBP=0.415;TYPE=ins;VARB=0.109082;AF_gnomAD=0 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:41:3963:3913:3838:3843:92:70:0.0178890991955996:42:50:2167:1671:27:43:2160:1683

chr18   48604749    rs377767375;COSM6784692 G   T   617.0   PASS    AF=0.107822;AO=153;DP=1423;FAO=153;FDP=1419;FDVR=5;FR=.,REALIGNEDx0.1326;FRO=1266;FSAF=143;FSAR=10;FSRF=1076;FSRR=190;FUNC=[{'origPos':'48604749','origRef':'G','normalizedRef':'G','gene':'SMAD4','normalizedPos':'48604749','normalizedAlt':'T','polyphen':'0.834','gt':'pos','codon':'TTG','coding':'c.1571G>T','sift':'0.0','grantham':'61.0','transcript':'NM_005359.5','function':'missense','protein':'p.Trp524Leu','location':'exonic','origAlt':'T','exon':'12','CLNACC1':'RCV000021747','CLNSIG1':'Pathogenic','CLNREVSTAT1':'no_criteria','CLNID1':'rs377767375'}];FWDB=0.0098166;FXX=0.0042105;HRUN=2;HS_ONLY=0;LEN=1;MLLD=57.8823;OALT=T;OID=.;OMAPALT=T;OPOS=48604749;OREF=G;PB=.;PBP=.;QD=1.73921;RBI=0.0271184;REFB=-0.00145864;REVB=-0.0252793;RO=1267;SAF=143;SAR=10;SRF=1078;SRR=189;SSEN=0;SSEP=0;SSSB=0.257732;STB=0.701118;STBP=0.001;TYPE=snp;VARB=0.0118254;AF_gnomAD=0;rs_ids=rs377767375;cosmic_ids=COSM6784692   GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:616:1423:1419:1267:1266:153:153:0.107822000980377:10:143:1078:189:10:143:1076:190

I extract the two sets of values I want to work with using:

A=$(cat file | awk -F '\t' '{print$8}' | awk -F ';' '{print$1}' | awk -F '=' '{print$2}')
B=$(cat file | grep -v "#" | awk -F '\t' '{print$10}' | awk -F ":" '{print$9}')

My goal is to replace each value in $A with the corresponding value in $B. I tried running sed -i directly, but it fails with an error related to the spaces inside the variable: sed: -e expression: unterminated `s' command. I also tried a for loop, but I cannot get rid of the error. My last attempt was something like:

length=3
for ((i=0;i<=$length;i++)); do sed -i "s/"${key[$i]}"/"${value[$i]}"/g" file ; done

Any ideas? Thanks!

EDIT: The output should have the value(s) after AF= replaced by the values from $B, i.e. the 9th value in the last chunk of numbers separated by ":". In this example:

chr17   7579366 COSM45509;COSM11448;COSM45040   G   A,C,T   13.2    PASS    AF=0.0216215997934341;AO=4,0,0;DP=185;FAO=4,0,0;FDP=185;FDVR=5,5,5;FR=.,.,.,REALIGNEDx0.03243;FRO=181;FSAF=3,0,0;FSAR=1,0,0;FSRF=136;FSRR=45;FUNC=[{'origPos':'7579366','origRef':'G','normalizedRef':'G','gene':'TP53','normalizedPos':'7579366','normalizedAlt':'A','gt':'pos','codon':'TAT','coding':'c.321C>T','transcript':'NM_000546.5','function':'synonymous','protein':'p.(=)','location':'exonic','origAlt':'A','exon':'4','CLNACC1':'RCV000220860','CLNSIG1':'Likely_benign','CLNREVSTAT1':'single','CLNID1':'rs770776262'}];FWDB=0.0168253,-0.0508378,0.0146373;FXX=0;HRUN=1,1,1;HS;HS_ONLY=0;LEN=1,1,1;MLLD=80.6954,128.354,137.413;OALT=A,C,T;OID=COSM45509,COSM11448,COSM45040;OMAPALT=A,C,T;OPOS=7579366,7579366,7579366;OREF=G,G,G;PB=.,.,.;PBP=.,.,.;QD=0.285514;RBI=0.0283383,0.0582477,0.0324614;REFB=0.000208231,-0.000390718,-0.000240087;REVB=0.0228028,-0.0284309,-0.028974;RO=181;SAF=3,0,0;SAR=1,0,0;SRF=136;SRR=45;SSEN=0,0,0;SSEP=0,0,0;SSSB=-0.00109306,0,0;STB=0.501796,0.5,0.5;STBP=0.98,1,1;TYPE=snp,snp,snp;VARB=-0.00120684,0,0;AF_gnomAD=0;cosmic_ids=COSM2745025,COSM5055813,COSM2745026,COSM4272039,COSM5055814,COSM5055812,COSM11448,COSM45509,COSM2745027,COSM45040,COSM4487692,COSM4487691,COSM213589,COSM5055811,COSM213590    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:13:185:185:181:181:4:4:0.0216215997934341:1:3:136:45:1:3:136:45
chr18   48575168    COSM14216;COSM25274 T   TA  41.6    PASS    AF=0.0178890991955996;AO=92;DP=3963;FAO=70;FDP=3913;FDVR=0;FR=.;FRO=3843;FSAF=43;FSAR=27;FSRF=2160;FSRR=1683;FUNC=[{'origPos':'48575168','origRef':'T','normalizedRef':'T','gene':'SMAD4','normalizedPos':'48575168','normalizedAlt':'TA','gt':'pos','codon':'ATG','coding':'c.366_367insA','transcript':'NM_005359.5','function':'frameshiftInsertion','protein':'p.Cys123fs','location':'exonic','origAlt':'TA','exon':'3'}];FWDB=0.0820535;FXX=0.0098684;HRUN=4;HS;HS_ONLY=0;LEN=1;MLLD=27.4213;OALT=A,A;OID=COSM14216,COSM25274;OMAPALT=TA,TA;OPOS=48575170,48575173;OREF=-,-;PB=.;PBP=.;QD=0.0425701;RBI=0.0857718;REFB=-0.00250641;REVB=0.0249804;RO=3838;SAF=50;SAR=42;SRF=2167;SRR=1671;SSEN=0;SSEP=0;SSSB=-0.0142007;STB=0.552796;STBP=0.415;TYPE=ins;VARB=0.109082;AF_gnomAD=0    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:41:3963:3913:3838:3843:92:70:0.0178890991955996:42:50:2167:1671:27:43:2160:1683

chr18   48604749    rs377767375;COSM6784692 G   T   617.0   PASS    AF=0.107822000980377;AO=153;DP=1423;FAO=153;FDP=1419;FDVR=5;FR=.,REALIGNEDx0.1326;FRO=1266;FSAF=143;FSAR=10;FSRF=1076;FSRR=190;FUNC=[{'origPos':'48604749','origRef':'G','normalizedRef':'G','gene':'SMAD4','normalizedPos':'48604749','normalizedAlt':'T','polyphen':'0.834','gt':'pos','codon':'TTG','coding':'c.1571G>T','sift':'0.0','grantham':'61.0','transcript':'NM_005359.5','function':'missense','protein':'p.Trp524Leu','location':'exonic','origAlt':'T','exon':'12','CLNACC1':'RCV000021747','CLNSIG1':'Pathogenic','CLNREVSTAT1':'no_criteria','CLNID1':'rs377767375'}];FWDB=0.0098166;FXX=0.0042105;HRUN=2;HS_ONLY=0;LEN=1;MLLD=57.8823;OALT=T;OID=.;OMAPALT=T;OPOS=48604749;OREF=G;PB=.;PBP=.;QD=1.73921;RBI=0.0271184;REFB=-0.00145864;REVB=-0.0252793;RO=1267;SAF=143;SAR=10;SRF=1078;SRR=189;SSEN=0;SSEP=0;SSSB=0.257732;STB=0.701118;STBP=0.001;TYPE=snp;VARB=0.0118254;AF_gnomAD=0;rs_ids=rs377767375;cosmic_ids=COSM6784692  GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:616:1423:1419:1267:1266:153:153:0.107822000980377:10:143:1078:189:10:143:1076:190
  • Good that you showed us the code you have tried. Please also show us the expected output, with complete details, as it is not clear. Commented Jul 27, 2018 at 7:30
  • I added the desired output. Hope it is clearer now @RavinderSingh13 :) Commented Jul 27, 2018 at 7:39
  • Just as a second approach, I tried to use nested for loops as follows: for i in `cat file | awk -F '\t' '{print$8}' | awk -F ';' '{print$1}' | awk -F '=' '{print$2}'`; do for j in `cat file | grep -v "#" | awk -F '\t' '{print$10}' | awk -F ":" '{print$9}'`; do sed -i "s/$j/$i/g" file; done ; done In this case the command changes the desired values (equivalent to $A variable) but only using the last value from the second for loop (that is the same as $B from the previous example). I guess that here I should keep track of the position but I am not sure how to do it. Commented Jul 27, 2018 at 8:33

1 Answer

You can do this directly with awk, without a for loop or any extra commands:

awk -F'[:| ]' '{a=$(NF-8); sub(/=[^;]*/,"="); sub(/^[^=]*=/,"&"a)}1' file

Brief explanation:

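  • -F'[:| ]': use ':', '|' and space as field separators (inside a bracket expression, '|' is a literal character, not alternation).
  • a=$(NF-8): take the ninth field from the end of the line, which is the AF value in the last column, and store it in 'a'.
  • sub(/=[^;]*/,"="): delete the value between the first '=' and the next ';'.
  • sub(/(^[^=]*=)/,"&"a): append the value of 'a' right after the first '=' (the grouping parentheses are optional and are omitted in the command above).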
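
If the file is tab-separated (the question's -F '\t' suggests so, but that is an assumption), a more explicit variant can be sketched as below. It edits only column 8, takes the 9th ':'-separated value of column 10, and passes lines starting with '#' through unchanged. The file names sample.vcf and sample.fixed.vcf, the header line, and the shortened INFO column are made up for illustration.

```shell
# Hypothetical sample: one header line plus one tab-separated record
# (INFO column shortened; FORMAT and sample columns as in the question).
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\n' > sample.vcf
printf 'chr18\t48604749\trs377767375\tG\tT\t617.0\tPASS\tAF=0.107822;AO=153;DP=1423\tGT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR\t0/1:616:1423:1419:1267:1266:153:153:0.107822000980377:10:143:1078:189:10:143:1076:190\n' >> sample.vcf

awk 'BEGIN { FS = OFS = "\t" }        # split and rejoin on tabs only
/^#/ { print; next }                  # header lines pass through unchanged
{
    split($10, s, ":")                # s[9] is the AF value of the sample column
    sub(/^AF=[^;]*/, "AF=" s[9], $8)  # overwrite the leading AF=... in column 8
    print
}' sample.vcf > sample.fixed.vcf
```

Writing to a new file and renaming it afterwards avoids depending on in-place editing, which not every awk implementation supports.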

1 Comment

The ( and ) in the 2nd sub() aren't doing anything, you can just remove them. +1 for having the patience to try to understand the data in the question!
