Changing values with sed using two variables with multiple values separated with space

Question

First, I have to say that I tried several solutions posted here, but none of them solved my problem. Sorry if this is similar to another question, but I cannot make it work. I have a file like this one:

chr17   7579366 COSM45509;COSM11448;COSM45040   G   A,C,T   13.2    PASS    AF=0.0216216,0,0;AO=4,0,0;DP=185;FAO=4,0,0;FDP=185;FDVR=5,5,5;FR=.,.,.,REALIGNEDx0.03243;FRO=181;FSAF=3,0,0;FSAR=1,0,0;FSRF=136;FSRR=45;FUNC=[{'origPos':'7579366','origRef':'G','normalizedRef':'G','gene':'TP53','normalizedPos':'7579366','normalizedAlt':'A','gt':'pos','codon':'TAT','coding':'c.321C>T','transcript':'NM_000546.5','function':'synonymous','protein':'p.(=)','location':'exonic','origAlt':'A','exon':'4','CLNACC1':'RCV000220860','CLNSIG1':'Likely_benign','CLNREVSTAT1':'single','CLNID1':'rs770776262'}];FWDB=0.0168253,-0.0508378,0.0146373;FXX=0;HRUN=1,1,1;HS;HS_ONLY=0;LEN=1,1,1;MLLD=80.6954,128.354,137.413;OALT=A,C,T;OID=COSM45509,COSM11448,COSM45040;OMAPALT=A,C,T;OPOS=7579366,7579366,7579366;OREF=G,G,G;PB=.,.,.;PBP=.,.,.;QD=0.285514;RBI=0.0283383,0.0582477,0.0324614;REFB=0.000208231,-0.000390718,-0.000240087;REVB=0.0228028,-0.0284309,-0.028974;RO=181;SAF=3,0,0;SAR=1,0,0;SRF=136;SRR=45;SSEN=0,0,0;SSEP=0,0,0;SSSB=-0.00109306,0,0;STB=0.501796,0.5,0.5;STBP=0.98,1,1;TYPE=snp,snp,snp;VARB=-0.00120684,0,0;AF_gnomAD=0;cosmic_ids=COSM2745025,COSM5055813,COSM2745026,COSM4272039,COSM5055814,COSM5055812,COSM11448,COSM45509,COSM2745027,COSM45040,COSM4487692,COSM4487691,COSM213589,COSM5055811,COSM213590 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:13:185:185:181:181:4:4:0.0216215997934341:1:3:136:45:1:3:136:45
chr18   48575168    COSM14216;COSM25274 T   TA  41.6    PASS    AF=0.0178891;AO=92;DP=3963;FAO=70;FDP=3913;FDVR=0;FR=.;FRO=3843;FSAF=43;FSAR=27;FSRF=2160;FSRR=1683;FUNC=[{'origPos':'48575168','origRef':'T','normalizedRef':'T','gene':'SMAD4','normalizedPos':'48575168','normalizedAlt':'TA','gt':'pos','codon':'ATG','coding':'c.366_367insA','transcript':'NM_005359.5','function':'frameshiftInsertion','protein':'p.Cys123fs','location':'exonic','origAlt':'TA','exon':'3'}];FWDB=0.0820535;FXX=0.0098684;HRUN=4;HS;HS_ONLY=0;LEN=1;MLLD=27.4213;OALT=A,A;OID=COSM14216,COSM25274;OMAPALT=TA,TA;OPOS=48575170,48575173;OREF=-,-;PB=.;PBP=.;QD=0.0425701;RBI=0.0857718;REFB=-0.00250641;REVB=0.0249804;RO=3838;SAF=50;SAR=42;SRF=2167;SRR=1671;SSEN=0;SSEP=0;SSSB=-0.0142007;STB=0.552796;STBP=0.415;TYPE=ins;VARB=0.109082;AF_gnomAD=0 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:41:3963:3913:3838:3843:92:70:0.0178890991955996:42:50:2167:1671:27:43:2160:1683

chr18   48604749    rs377767375;COSM6784692 G   T   617.0   PASS    AF=0.107822;AO=153;DP=1423;FAO=153;FDP=1419;FDVR=5;FR=.,REALIGNEDx0.1326;FRO=1266;FSAF=143;FSAR=10;FSRF=1076;FSRR=190;FUNC=[{'origPos':'48604749','origRef':'G','normalizedRef':'G','gene':'SMAD4','normalizedPos':'48604749','normalizedAlt':'T','polyphen':'0.834','gt':'pos','codon':'TTG','coding':'c.1571G>T','sift':'0.0','grantham':'61.0','transcript':'NM_005359.5','function':'missense','protein':'p.Trp524Leu','location':'exonic','origAlt':'T','exon':'12','CLNACC1':'RCV000021747','CLNSIG1':'Pathogenic','CLNREVSTAT1':'no_criteria','CLNID1':'rs377767375'}];FWDB=0.0098166;FXX=0.0042105;HRUN=2;HS_ONLY=0;LEN=1;MLLD=57.8823;OALT=T;OID=.;OMAPALT=T;OPOS=48604749;OREF=G;PB=.;PBP=.;QD=1.73921;RBI=0.0271184;REFB=-0.00145864;REVB=-0.0252793;RO=1267;SAF=143;SAR=10;SRF=1078;SRR=189;SSEN=0;SSEP=0;SSSB=0.257732;STB=0.701118;STBP=0.001;TYPE=snp;VARB=0.0118254;AF_gnomAD=0;rs_ids=rs377767375;cosmic_ids=COSM6784692   GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:616:1423:1419:1267:1266:153:153:0.107822000980377:10:143:1078:189:10:143:1076:190

I get the two variables that I want to change using:

A=$(awk -F '\t' '{print $8}' file | awk -F ';' '{print $1}' | awk -F '=' '{print $2}')
B=$(grep -v "#" file | awk -F '\t' '{print $10}' | awk -F ':' '{print $9}')
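Both lists can also be produced in one pass, so that each old AF value is printed next to the new value from the same record. This also keeps the two lists aligned: the A pipeline does not skip `#` header lines while the B pipeline does. A minimal sketch, assuming tab-separated columns with INFO in column 8 (starting with `AF=`) and the sample in column 10 (AF as its 9th value); `af_pairs` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# af_pairs is a hypothetical helper. Assumes tab-separated columns: INFO in
# column 8 (first entry AF=...) and the sample in column 10 (AF is its 9th value).
af_pairs() {
    awk -F '\t' '!/^#/ && NF >= 10 {   # skip header and blank lines
        split($8, info, ";")           # info[1] is "AF=..."
        sub(/^AF=/, "", info[1])       # keep only the old value
        split($10, smp, ":")           # smp[9] is the new value
        print info[1] "\t" smp[9]
    }' "$@"
}
```

Running `af_pairs file` would print one `old<TAB>new` line per data record.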

My objective is to replace each value of the variable $A with the corresponding value of the variable $B. I tried sed -i directly, but it throws an error related to the whitespace separating the values inside the variable: sed: -e expression: unterminated `s' command. I tried a for loop, but I could not get rid of the error. My last try was something like:
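The error comes from what the command substitution stores: the values are separated by newlines, which only look like spaces when the variable is echoed unquoted. sed treats a newline in its script as the end of a command, so `s/$A/.../` is cut off after the first value. A small demonstration, using hypothetical stand-in values for $A:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for $A: command substitution keeps the inner newlines
A=$(printf '%s\n' '0.0216216,0,0' '0.0178891' '0.107822')

echo $A            # unquoted, word splitting prints: 0.0216216,0,0 0.0178891 0.107822
printf '%s\n' "$A" # quoted, the values are really on three separate lines

# A newline inside the sed script ends the s command early,
# so sed rejects the script before reading any input
if ! sed "s/$A/x/" < /dev/null 2> /dev/null; then
    echo "sed rejected the multi-line pattern"
fi
```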

length=3
for ((i=0; i<$length; i++)); do sed -i "s/${key[$i]}/${value[$i]}/g" file; done
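The loop assumes `key` and `value` are arrays holding one value per element. If they are filled from $A and $B with a plain assignment, element 0 receives the whole newline-separated block. A sketch of building the arrays with bash's `mapfile` (bash 4 or later) and bounding the loop by the array size; the values are hypothetical stand-ins, and the loop only prints the pairs instead of calling sed:

```shell
#!/usr/bin/env bash
# Hypothetical sample values standing in for $A and $B from the question
A=$(printf '%s\n' '0.0216216,0,0' '0.0178891' '0.107822')
B=$(printf '%s\n' '0.0216215997934341' '0.0178890991955996' '0.107822000980377')

# mapfile stores one line per array element, so each value keeps its own index
mapfile -t key <<< "$A"
mapfile -t value <<< "$B"

# Bound the loop by the array size: i < count, not i <= count
for ((i = 0; i < ${#key[@]}; i++)); do
    printf '%s -> %s\n' "${key[$i]}" "${value[$i]}"
done
```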

Any ideas? Thanks!

EDIT: The output should replace the value(s) after AF= with the corresponding value from $B, i.e. the 9th value in the last column of colon-separated numbers. In this example:

chr17   7579366 COSM45509;COSM11448;COSM45040   G   A,C,T   13.2    PASS    AF=0.0216215997934341;AO=4,0,0;DP=185;FAO=4,0,0;FDP=185;FDVR=5,5,5;FR=.,.,.,REALIGNEDx0.03243;FRO=181;FSAF=3,0,0;FSAR=1,0,0;FSRF=136;FSRR=45;FUNC=[{'origPos':'7579366','origRef':'G','normalizedRef':'G','gene':'TP53','normalizedPos':'7579366','normalizedAlt':'A','gt':'pos','codon':'TAT','coding':'c.321C>T','transcript':'NM_000546.5','function':'synonymous','protein':'p.(=)','location':'exonic','origAlt':'A','exon':'4','CLNACC1':'RCV000220860','CLNSIG1':'Likely_benign','CLNREVSTAT1':'single','CLNID1':'rs770776262'}];FWDB=0.0168253,-0.0508378,0.0146373;FXX=0;HRUN=1,1,1;HS;HS_ONLY=0;LEN=1,1,1;MLLD=80.6954,128.354,137.413;OALT=A,C,T;OID=COSM45509,COSM11448,COSM45040;OMAPALT=A,C,T;OPOS=7579366,7579366,7579366;OREF=G,G,G;PB=.,.,.;PBP=.,.,.;QD=0.285514;RBI=0.0283383,0.0582477,0.0324614;REFB=0.000208231,-0.000390718,-0.000240087;REVB=0.0228028,-0.0284309,-0.028974;RO=181;SAF=3,0,0;SAR=1,0,0;SRF=136;SRR=45;SSEN=0,0,0;SSEP=0,0,0;SSSB=-0.00109306,0,0;STB=0.501796,0.5,0.5;STBP=0.98,1,1;TYPE=snp,snp,snp;VARB=-0.00120684,0,0;AF_gnomAD=0;cosmic_ids=COSM2745025,COSM5055813,COSM2745026,COSM4272039,COSM5055814,COSM5055812,COSM11448,COSM45509,COSM2745027,COSM45040,COSM4487692,COSM4487691,COSM213589,COSM5055811,COSM213590    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:13:185:185:181:181:4:4:0.0216215997934341:1:3:136:45:1:3:136:45
chr18   48575168    COSM14216;COSM25274 T   TA  41.6    PASS    AF=0.0178890991955996;AO=92;DP=3963;FAO=70;FDP=3913;FDVR=0;FR=.;FRO=3843;FSAF=43;FSAR=27;FSRF=2160;FSRR=1683;FUNC=[{'origPos':'48575168','origRef':'T','normalizedRef':'T','gene':'SMAD4','normalizedPos':'48575168','normalizedAlt':'TA','gt':'pos','codon':'ATG','coding':'c.366_367insA','transcript':'NM_005359.5','function':'frameshiftInsertion','protein':'p.Cys123fs','location':'exonic','origAlt':'TA','exon':'3'}];FWDB=0.0820535;FXX=0.0098684;HRUN=4;HS;HS_ONLY=0;LEN=1;MLLD=27.4213;OALT=A,A;OID=COSM14216,COSM25274;OMAPALT=TA,TA;OPOS=48575170,48575173;OREF=-,-;PB=.;PBP=.;QD=0.0425701;RBI=0.0857718;REFB=-0.00250641;REVB=0.0249804;RO=3838;SAF=50;SAR=42;SRF=2167;SRR=1671;SSEN=0;SSEP=0;SSSB=-0.0142007;STB=0.552796;STBP=0.415;TYPE=ins;VARB=0.109082;AF_gnomAD=0    GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:41:3963:3913:3838:3843:92:70:0.0178890991955996:42:50:2167:1671:27:43:2160:1683

chr18   48604749    rs377767375;COSM6784692 G   T   617.0   PASS    AF=0.107822000980377;AO=153;DP=1423;FAO=153;FDP=1419;FDVR=5;FR=.,REALIGNEDx0.1326;FRO=1266;FSAF=143;FSAR=10;FSRF=1076;FSRR=190;FUNC=[{'origPos':'48604749','origRef':'G','normalizedRef':'G','gene':'SMAD4','normalizedPos':'48604749','normalizedAlt':'T','polyphen':'0.834','gt':'pos','codon':'TTG','coding':'c.1571G>T','sift':'0.0','grantham':'61.0','transcript':'NM_005359.5','function':'missense','protein':'p.Trp524Leu','location':'exonic','origAlt':'T','exon':'12','CLNACC1':'RCV000021747','CLNSIG1':'Pathogenic','CLNREVSTAT1':'no_criteria','CLNID1':'rs377767375'}];FWDB=0.0098166;FXX=0.0042105;HRUN=2;HS_ONLY=0;LEN=1;MLLD=57.8823;OALT=T;OID=.;OMAPALT=T;OPOS=48604749;OREF=G;PB=.;PBP=.;QD=1.73921;RBI=0.0271184;REFB=-0.00145864;REVB=-0.0252793;RO=1267;SAF=143;SAR=10;SRF=1078;SRR=189;SSEN=0;SSEP=0;SSSB=0.257732;STB=0.701118;STBP=0.001;TYPE=snp;VARB=0.0118254;AF_gnomAD=0;rs_ids=rs377767375;cosmic_ids=COSM6784692  GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR   0/1:616:1423:1419:1267:1266:153:153:0.107822000980377:10:143:1078:189:10:143:1076:190
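The transformation described in the EDIT can also be written as a single awk pass that looks up AF's position in the FORMAT column instead of hard-coding the 9th slot. This is a sketch of an alternative, not the accepted answer's method, and it assumes a tab-separated VCF with one sample (INFO in column 8, FORMAT in column 9, sample in column 10); `fix_af` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch only: assumes tab-separated columns 8 = INFO, 9 = FORMAT, 10 = sample.
fix_af() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 10 { print; next }      # headers and blank lines pass through
         {
             # Find the AF slot in FORMAT and read the matching sample value
             nf = split($9, fmt, ":")
             split($10, smp, ":")
             af = ""
             for (j = 1; j <= nf; j++) if (fmt[j] == "AF") af = smp[j]
             if (af == "") { print; next }    # no AF in FORMAT: leave the line alone

             # Rewrite only the INFO entry whose key is exactly AF (not AF_gnomAD)
             n = split($8, info, ";")
             for (i = 1; i <= n; i++) if (info[i] ~ /^AF=/) info[i] = "AF=" af
             out = info[1]
             for (i = 2; i <= n; i++) out = out ";" info[i]
             $8 = out                          # awk rebuilds the line with OFS (tab)
             print
         }' "$@"
}
```

POSIX awk has no in-place option like `sed -i`, so the usual pattern would be `fix_af file > file.new && mv file.new file`.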

Good that you have shown us the code you tried; please also show us the expected output here, with complete details, as it is not clear. — RavinderSingh13, Jul 27, 2018 at 7:30
I added the desired output. Hope it is clearer now @RavinderSingh13 :) — Tato14, Jul 27, 2018 at 7:39
Just as a second approach, I tried nested for loops as follows:

for i in `cat file | awk -F '\t' '{print$8}' | awk -F ';' '{print$1}' | awk -F '=' '{print$2}'`; do for j in `cat file | grep -v "#" | awk -F '\t' '{print$10}' | awk -F ":" '{print$9}'`; do sed -i "s/$j/$i/g" file; done; done

In this case the command changes the desired values (those in $A), but only ever uses the last value from the inner for loop (the same as $B in the previous example). I guess I should keep track of the position, but I am not sure how to do it. — Tato14, Jul 27, 2018 at 8:33
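Nested loops visit every combination of an A value and a B value, not just the values at matching positions. Keeping track of the position can be done by pairing the two lists line by line, for example with `paste`. A sketch in bash (process substitution), using hypothetical stand-in values:

```shell
#!/usr/bin/env bash
A=$(printf '%s\n' '0.0216216,0,0' '0.0178891' '0.107822')
B=$(printf '%s\n' '0.0216215997934341' '0.0178890991955996' '0.107822000980377')

# Nested loops visit every (i, j) combination: 3 x 3 = 9 here
combos=0
for i in $A; do for j in $B; do combos=$((combos + 1)); done; done
echo "nested loops: $combos iterations"

# paste joins line N of $A with line N of $B, so each old value
# meets exactly one new value: the one at the same position
paste <(printf '%s\n' "$A") <(printf '%s\n' "$B") |
while IFS=$'\t' read -r old new; do
    printf '%s -> %s\n' "$old" "$new"
done
```

Even with correct pairing, running `sed -i "s/$old/$new/g"` per pair stays fragile: `.` is a regular-expression metacharacter, and a short value such as 0.107822 also matches inside 0.107822000980377 in the sample column, so a global substitution can rewrite text it should not touch. Editing only the AF= entry of each line avoids both problems.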

CWLiu · Accepted Answer · 2018-07-27 13:17:16Z

2

You can do this directly with awk, without a for loop or any extra commands:

awk -F'[:| ]' '{a=$(NF-8); sub(/=[^;]*/,"="); sub(/^[^=]*=/,"&"a)}1' file

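Brief explanation:

-F'[:| ]': split fields on ':', '|' or a space. Inside a bracket expression '|' is a literal character, not alternation, so -F'[: ]' behaves the same on this data.
a=$(NF-8): the sample column is last and holds 17 colon-separated values, so its 9th value (AF) is field NF-8; store it in 'a'.
sub(/=[^;]*/,"="): delete the text between the first '=' on the line and the next ';', i.e. the old AF value (the first '=' is the one in AF=, because INFO starts with AF and the earlier columns contain none).
sub(/^[^=]*=/,"&"a): match everything from the start of the line up to that first '=' and replace it with itself (&) followed by a, which puts the new value right after 'AF='.
The trailing 1 is an always-true pattern, so every line is printed, modified or not.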
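The two sub() steps above can be watched one at a time on a trimmed toy record. In this sketch `show_steps` is a hypothetical helper, and `a` is passed with `-v` for the demo, whereas the answer reads it from `$(NF-8)`:

```shell
#!/usr/bin/env bash
# show_steps is a hypothetical helper that prints the record after each sub()
show_steps() {
    printf '%s\n' 'PASS AF=0.0216216,0,0;AO=4,0,0' |
    awk -v a='0.0216215997934341' '{
        sub(/=[^;]*/, "=");    print   # PASS AF=;AO=4,0,0
        sub(/^[^=]*=/, "&" a); print   # PASS AF=0.0216215997934341;AO=4,0,0
    }'
}
show_steps
```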

answered Jul 27, 2018 at 9:23 by CWLiu (edited Jul 27, 2018 at 13:17)

1 Comment

Ed Morton Over a year ago

The ( and ) in the 2nd sub() aren't doing anything; you can just remove them. +1 for having the patience to try to understand the data in the question!
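One practical caveat with the command: on a blank line (the sample in the question contains one) or a short `#` header line, `NF-8` is negative, and gawk, for example, aborts with a fatal error on a negative field reference. A sketch that keeps the answer's awk program and adds one extra rule (an addition, not part of the answer) to pass such lines through unchanged; `fix_af_answer` is a hypothetical name:

```shell
#!/usr/bin/env bash
# The answer's awk program plus one added rule: lines starting with # or
# with fewer than 9 fields are printed as-is, so $(NF-8) is never negative.
fix_af_answer() {
    awk -F'[:| ]' '
        /^#/ || NF < 9 { print; next }
        { a = $(NF-8); sub(/=[^;]*/, "="); sub(/^[^=]*=/, "&" a) }
        1' "$@"
}

# Shortened, space-separated record: INFO trimmed, sample column kept whole
line='chr17 7579366 COSM45509 G A,C,T 13.2 PASS AF=0.0216216,0,0;AO=4,0,0;AF_gnomAD=0 GT:GQ:DP:FDP:RO:FRO:AO:FAO:AF:SAR:SAF:SRF:SRR:FSAR:FSAF:FSRF:FSRR 0/1:13:185:185:181:181:4:4:0.0216215997934341:1:3:136:45:1:3:136:45'

printf '##fileformat=VCFv4.1\n\n%s\n' "$line" | fix_af_answer
```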
