Using Awk to replace strings in one file with strings from other file

Question

I have a tab-delimited file, File A, like this:

establishment_of_protein_localization_to_endoplasmic_reticulum	GO:0072599
lipid_oxidation	GO:0034440
endocytic_vesicle_lumen	GO:0071682
monocarboxylic_acid_metabolic_process	GO:0032787
protein_transmembrane_transport	GO:0071806
cellular_response_to_topologically_incorrect_protein	GO:0035967
preribosome	GO:0030684
negative_regulation_of_hematopoietic_progenitor_cell_differentiation	GO:1901533

and a second file structured like this:

font-family: Helvetica;
font-size: 10.86px;
font-weight: 700;
text-anchor: middle;
fill: #000000;
stroke: none;">
GO:0072599
</text>

<text x="509.10" y="-243.88"

style="
font-family: Helvetica;
font-size: 10.72px;
font-weight: 700;
text-anchor: middle;
fill: #000000;
stroke: none;">
GO:0034440
</text>

I want to use awk or sed to match the second column of File A against the second file, and replace each matching string in the second file with the corresponding first column of File A. Essentially, it should give this output:

font-family: Helvetica;
font-size: 10.86px;
font-weight: 700;
text-anchor: middle;
fill: #000000;
stroke: none;">
establishment_of_protein_localization_to_endoplasmic_reticulum
</text>

<text x="509.10" y="-243.88"

style="
font-family: Helvetica;
font-size: 10.72px;
font-weight: 700;
text-anchor: middle;
fill: #000000;
stroke: none;">
lipid_oxidation
</text>

That is, each GO:###### sequence is replaced by the name from the row of the first file whose second column matches it. I tried using this command:

#!/bin/bash

    awk 'NR==FNR{a[$2]=$1;next}{$1=a[$1\2];}1' input.csv

However, it replaces more than just the strings in column 2 of File A.

The output is wrong: regulation_of_muscle_system_process GO:0090257 does not relate to GO:0045927. Update your description.
– RomanPerekhrest, Commented Mar 9, 2018 at 6:32
Yeah, could you give us proper input and output so that we can help you?
– Allan, Commented Mar 9, 2018 at 6:36
Hi Allan, I just corrected the input and the output to match. I apologize, it was supposed to be symbolic, but it should now make more sense.
– Rnewbie, Commented Mar 9, 2018 at 6:39
@Rnewbie, elaborate whether those asterisks **est... really appear in your file.
– RomanPerekhrest, Commented Mar 9, 2018 at 6:41
Whoops, that was my attempt to make the change more clear. They do not; I have fixed that.
– Rnewbie, Commented Mar 9, 2018 at 6:47

Inian · Accepted Answer · 2018-03-09 07:35:16Z

3

The solution you are looking for is something like the command below. Note, however, that the output you posted does not match your input file.

awk 'FNR==NR{ hashKey[$2]=$1; next }$1 in hashKey{$1=hashKey[$1]}1' FS='\t' file1 file2

The idea is that we hash the first file, which is tab-separated, storing each first-column name under its second-column value as the key. Then, while reading the second file, for every line whose first column is present in the hash table, we replace that column with the name stored in the hash. All other lines are printed unchanged.
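To make the behaviour concrete, here is a minimal sketch that runs the same awk program on two small sample files. The temporary file names, the shortened second file, and the `replace_ids` wrapper are only assumptions for illustration; the rows are taken from the question. It also assumes each GO id sits alone on its line with no surrounding whitespace, since with `FS='\t'` the whole line becomes `$1`.

```shell
#!/bin/sh
# Hypothetical sample files, built from the data shown in the question.
tmpdir=$(mktemp -d)

# file1: name <TAB> GO id
printf 'establishment_of_protein_localization_to_endoplasmic_reticulum\tGO:0072599\n' >  "$tmpdir/file1"
printf 'lipid_oxidation\tGO:0034440\n' >> "$tmpdir/file1"

# file2: a shortened version of the SVG-like text from the question
cat > "$tmpdir/file2" <<'EOF'
stroke: none;">
GO:0072599
</text>
stroke: none;">
GO:0034440
</text>
EOF

# replace_ids is an illustrative wrapper name, not part of the answer above.
# $1 = lookup file (name<TAB>id), $2 = file whose ids should be replaced.
replace_ids() {
    awk 'FNR==NR { hashKey[$2] = $1; next }   # first file: map GO id to name
         $1 in hashKey { $1 = hashKey[$1] }   # second file: swap known ids
         1                                    # print every line
        ' FS='\t' "$1" "$2"
}

replace_ids "$tmpdir/file1" "$tmpdir/file2"
```

Lines such as `stroke: none;">` and `</text>` are not keys in the hash, so they pass through untouched; only the two GO id lines are rewritten. If the ids in your real file carry leading or trailing spaces, the lookup will miss them and the whitespace would have to be stripped first.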

edited Mar 9, 2018 at 7:35

answered Mar 9, 2018 at 6:34
