I'm trying to replace substrings in a text file (corpus.txt) with other substrings using sed. The list of possible substrings is in a file sub.txt, which contains the following:
dogs chase
birds eat
chase birds
chase cat
chase birds .
and a corpus.txt containing some texts as below:
dogs chase cats around
dogs bark
cats meow
dogs chase birds
cats chase birds , birds eat grains
dogs chase the cats
the birds chirp
with the desired output
<bop> dogs chase <eop> cats around
dogs bark
cats meow
<bop> dogs chase <eop> birds
cats <bop> chase birds <eop> , <bop> birds eat <eop> grains
<bop> dogs chase <eop> the cats
the birds chirp
Using the command sed -f <(sed 's/.*/s|\\b&\\b|<bop> & <eop>|g/' sub.txt) corpus.txt, everything matches the desired output except the fifth line, which comes out as:
cats <bop> <bop> chase birds . <eop>eop> , <bop> birds eat <eop> grains
What can I do to get this to work?
chase birds. Perhaps pass it through uniq to eliminate duplicates.
chase birds and chase birds .
If chase birds . matches, chase birds matches as well. And the first one will match any char, because . is a special char. So both matches take place. If you want a literal match, escape . with \ in your sub.txt file.
chase birds . matches, chase birds matches as well. In the worst case I expect it to match just chase birds.
For a literal . match you have to escape it.
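A minimal sketch of the escaping suggested above, assuming GNU sed (for \b) and that the phrases in sub.txt contain no | or / characters. The temporary directory, the rules.sed file name, and the choice to write the generated rules to a file instead of using process substitution are illustration only, not part of the original command. The first expression backslash-escapes regex metacharacters such as . so that each phrase matches literally; the second builds one s command per phrase and uses & in the generated replacement, so the matched text is reused as-is.

```sh
#!/bin/sh
# Sketch only: assumes GNU sed, which supports \b for word boundaries.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Recreate the two files from the question.
printf '%s\n' 'dogs chase' 'birds eat' 'chase birds' 'chase cat' \
    'chase birds .' > "$workdir/sub.txt"
printf '%s\n' 'dogs chase cats around' 'dogs bark' 'cats meow' \
    'dogs chase birds' 'cats chase birds , birds eat grains' \
    'dogs chase the cats' 'the birds chirp' > "$workdir/corpus.txt"

# Expression 1: put a backslash before ] [ \ . * ^ $ so they match literally.
# Expression 2: turn each phrase into  s|\bPHRASE\b|<bop> & <eop>|g
#               (\& emits a literal & into the generated rule).
sed -e 's/[][\.*^$]/\\&/g' \
    -e 's/.*/s|\\b&\\b|<bop> \& <eop>|g/' \
    "$workdir/sub.txt" > "$workdir/rules.sed"

# Apply the generated rules to the corpus.
result=$(sed -f "$workdir/rules.sed" "$workdir/corpus.txt")
printf '%s\n' "$result"
```

With the . escaped, the rule generated from chase birds . no longer matches "chase birds <" in the already-tagged text, so the fifth line should come out as in the desired output. Note that the trailing \b after a literal . only matches when a word character follows it directly, so that entry may still need separate handling if it is meant to match in other texts.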