Replace text in one line but avoid certain pattern

Question

I am trying to use sed to replace text in a line, but only where it is not inside a specific pattern. For instance, the line could be

bla blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } Bla aBla

and the result must be (replacing the word bla with KUI, case-insensitively)

KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla

I am not sure sed is the right command for this. Other classic Unix commands could be used instead.

I doubt you can do anything interesting with sed for a language that can have nested curly brackets (unless you plan to use a bad solution that works half the time). A solution with a perl command line is possible.
– Casimir et Hippolyte, Commented Sep 5, 2017 at 19:45
@Guuk: the language seems to use a kind of LaTeX syntax that allows tags to be nested.
– Casimir et Hippolyte, Commented Sep 5, 2017 at 20:09
@CasimiretHippolyte You are right, but the purpose of my question is to write code that automatically adds a specific LaTeX command around some words while skipping certain cases. In some situations, I want to avoid adding nested tags.
– Guuk, Commented Sep 5, 2017 at 20:31

Ed Morton · Accepted Answer · 2017-09-05 21:17:34Z

2

sed is for simple s/old/new/, that is all. You aren't simply doing s/old/new/ so you shouldn't be considering sed. Just use awk:

$ cat tst.awk
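# Consume input from just after an opening "{" through its matching "}",
# recursing on nested groups. Advances the global index i.
function descend( internalStr) {
    while( ++i <= length($0) ) {
        char = substr($0,i,1)
        internalStr = internalStr char
        if (char == "{") {
            internalStr = internalStr descend()
        }
        else if (char == "}") {
            return internalStr
        }
    }
    return internalStr    # unbalanced "{": keep the text read so far
}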
BEGIN { IGNORECASE=1 }
{
    fullStr = externalStr = ""
    i = 0
    while( ++i <= length($0) ) {
        char = substr($0,i,1)
        externalStr = externalStr char
        if (char == "{") {
            gsub(/\<bla\>/,"KUI",externalStr)
            fullStr = fullStr externalStr descend()
            externalStr = ""
        }
    }
    gsub(/\<bla\>/,"KUI",externalStr)
    print fullStr externalStr
}

$ cat file
bla blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } Bla aBla
bla \tag1{ bla \tag2{ bla } bla } bla

$ gawk -f tst.awk file
KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla
KUI \tag1{ bla \tag2{ bla } bla } KUI

The above uses GNU awk for word boundaries and IGNORECASE. The need for those can be worked around pretty easily with other awks.

Note that it works even for nested tags (the 2nd input/output line).
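
One way the workaround for other awks might look, as a sketch only: the names replace_outside_braces, is_word and replace_word are made up for this illustration and do not come from the answer above. It uses only POSIX awk features (tolower, index, substr), hardcodes the word bla and the replacement KUI, and assumes a word character is a letter, digit or underscore.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer above): replace the whole word
# "bla", in any case, with "KUI" outside {...} groups, without gawk extensions.
replace_outside_braces() {
    awk '
    # 1 if c is a single word character (letter, digit or underscore).
    function is_word(c) { return c ~ /^[A-Za-z0-9_]$/ }

    # Replace every whole-word, case-insensitive "bla" in s with "KUI".
    function replace_word(s,    low, res, pos, before, after, prev) {
        res = ""; prev = ""
        low = tolower(s)
        while ((pos = index(low, "bla")) > 0) {
            # Character before the match; prev covers text already consumed.
            before = (pos > 1) ? substr(s, pos - 1, 1) : prev
            after  = substr(s, pos + 3, 1)
            if (is_word(before) || is_word(after)) {
                res = res substr(s, 1, pos + 2)          # inside a longer word: keep
            } else {
                res = res substr(s, 1, pos - 1) "KUI"    # whole word: replace
            }
            prev = substr(s, pos + 2, 1)
            s    = substr(s, pos + 3)
            low  = substr(low, pos + 3)
        }
        return res s
    }

    {
        line_out = ""; buf = ""; depth = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (depth == 0 && c == "{") {
                # Leaving outside text: substitute in it, then enter the group.
                line_out = line_out replace_word(buf) c
                buf = ""; depth = 1
            } else if (depth > 0) {
                line_out = line_out c                    # inside braces: verbatim
                if (c == "{") depth++
                else if (c == "}") depth--
            } else {
                buf = buf c                              # outside braces: collect
            }
        }
        print line_out replace_word(buf)
    }' "$@"
}

printf '%s\n' 'bla \tag1{ bla \tag2{ bla } bla } Bla aBla' | replace_outside_braces
```

With the sample line above this should print KUI \tag1{ bla \tag2{ bla } bla } KUI aBla. Tracking a depth counter instead of recursing is a design choice of this sketch, not a requirement.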

Guuk Over a year ago

I am not very familiar with awk but this is a good opportunity for trying it. Thank you! @edmorton

RomanPerekhrest · Accepted Answer · 2017-09-05 21:41:25Z

2

gawk solution for 1-level enclosing brackets {...}:

awk 'BEGIN{ IGNORECASE=1 }
     {   split($0, a, /\{[^{}]+\}/, seps); 
         for(i=1; i in a; i++) { 
             gsub(/\<bla\>/,"KUI",a[i]); 
             printf "%s%s",a[i],seps[i] 
         } 
         print ""  
     }' file

The output:

KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla

edited Sep 5, 2017 at 21:41

answered Sep 5, 2017 at 20:52

Community · Accepted Answer · 2020-06-20 09:12:55Z

First variant - works with nested braces.

awk -F '' '
    function buf_sub() {
        gsub(/\ybla\y/, "KUI", buffer);
        string = string buffer;
        buffer = "";
    }
    BEGIN {
        IGNORECASE = 1;
    }
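    {
        string = "";
        buffer = "";
        cnt = 0;    # brace depth; reset so an unbalanced line cannot affect the next
        for(i = 1; i <= NF; i++) {
            if(cnt)
                string = string $i;     # inside braces: copy verbatim
            else
                buffer = buffer $i;     # outside braces: collect for substitution

            if($i == "{") {
                cnt++;
                buf_sub();
            }
            if($i == "}" && cnt > 0)
                cnt--;
        }
        buf_sub();
        print string;
    }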
' input.txt

Input

bla blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } Bla aBla
blab bla blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } aBla Bla
bla \tag1{ bla \tag2{ bla } bla } bla

Output

KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla
blab KUI blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } aBla KUI
KUI \tag1{ bla \tag2{ bla } bla } KUI

Second variant - does not handle nested braces.

sed -r 's/(\\[^}]*})/\n@#\1\n@#/g' input.txt |
sed '/\\/! s/\bbla\b/KUI/gI;' |
sed ':lab; N; $!b lab; s/\n@#//g;'

Input

bla blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } Bla aBla
blab bla blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } aBla Bla

Output

KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla
blab KUI blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } aBla KUI

Guuk · Accepted Answer · 2017-09-10 08:41:26Z

1

Solution using perl:

perl -lpe 's/(\\footcite([^}]*)|\\cite([^}]*))(*SKIP)(*FAIL)|\bbla\b/KUI/ig' file

The \footcite and \cite commands are skipped during the replacement.

Guuk Over a year ago

This solution is based on the @CasimiretHippolyte proposition.

Neb · Accepted Answer · 2017-09-05 19:55:11Z

0

Do:

sed -e 's/\<[bB]la\>/KUI/g' yourFile

where:

\<[bB]la\>

matches the whole word bla. \< marks the beginning of a word, which here must start with b or B. \> marks the end of the word, which here must end with a. Between the b (or B) and the a there must be exactly one l.

Update: I noticed that sed treats '{' and '}' as non-word characters, so it also considers the bla in {bla} a whole word and turns {bla} into {KUI}. A workaround is the following:

sed -e 's/{/opened/g' yourFile > newFile
sed -e 's/}/closed/g' newFile > yourFile

sed -e 's/\<[bB]la\>/KUI/g' yourFile > newFile

sed -e 's/opened/{/g' newFile > yourFile
sed -e 's/closed/}/g' yourFile > newFile

It's not so elegant, but it works.

Hope it helps

edited Sep 5, 2017 at 19:55

answered Sep 5, 2017 at 19:33

Guuk Over a year ago

It does not work with sed on OS X or with GNU sed. The command replaces all occurrences.

Neb Over a year ago

Obviously, it works only if your file does not contain the strings "opened" and "closed". Anyway, you can choose any placeholder strings you like to make it work.
