1

I am trying to use sed to replace text in a line, but only where it is not inside a specific pattern. For instance, the line could be

bla blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } Bla aBla

and the result must be (replacing bla with KUI, case-insensitively)

KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla

I am not sure sed is the right tool for this. Other classic Unix commands could be used instead.

6
  • I doubt you can do anything useful with sed for a language that can have nested curly brackets (unless you plan to use a bad solution that works half the time). A solution with a Perl command line is possible. Commented Sep 5, 2017 at 19:45
  • @CasimiretHippolyte Do you have any idea of the syntax? Commented Sep 5, 2017 at 19:51
  • You might try with Perl. Like this. Commented Sep 5, 2017 at 19:53
  • @Guuk: the language seems to use a kind of LaTeX syntax that allows tags to be nested. Commented Sep 5, 2017 at 20:09
  • @CasimiretHippolyte You are right, but the purpose of my question is to write code that automatically adds a specific LaTeX command around some words while avoiding it in certain cases. In some situations, I want to avoid adding nested tags. Commented Sep 5, 2017 at 20:31

5 Answers

2

sed is for simple s/old/new/, that is all. You aren't simply doing s/old/new/ so you shouldn't be considering sed. Just use awk:

$ cat tst.awk
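# Consume characters up to and including the "}" that closes the current
# brace group, recursing on nested "{", and return them unchanged.
function descend( internalStr) {
    while( ++i <= length($0) ) {
        char = substr($0,i,1)
        internalStr = internalStr char
        if (char == "{") {
            internalStr = internalStr descend()
        }
        else if (char == "}") {
            return internalStr
        }
    }
    # Unbalanced "{": keep what was read instead of dropping it.
    return internalStr
}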
BEGIN { IGNORECASE=1 }
{
    fullStr = externalStr = ""
    i = 0
    while( ++i <= length($0) ) {
        char = substr($0,i,1)
        externalStr = externalStr char
        if (char == "{") {
            gsub(/\<bla\>/,"KUI",externalStr)
            fullStr = fullStr externalStr descend()
            externalStr = ""
        }
    }
    gsub(/\<bla\>/,"KUI",externalStr)
    print fullStr externalStr
}

.

$ cat file
bla blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } Bla aBla
bla \tag1{ bla \tag2{ bla } bla } bla

$ gawk -f tst.awk file
KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla
KUI \tag1{ bla \tag2{ bla } bla } KUI

The above uses GNU awk for word boundaries and IGNORECASE. The need for those can be worked around pretty easily with other awks.

Note that it works even for nested tags (the 2nd input/output line).
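If GNU awk is not available, the same idea can be sketched in POSIX awk without IGNORECASE or \< \>. This is only a rough illustration, not a drop-in replacement for the script above: the function name replace_outside_braces and the variables word and repl are made up for this example, and a "word" is assumed to be a run of ASCII letters, digits and underscores.

```shell
# Hypothetical helper: replace whole-word, case-insensitive matches of
# "bla" with "KUI", but only outside {...} groups (nesting is tracked).
replace_outside_braces() {
    awk -v word="bla" -v repl="KUI" '
    # Append the pending token to out, swapping it if it equals the target word.
    function emit_tok() {
        out = out ((tolower(tok) == tolower(word)) ? repl : tok)
        tok = ""
    }
    {
        out = tok = ""
        depth = 0                      # brace nesting depth, reset on every line
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (depth == 0 && c ~ /[A-Za-z0-9_]/) {
                tok = tok c            # still inside a word outside braces
                continue
            }
            emit_tok()                 # a word (if any) has just ended
            if (c == "{") depth++
            else if (c == "}" && depth > 0) depth--
            out = out c                # copy non-word or in-brace characters as-is
        }
        emit_tok()
        print out
    }' "$@"
}

printf '%s\n' 'bla \tag1{ bla \tag2{ bla } bla } Bla aBla' | replace_outside_braces
```

Comparing the lowercased token against the lowercased word stands in for IGNORECASE, and collecting whole tokens before comparing stands in for the word-boundary operators.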

1 Comment

I am not very familiar with awk but this is a good opportunity for trying it. Thank you! @edmorton
2

gawk solution for 1-level enclosing brackets {...}:

gawk 'BEGIN{ IGNORECASE=1 }
     {   split($0, a, /\{[^{}]+\}/, seps); 
         for(i=1; i in a; i++) { 
             gsub(/\<bla\>/,"KUI",a[i]); 
             printf "%s%s",a[i],seps[i] 
         } 
         print ""  
     }' file

The output:

KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla

1

First variant - works with nested braces.

gawk -F '' '
    function buf_sub() {
        gsub(/\ybla\y/, "KUI", buffer);
        string = string buffer;
        buffer = "";
    }
    BEGIN {
        IGNORECASE = 1;
    }
    {
        string = "";
        buffer = "";
        cnt = 0;   # reset brace depth so an unbalanced line cannot affect the next one
        for(i = 1; i <= NF; i++) {
            if(cnt)
                string = string $i;
            else
                buffer = buffer $i;

            if($i == "{") {
                cnt++;
                buf_sub();
            }
            if($i == "}")
                cnt--;
        }
        buf_sub();
        print string;
    }
' input.txt

Input

bla blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } Bla aBla
blab bla blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } aBla Bla
bla \tag1{ bla \tag2{ bla } bla } bla 

Output

KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla
blab KUI blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } aBla KUI
KUI \tag1{ bla \tag2{ bla } bla } KUI

Second variant - doesn't handle nested braces.

sed -r 's/(\\[^}]*})/\n@#\1\n@#/g' input.txt |
sed '/\\/! s/\bbla\b/KUI/gI;' |
sed ':lab; N; $!b lab; s/\n@#//g;'

Input

bla blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } Bla aBla
blab bla blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } aBla Bla

Output

KUI blab blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } KUI aBla
blab KUI blab \cite{bla} \cite[prout]{bla} \footcite[prout][hein]{ bla } aBla KUI

1

Solution using perl:

perl -lpe 's/(\\footcite([^}]*)|\\cite([^}]*))(*SKIP)(*FAIL)|\bbla\b/KUI/ig' file

The \footcite and \cite commands are skipped during the replacement.

1 Comment

This solution is based on @CasimiretHippolyte's suggestion.
0

Do:

sed -e 's/\<[bB]la\>/KUI/g' yourFile

where:

\<[bB]la\>

matches a whole word that is exactly 'bla' or 'Bla'. \< marks the beginning of a word, which here must start with b or B. \> marks the end of the word, which here must end with a. Between the 'b' ('B') and the 'a' there must be exactly one 'l'.

Update: I noticed that sed treats '{' and '}' as non-word characters, so it also considers the bla in {bla} a word and turns it into {KUI}. A workaround is the following:

sed -e 's/{/opened/g' yourFile > newFile
sed -e 's/}/closed/g' newFile > yourFile

sed -e 's/\<[bB]la\>/KUI/g' yourFile > newFile

sed -e 's/opened/{/g' newFile > yourFile
sed -e 's/closed/}/g' yourFile > newFile

It's not so elegant, but it works.

Hope it helps

2 Comments

It does not work with sed on OS X or with GNU sed. The command replaces all occurrences.
Obviously, it works only if your file does not contain the strings "opened" and "closed". Anyway, you can choose any placeholder strings you like to make it work.
