how to replace a string at a specific position in a csv file using bash

Question

I have several .csv files and each csv file has lines which look like this.

AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA

I am reading through each line of each csv file and trying to replace the 4th field of each line beginning with AA with "ZZ".

Expected output

AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA

The variable "y" does contain the 4th field ("1" and "7" respectively), but when I use the sed command it replaces the first occurrence of "1" with "ZZ".

How do I modify my code to replace only the 4th position of each line irrespective of what value it holds?

My code looks like this

file="name of file which contains list of all csv files"

for i in $(cat "$file")
do
while IFS= read -r line
do
if [[ $line == AA* ]] ; then
        y=$(echo "$line" | cut -d',' -f 4)
        sed -i "s/${y}/ZZ/" "$i"
fi
done < "$i"
done

Running sed on a line at a time is definitely a horrible antipattern. You want to process the entire file with a single process, and especially, avoid rewriting the file multiple times.
– tripleee, Commented Nov 3, 2018 at 13:08

David C. Rankin · Accepted Answer · 2018-11-03 05:22:28Z

2

Using sed, you can also direct that only the 4th field of a comma separated values file be changed to "ZZ" for lines beginning "AA" with:

sed -i '/^AA/s/[^,][^,]*/ZZ/4' file

Explanation

sed -i calls sed to edit the file in place;
the general form is /find/s/match/replace/occurrence, where
- find is /^AA/, a line beginning with "AA";
- match is [^,][^,]*, a character that is not a comma followed by any number of non-commas;
- replace/occurrence is ZZ/4, which replaces the 4th occurrence of match with "ZZ".
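
As a small illustration of the numeric occurrence flag described above, the sketch below runs the substitution on one sample line from the question, once without a number and once with 4. The variable names (line, first, fourth) are made up for the demonstration.

```shell
# Sample line taken from the question; variable names are illustrative only.
line='AA,1,CC,1,EE'

# Without a number, s/// replaces the first match, i.e. the first field.
first=$(printf '%s\n' "$line" | sed 's/[^,][^,]*/ZZ/')

# With the number 4, only the 4th match (the 4th non-empty field) is replaced.
fourth=$(printf '%s\n' "$line" | sed 's/[^,][^,]*/ZZ/4')

printf '%s\n%s\n' "$first" "$fourth"
```

One caveat worth keeping in mind: the pattern needs at least one non-comma character, so an empty field (two adjacent commas) is not counted as a match and the "4th occurrence" then lands one field further right.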

Note: both awk and sed provide good solutions in this case, so see the answers by @perreal and @RavinderSingh13.

Example Input File

$ cat file
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA

Example Use/Output

(note: -i not used below so the changes are simply output to stdout)

$ sed '/^AA/s/[^,][^,]*/ZZ/4' file
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA

edited Nov 3, 2018 at 5:22

answered Nov 3, 2018 at 5:10

David C. Rankin

5 Comments

David C. Rankin Over a year ago

It's a rather rare bird. I literally stumbled across it a couple of years back and just added it to my toolbox. Its application is tailored for .csv files, so it doesn't get a great deal of press (awk taking the lion's share there).

ShivaniSarin Over a year ago

Hi. Instead of replacing content at the 4th location in a csv, I want to add it at the 4th position and move the older 4th position by one position to the right. Can you help me with this?

David C. Rankin Over a year ago

Yes, you can do it with a simple backreference, e.g. sed '/^AA/s/\([^,][^,]*\)/ZZ,\1/4' file, which turns the first line into AA,1,CC,ZZ,1,EE, etc. Note: you are capturing the text between \(...\) and then reinserting that text with \1 (the first backreference) after you insert ZZ,.

ShivaniSarin Over a year ago

Great, this works. But do you have a more detailed explanation or a link for it? I couldn't find much online.

David C. Rankin Over a year ago

It is the same explanation as in the answer, plus \(...\), which captures the text inside, creating a back-reference to that text that lets you re-insert it in the "replace" part of the substitution using \1 (if you had a second \(...\) then you would use \2, and so on). See 5.7 Back-references and Subexpressions. So you are just saving the 4th field in \([^,][^,]*\) and then replacing it with ZZ,\1, which inserts ZZ, as the 4th field and moves the original to the right.
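
The insert-instead-of-replace variant discussed in these comments can be sketched on a single sample line as follows. In sed's basic regular expressions the capture group is written \( ... \); the variable names line and inserted are illustrative only.

```shell
# Sample line from the question; variable names are illustrative only.
line='AA,1,CC,1,EE'

# \( ... \) captures the 4th field; \1 puts it back after the new "ZZ," text,
# so the old 4th field shifts one position to the right.
inserted=$(printf '%s\n' "$line" | sed '/^AA/s/\([^,][^,]*\)/ZZ,\1/4')

printf '%s\n' "$inserted"
```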

Ed Morton · Accepted Answer · 2018-11-03 13:18:25Z

Doing this robustly is just:

$ awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA

Note that the above is doing a literal string comparison and a literal string replacement so unlike the other solutions posted so far it won't fail if the target string (AA in this example) contains regexp metachars like . or *, nor if it can be part of another string like AAX, nor if the replacement string (ZZ in this example) contains backreferences like & or \1.
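
To make the AAX point concrete, here is a small sketch comparing a regexp match on the start of the line with a literal comparison on the first field. The two-line input (including the AAX line) is a hypothetical example, not data from the question.

```shell
# Hypothetical input: the second line starts with "AAX", not the field "AA".
input='AA,1,CC,1,EE
AAX,1,CC,1,EE'

# Regexp match on the line start also rewrites the AAX line.
regex_out=$(printf '%s\n' "$input" | awk 'BEGIN{FS=OFS=","} /^AA/{$4="ZZ"} 1')

# Literal comparison on field 1 leaves the AAX line untouched.
literal_out=$(printf '%s\n' "$input" | awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1')

printf 'regexp:\n%s\nliteral:\n%s\n' "$regex_out" "$literal_out"
```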

If you want to map multiple strings in one pass:

$ awk 'BEGIN{FS=OFS=","; m["AA"]="ZZ"; m["BB"]="FOO"} $1 in m{$4=m[$1]} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,FOO,99,AA

Just like GNU sed has -i for in-place editing, GNU awk has -i inplace, so you can discard the shell loop and just do:

awk -i inplace '
BEGIN { FS=OFS="," }
(NR==FNR) { ARGV[ARGC++]=$0 }
(NR!=FNR) && ($1=="AA") { $4="ZZ" }
{ print }
' file

It will operate on all of the files named in file in one call to awk. Here "file" is your file containing a list of other CSV file names.
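
Where GNU awk's -i inplace is not available, a similar result can be sketched with one awk run and one temporary file per CSV. This is a swapped-in fallback, not the method from the answer. It assumes the list file holds one file name per line and that mktemp is installed; the scratch directory and the names a.csv, b.csv and list.txt are hypothetical.

```shell
# Set up a scratch directory with hypothetical file names for the demo.
dir=$(mktemp -d)
printf 'AA,1,CC,1,EE\nBB,6,7,8,99,AA\n' > "$dir/a.csv"
printf 'AA,FF,6,7,8,9\n' > "$dir/b.csv"
printf '%s\n' "$dir/a.csv" "$dir/b.csv" > "$dir/list.txt"

# One awk process per CSV file (not per line); each file is rewritten once.
while IFS= read -r csv; do
    awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' "$csv" > "$csv.tmp" &&
        mv "$csv.tmp" "$csv"
done < "$dir/list.txt"

cat "$dir/a.csv" "$dir/b.csv"
```

The mv only runs if awk succeeds, so a failed run leaves the original file in place.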

RavinderSingh13 · Accepted Answer · 2018-11-03 04:20:47Z

1

EDIT1: Since the OP has changed the requirement a bit, adding the following now.

awk 'BEGIN{FS=OFS=","} /^AA/||/^BB/{$4="ZZ"} /^CC/||/^DD/{$5="NEW_VALUE"} 1'  Input_file > temp_file && mv temp_file Input_file

Could you please try the following.

awk -F, '/^AA/{$4="ZZ"} 1' OFS=,  Input_file > temp_file && mv temp_file Input_file

OR

awk 'BEGIN{FS=OFS=","} /^AA/{$4="ZZ"} 1'  Input_file > temp_file && mv temp_file Input_file

Explanation: Adding an explanation of the above code too.

awk '
BEGIN{              ##Starting BEGIN section of awk which will be executed before reading Input_file.
  FS=OFS=","        ##Setting field separator and output field separator as comma here for all lines of Input_file.
}                   ##Closing block for BEGIN section of this program.
/^AA/{              ##Checking condition if a line starts from string AA then do following.
  $4="ZZ"           ##Setting 4th field as ZZ string as per OP.
}                   ##Closing this condition block here.
1                   ##By mentioning 1 we are asking awk to print edited or non-edited line of Input_file.
'  Input_file       ##Mentioning Input_file name here.

edited Nov 3, 2018 at 4:20

answered Nov 3, 2018 at 3:26

RavinderSingh13

8 Comments

ShivaniSarin Over a year ago

Should this replace both of the lines after "do" in the above code?

ShivaniSarin Over a year ago

I just updated my question :) I want to replace the 4th field only if the line begins with AA.

ShivaniSarin Over a year ago

I will be reading through each line in the file in order to check whether it begins with AA or not. Could you please help me with the code? Thanks

ShivaniSarin Over a year ago

Sure, once you add the explanation I will try it and definitely give an upvote.:)

RavinderSingh13 Over a year ago

@ShivaniSarin, on SO there is a way to tell thanks :) And let us delete all comments under this answer too to keep answer clean.

MBT · Accepted Answer · 2018-11-03 15:14:33Z

1

Using sed:

sed -i 's/\(^AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/' input_file
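
As a rough sketch of how this expression behaves, the example below applies it (without -i) to two sample lines. The group \(^AA,[^,]*,[^,]*,\) captures the first three fields and their commas, and \1 puts them back before ZZ. Because [^,]* can match an empty string, empty fields are counted, and because the pattern starts with ^AA, a first field such as AAX is not matched. The second input line and the variable names are hypothetical.

```shell
# First line is from the question; the second (with an empty 2nd field) is hypothetical.
input='AA,1,CC,1,EE
AA,,CC,1,EE'

# \1 restores the captured first three fields; the following [^,]* is the
# 4th field, which is replaced by ZZ even when earlier fields are empty.
out=$(printf '%s\n' "$input" | sed 's/\(^AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/')

printf '%s\n' "$out"
```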

edited Nov 3, 2018 at 15:14

MBT

answered Nov 3, 2018 at 4:18

perreal
