2

I have several .csv files and each csv file has lines which look like this.

AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA

I am reading each CSV file line by line and trying to replace the 4th field of every line beginning with AA with "ZZ".

Expected output

AA,1,CC,ZZ,EE
EE,FF,6,ZZ,8,9
BB,6,7,8,99,AA

The variable "y" does contain the 4th field ("1" and "7" respectively), but when I use the sed command it replaces the first occurrence of that value (the first "1") with "ZZ" instead of the 4th field.

How do I modify my code to replace only the 4th position of each line irrespective of what value it holds?

My code looks like this

file="name of file which contains list of all csv files"

for i in $(cat "$file")
do
    while IFS= read -r line
    do
        if [[ $line == AA* ]]; then
            y=$(echo "$line" | cut -d',' -f 4)
            sed -i "s/${y}/ZZ/" "$i"
        fi
    done < "$i"
done

2 Comments

  • Why did AA at the start of line 2 become EE? Commented Nov 3, 2018 at 12:41
  • Running sed on a line at a time is definitely a horrible antipattern. You want to process the entire file with a single process and, especially, avoid rewriting the file multiple times. Commented Nov 3, 2018 at 13:08

4 Answers

2

Using sed, you can also direct that only the 4th field of a comma separated values file be changed to "ZZ" for lines beginning "AA" with:

sed -i '/^AA/s/[^,][^,]*/ZZ/4' file

Explanation

  • sed -i calls sed to edit the file in place;
  • the general form is /find/s/match/replace/occurrence, where
    • find is /^AA/, a line beginning with "AA";
    • match is [^,][^,]*, a character that is not a comma followed by any number of non-commas;
    • replace/occurrence is /ZZ/4, which replaces the 4th occurrence of match with "ZZ".
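
To see how the trailing number selects which match is replaced, here is a minimal sketch using one sample line from the question. The variable names are only for illustration, and nothing is edited in place.

```shell
line='AA,1,CC,1,EE'

# No number: the first non-empty field is replaced.
first=$(printf '%s\n' "$line" | sed 's/[^,][^,]*/ZZ/')

# 2: only the second match is replaced.
second=$(printf '%s\n' "$line" | sed 's/[^,][^,]*/ZZ/2')

# 4: only the fourth match is replaced, which is what the answer relies on.
fourth=$(printf '%s\n' "$line" | sed 's/[^,][^,]*/ZZ/4')

printf '%s\n' "$first" "$second" "$fourth"
```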

Note: both awk and sed provide good solutions in this case, so see also the answers by @perreal and @RavinderSingh13.

Example Input File

$ cat file
AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA

Example Use/Output

(note: -i not used below so the changes are simply output to stdout)

$ sed '/^AA/s/[^,][^,]*/ZZ/4' file
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA
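
One caveat worth checking against your own data: because [^,][^,]* needs at least one character, empty fields are not counted as occurrences. The sketch below uses a hypothetical line with an empty 2nd field (not part of the question's data) and compares the result with an awk field assignment.

```shell
# Hypothetical line with an empty second field.
line='AA,,CC,1,EE'

# sed counts only non-empty matches (AA, CC, 1, EE), so the 4th match is "EE".
sed_out=$(printf '%s\n' "$line" | sed '/^AA/s/[^,][^,]*/ZZ/4')

# awk splits on every comma, so $4 is still the field holding "1".
awk_out=$(printf '%s\n' "$line" | awk 'BEGIN{FS=OFS=","} /^AA/{$4="ZZ"} 1')

printf 'sed: %s\nawk: %s\n' "$sed_out" "$awk_out"
```

If the files can contain empty fields, a field-based approach such as awk is probably the safer choice.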

5 Comments

It's a rather rare bird. I literally stumbled across it a couple of years back and just added it to my toolbox. Its application is tailored to .csv files, so it doesn't get a great deal of press (awk takes the lion's share there).
Hi. Instead of replacing the content at the 4th position in a csv, I want to add a value at the 4th position and move the old 4th field one position to the right. Can you help me with this?
Yes, you can do it with a simple backreference, e.g. sed '/^AA/s/\([^,][^,]*\)/ZZ,\1/4' file, which turns the first line into AA,1,CC,ZZ,1,EE, etc. Note: you are capturing the text between \(...\) and then reinserting that text with \1 (the first backreference) after you insert ZZ,.
Great, this works. But do you have a more detailed explanation or a link for it? I couldn't find much online.
It is the same explanation as in the answer, plus \(...\), which captures the text inside and creates a back-reference to it that lets you re-insert that text in the "replace" part of the substitution using \1 (if you had a second \(...\) then you would use \2, and so on). See 5.7 Back-references and Subexpressions. So you are just saving the 4th field in \([^,][^,]*\) and then replacing it with ZZ,\1, which inserts ZZ, as the 4th field and moves the original to the right.
2

To do this robustly, just use:

$ awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,8,99,AA

Note that the above is doing a literal string comparison and a literal string replacement so unlike the other solutions posted so far it won't fail if the target string (AA in this example) contains regexp metachars like . or *, nor if it can be part of another string like AAX, nor if the replacement string (ZZ in this example) contains backreferences like & or \1.
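
The difference between the regexp match and the literal comparison can be seen with a small sketch. The AAX line is a hypothetical input added for illustration; it is not in the question's data.

```shell
# Hypothetical input: the second line starts with "AAX", not the field "AA".
input='AA,1,CC,1,EE
AAX,1,CC,1,EE'

# Regexp anchor: /^AA/ also matches the AAX line.
regex_out=$(printf '%s\n' "$input" | awk 'BEGIN{FS=OFS=","} /^AA/{$4="ZZ"} 1')

# Literal comparison: only a first field exactly equal to "AA" matches.
literal_out=$(printf '%s\n' "$input" | awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1')

printf 'regexp:\n%s\nliteral:\n%s\n' "$regex_out" "$literal_out"
```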

If you want to map multiple strings in one pass:

$ awk 'BEGIN{FS=OFS=","; m["AA"]="ZZ"; m["BB"]="FOO"} $1 in m{$4=m[$1]} 1' csv
AA,1,CC,ZZ,EE
AA,FF,6,ZZ,8,9
BB,6,7,FOO,99,AA

and just like GNU sed has -i for "inplace" editing, GNU awk has -i inplace, so you can discard the shell loop and just do:

awk -i inplace '
BEGIN { FS=OFS="," }
(NR==FNR) { ARGV[ARGC++]=$0 }
(NR!=FNR) && ($1=="AA") { $4="ZZ" }
{ print }
' file

and it'll operate on all of the files named in file in one call to awk. "file" in that last case is your file containing a list of other CSV file names.
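
If GNU awk is not available, one possible fallback keeps a loop over the list file but still runs awk once per file rather than once per line. This is only a sketch: the scratch directory, the file names a.csv, b.csv and list.txt, and the .tmp suffix are all assumptions made for the example.

```shell
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)

# Hypothetical CSV files and a list file naming them, one path per line.
printf 'AA,1,CC,1,EE\nBB,6,7,8,99,AA\n' > "$workdir/a.csv"
printf 'AA,FF,6,7,8,9\n' > "$workdir/b.csv"
printf '%s\n' "$workdir/a.csv" "$workdir/b.csv" > "$workdir/list.txt"

# One awk call per file: write to a temp file, then replace the original
# only if awk succeeded.
while IFS= read -r csv; do
    awk 'BEGIN{FS=OFS=","} $1=="AA"{$4="ZZ"} 1' "$csv" > "$csv.tmp" &&
        mv "$csv.tmp" "$csv"
done < "$workdir/list.txt"

result=$(cat "$workdir/a.csv" "$workdir/b.csv")
printf '%s\n' "$result"
```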

1

EDIT1: Since the OP has changed the requirement a bit, I am adding the following now.

awk 'BEGIN{FS=OFS=","} /^AA/||/^BB/{$4="ZZ"} /^CC/||/^DD/{$5="NEW_VALUE"} 1'  Input_file > temp_file && mv temp_file Input_file

Could you please try the following.

awk -F, '/^AA/{$4="ZZ"} 1' OFS=,  Input_file > temp_file && mv temp_file Input_file

OR

awk 'BEGIN{FS=OFS=","} /^AA/{$4="ZZ"} 1'  Input_file > temp_file && mv temp_file Input_file

Explanation: Adding an explanation of the above code.

awk '
BEGIN{              ##Starting BEGIN section of awk which will be executed before reading Input_file.
  FS=OFS=","        ##Setting field separator and output field separator as comma here for all lines of Input_file.
}                   ##Closing block for BEGIN section of this program.
/^AA/{              ##Checking condition if a line starts from string AA then do following.
  $4="ZZ"           ##Setting 4th field as ZZ string as per OP.
}                   ##Closing this condition block here.
1                   ##By mentioning 1 we are asking awk to print edited or non-edited line of Input_file.
'  Input_file       ##Mentioning Input_file name here.

8 Comments

Should this replace both of the lines after do in the above code?
I just updated my question :) I want to replace the 4th field only if the line begins with AA.
I will be reading through each line in the file in order to check whether it begins with AA or not. Could you please help me with the code? Thanks
Sure, once you add the explanation I will try it and definitely give an upvote. :)
@ShivaniSarin, on SO there is a way to say thanks :) And let us delete all comments under this answer too, to keep the answer clean.
1

Using sed:

sed -i 's/\(^AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/' input_file
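
A sketch of how this capture-group form behaves on the question's sample data, printing to stdout instead of editing in place. In this sketch the ^ anchor is written outside the group, since POSIX only guarantees it is an anchor at the very start of a basic regular expression; that change is an assumption made for portability, not part of the answer's command.

```shell
input='AA,1,CC,1,EE
AA,FF,6,7,8,9
BB,6,7,8,99,AA'

# \(AA,[^,]*,[^,]*,\) captures "AA," plus fields 2 and 3 with their commas;
# the [^,]* that follows is field 4, which is replaced by ZZ after \1.
out=$(printf '%s\n' "$input" | sed 's/^\(AA,[^,]*,[^,]*,\)[^,]*/\1ZZ/')

printf '%s\n' "$out"
```

Because the pattern spells out "AA," including the comma, a line starting with AAX would not match, and [^,]* also accepts empty fields.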
