0

Here is my sample data:

1,32425,New Zealand,number,21004
1,32425,New Zealand,number,20522
1,32434,Australia,number,1542
1,32434,Australia,number,986
1,32434,Fiji,number,1

Here is my expected output:

1,32425,New Zealand,number,21004,No
1,32425,New Zealand,number,20522,No
1,32434,Australia,number,1542,No
1,32434,Australia,number,986,No
1,32434,Fiji,number,1,Yes

Basically I am trying to append Yes/No depending on whether field 3 is contained in an external file. Here is what I have currently, but as I understand it grep is eating all the stdin in the while loop, so I am only getting No added to the end of each line, as the first value is not contained in the external file.

while IFS=, read -r type id country number volume
do
  if grep $country externalfile.csv
  then
    echo "${country}"
    sed 's/$/,Yes/' >> file2.csv
  else
    echo "${country}"
    sed 's/$/,No/' >> file2.csv
  fi
done < file1.csv

I added the echo "${country}" as I was trying to troubleshoot and that's how I discovered it was only parsing the first line.

  • What does the other file look like? How big is it? I'd probably parse it into a lookup table to avoid all those calls to grep. Commented Jun 14, 2021 at 15:33
  • There's about 240 lines, and each line is just one country name. Commented Jun 14, 2021 at 15:41
  • The -q flag from grep is missing. Commented Jun 14, 2021 at 15:57
  • Also if there are say 1k lines then grep and sed will also run 1k times... Commented Jun 14, 2021 at 16:02

3 Answers

3

Assuming there are no headers -

awk -F, 'NR==FNR { lookup[$1]=$1; next }
         { if ( lookup[$3] == $3 ) { print $0 ",Yes" } else { print $0 ",No" } }
        ' externalfile.csv file2.csv

This reads each file exactly once: externalfile.csv is loaded into the lookup array, then every line of file2.csv is checked against it.
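To try the two-file idiom without touching real data, here is a minimal sketch. The annotate function and the temporary sample files are made up for this demo, and it tests membership with `in` rather than comparing values, which avoids creating empty array entries as a side effect of the lookup.

```shell
# Sketch only: annotate and the sample files below are hypothetical.
annotate() {
  # $1 = lookup file (one country per line), $2 = CSV data file
  awk -F, -v OFS=, '
    NR == FNR { lookup[$1]; next }                # first file: remember each country
    { print $0, ($3 in lookup ? "Yes" : "No") }   # second file: append the flag
  ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' 'USA' 'Fiji' > "$tmp/externalfile.csv"
printf '%s\n' '1,32434,Australia,number,986' '1,32434,Fiji,number,1' > "$tmp/data.csv"
annotate "$tmp/externalfile.csv" "$tmp/data.csv"
rm -rf "$tmp"
```

Note that the NR==FNR test assumes the lookup file is not empty; with an empty first file the condition would also be true for the data file.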

If you would rather do it in pure bash (version 4 or later, which associative arrays require):

declare -A lookup
# -r keeps backslashes literal; IFS= keeps leading/trailing whitespace
while IFS= read -r c; do lookup["$c"]="$c"; done < externalfile.csv

declare -p lookup # this is just to show you what my example loaded
# prints: declare -A lookup='([USA]="USA" [Fiji]="Fiji" )'

# d receives the rest of the line ("number,21004"), so nothing is lost
while IFS=, read -r a b c d; do
  if [[ -n "${lookup[$c]}" ]]; then echo "$a,$b,$c,$d,Yes"; else echo "$a,$b,$c,$d,No"; fi
done < file2.csv
# prints:
# 1,32425,New Zealand,number,21004,No
# 1,32425,New Zealand,number,20522,No
# 1,32434,Australia,number,1542,No
# 1,32434,Australia,number,986,No
# 1,32434,Fiji,number,1,Yes

No grep needed.


1 Comment

Alternatively, { print $0, ($3 in lookup ? "Yes" : "No") } with -v OFS=","
2
awk -F, -v OFS=, 'NR == FNR { ++a[$1]; next } { $(++NF) = $3 in a ? "Yes" : "No" } 1' externalfile.csv file2.csv
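For anyone puzzled by how that one-liner appends the column: assigning to the field just past the last one makes awk rebuild the line using OFS, and the trailing 1 is an always-true pattern that prints it. A small sketch of only that mechanism, where append_field is a hypothetical helper and not part of the answer:

```shell
# Hypothetical helper: append "$1" as a new last field to each CSV line on stdin.
append_field() {
  # $(++NF) = val grows the record by one field; the record is rebuilt with OFS.
  # The bare 1 is a true condition with no action, so the default action (print) runs.
  awk -F, -v OFS=, -v val="$1" '{ $(++NF) = val } 1'
}

printf '%s\n' 'a,b,c' | append_field x   # prints a,b,c,x
```

Without -v OFS=, the rebuilt line would be joined with spaces instead of commas.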


1

Try this:

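while IFS= read -r line
do
        # Field 3 holds the country name
        country=$(printf '%s\n' "$line" | cut -d',' -f3)
        # -q: print nothing; -x -F: match the whole line as a fixed string
        if grep -qxF -- "$country" externalfile.csv
        then
                echo "$line,Yes" >> file2.csv
        else
                echo "$line,No" >> file2.csv
        fi
done < test.txt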

You need to put $country inside double quotes, because some country names contain more than one word, for example New Zealand. You can also set the country variable more simply with the cut command.
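On why the loop in the question only processed the first line: sed was called without a file argument, so it read from the loop's standard input and consumed every remaining line before read could see them. grep was not the culprit, since it was given a file name. A minimal sketch of the effect, using made-up three-line input and hypothetical function names:

```shell
# Broken pattern: sed has no file argument, so it reads the loop's stdin.
broken_loop() {
  printf '%s\n' 'a,b,Fiji' 'a,b,Peru' 'a,b,Chad' | {
    n=0
    while IFS=, read -r x y country; do
      n=$((n + 1))
      sed 's/$/,No/' > /dev/null   # swallows the remaining two lines
    done
    echo "$n"                      # number of iterations that actually ran
  }
}

# Fixed pattern: build the output from the variables, nothing else reads stdin.
fixed_loop() {
  printf '%s\n' 'a,b,Fiji' 'a,b,Peru' 'a,b,Chad' | {
    n=0
    while IFS=, read -r x y country; do
      n=$((n + 1))
      printf '%s,%s,%s,No\n' "$x" "$y" "$country" > /dev/null
    done
    echo "$n"
  }
}

broken_loop
fixed_loop
```

broken_loop prints 1 and fixed_loop prints 3. If a command inside such a loop really has to be run and might read stdin, redirecting its input from /dev/null keeps it away from the loop's data.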

3 Comments

Works great! Just so I understand, this works because we set the variable within the while loop, whereas previously I was reading from the variable set by while which was consuming the rest of the stdin?
This is the easiest way to pick out the data you are interested in. Try this in the terminal: echo "1,32425,New Zealand,number,20522" | cut -d',' -f3. You can see how it works ;) In the loop you are now reading the whole line and then cutting out the specific field.
This will still run grep on every line of file2.csv. That's not very kind to the machine.
