I am trying to write a Bash script that checks a CSV file and returns the IDs of rows that fail certain criteria. A sample CSV is shown below. I am thinking of using [ -z {$CATEGORY} ] to identify empty cells in the CATEGORY column. However, my if statement does not seem to catch the empty value, so I need help.

ID,DATE,PRODUCT CODE,CATEGORY
1,01/01/2000,10009,1
2,02/01/2000,9999,2
3,25/01/2000,1009,3
4,15/09/2000,2001,5
5,09/25/2000,2003,4
6,09/10/01,2091,P
7,20/02/2002,3098,6
8,01/03/2003,4097,3
9,03/04/2004,5000,2
10,05/02/2013,4000,1
11,10/01/2015,9,

This is my Bash script. The empty value is in the row with ID = 11.

#!/bin/bash
FILE=${1}
IFS=$'\n'
((c=-1))
for row in $(cat $FILE)
do
        ((c++))
        if ((c==0))
                then
                        continue
        fi
        IFS=','
        read ID DATE PRODUCT CATEGORY <<<${row}

                if [ -z {$CATEGORY} ];
                then
                     echo "$ID" >> file.txt
                fi
done

2 Comments

  • Don't read lines with a for loop: all sorts of weird things can go wrong. Also, changing IFS (without restricting it to a specific command) can cause other weird problems. Commented Jun 28, 2022 at 8:15
  • If CATEGORY is empty, {$CATEGORY} expands to {}, which is not an empty string. The -z test will therefore always be false, unless CATEGORY contains a space, in which case you can get a syntax error. Commented Jun 28, 2022 at 8:19
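
The point about {$CATEGORY} can be checked directly. The following sketch sets CATEGORY by hand (rather than reading it from the CSV) and compares the two spellings; the variable names wrong and right are only for illustration.

```shell
#!/bin/bash
# Illustrative only: CATEGORY is set by hand here, not read from the CSV.
CATEGORY=""

# {$CATEGORY} is a literal "{", then the value, then a literal "}",
# so with an empty value this tests the two-character string "{}".
if [ -z {$CATEGORY} ]; then wrong="empty"; else wrong="not empty"; fi

# "${CATEGORY}" is the quoted parameter expansion, so this tests "".
if [ -z "${CATEGORY}" ]; then right="empty"; else right="not empty"; fi

echo "braces outside: $wrong"
echo "braces inside:  $right"
```

With an empty CATEGORY, the first test reports "not empty" and the second reports "empty", which is why the original if statement never fires.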

1 Answer

-z {$CATEGORY} should be -z "${CATEGORY}": the brace belongs after the $, and the expansion should be quoted. Also, read ID ... <<< ${row} will assign only ID... Try:

#!/bin/bash

while IFS=, read -r ID DATE PRODUCT CATEGORY; do
  if [[ "$CATEGORY" =~ ^[[:space:]]*$ ]]; then
    echo "$ID"
  fi
done < <( tail -n+2 "$1" ) > file.txt
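
For readers wondering why the loop above writes IFS=, directly in front of read: an assignment placed there applies only to that one command, so the shell's global IFS is left alone. A minimal sketch, using a hand-written row instead of the CSV file (the variables row, before and ifs_state are just for illustration):

```shell
#!/bin/bash
# Illustrative row; in the real script it would come from the CSV file.
row='11,10/01/2015,9,'

# Remember the current IFS so it can be compared afterwards.
before=$IFS

# The IFS=, assignment applies only to this single read command.
IFS=, read -r ID DATE PRODUCT CATEGORY <<<"$row"

if [ "$IFS" = "$before" ]; then ifs_state="unchanged"; else ifs_state="changed"; fi
echo "ID=$ID CATEGORY=[$CATEGORY] IFS is $ifs_state"
```

Here CATEGORY ends up empty because nothing follows the last comma, and IFS keeps its previous value for the rest of the script.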

Note that awk or sed would be much faster and simpler for this (see, for instance, https://mywiki.wooledge.org/DontReadLinesWithFor). Example with awk (tested with recent BSD and GNU awk):

awk -F, 'NR>1 && $NF ~ /^[[:space:]]*$/ {print $1}' "$FILE" > file.txt

Example with sed (tested with recent BSD and GNU sed):

sed -En 's/^([^,]*).*,[[:space:]]*$/\1/p' "$FILE" > file.txt

4 Comments

Renaud, thanks for your answer, but your while loop solution isn't working. I will look at the awk or sed methods you suggest, though.
@ChanWeeHow What is not working? I just tested with your own example and it behaves as you want.
@RenaudPacalet, I copied your while loop code and ran it, but my file.txt is still empty.
Then you probably have spaces after the comma in your input file. Can you confirm? I updated my answer to also cover this case.
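
To illustrate the case with spaces after the comma: if the field holds a space instead of nothing, a plain -z test misses the row, while the whitespace-tolerant regex from the answer still catches it. A small sketch with made-up sample data (the sample, strict and lenient variables are hypothetical):

```shell
#!/bin/bash
# Hypothetical sample: the last field of row 11 is a single space, not empty.
sample=$'ID,DATE,PRODUCT CODE,CATEGORY\n10,05/02/2013,4000,1\n11,10/01/2015,9, '

strict=""   # IDs found with a plain -z test
lenient=""  # IDs found with the whitespace-tolerant regex

while IFS=, read -r ID DATE PRODUCT CATEGORY; do
  if [ -z "$CATEGORY" ]; then strict+="$ID "; fi
  if [[ "$CATEGORY" =~ ^[[:space:]]*$ ]]; then lenient+="$ID "; fi
done < <(tail -n +2 <<<"$sample")   # tail skips the header line

echo "strict:  [$strict]"
echo "lenient: [$lenient]"
```

Only the lenient list contains ID 11, so an empty file.txt from a strict test is consistent with stray whitespace in the input.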
