
Hi, I have loaded the patterns from pattern.txt into an array, and now I would like to use grep to count how many times each array element occurs in a second file (count.csv).

pattern.txt

abc
def
ghi

count.csv

1234,abc,joseph
5678,ramson,abc
2231,sam,def
1123,abc,richard
2521,ghi,albert
7371,jackson,def

The bash script is given below:

declare -a myArray

myArray=( $(awk '{print $1}' ./pattern.txt))

for ((i=0; i < ${#myArray[*]}; i++))
do
   var1=$(grep -c "${myArray[i]}" count.csv)
   echo $var1
done

But when I run the script, instead of giving the expected output

3
2
1

it gives this output:

0
0
1

That is, it only gives the correct count for the last array element.


4 Answers


grep + sort + uniq pipeline solution:

grep -o -w -f pattern.txt count.csv | sort | uniq -c

The output:

  3 abc
  2 def
  1 ghi

grep options:

  • -f - obtain pattern(s) from file

  • -o - print only the matched parts of matching lines

  • -w - select only those lines containing matches that form whole words
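A self-contained sketch of the same pipeline for trying it out. It assumes bash (for the <(...) process substitution) and a grep that supports -o, -w and -F, such as GNU grep; the helper name count_patterns and the scratch directory are hypothetical. Two things are added that are not in the answer above: -F makes grep treat each line of pattern.txt as a fixed string rather than a regular expression, and tr -d '\r' removes carriage returns in case pattern.txt has DOS line endings (the sample file below is deliberately written that way).

```shell
#!/usr/bin/env bash
# Sketch: per-pattern counts with grep + sort + uniq.
# Assumes bash (process substitution) and a grep supporting -o, -w and -F.

# Hypothetical helper: count_patterns PATTERN_FILE DATA_FILE
count_patterns() {
    local pattern_file=$1 data_file=$2
    # tr -d '\r' strips DOS carriage returns, so "abc\r" becomes "abc".
    # -F: fixed strings, -w: whole words only, -o: one output line per match.
    grep -o -w -F -f <(tr -d '\r' < "$pattern_file") "$data_file" | sort | uniq -c
}

# Recreate the sample input from the question in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'abc\r\ndef\r\nghi' > "$workdir/pattern.txt"   # CRLF endings, no final newline
printf '%s\n' 1234,abc,joseph 5678,ramson,abc 2231,sam,def \
    1123,abc,richard 2521,ghi,albert 7371,jackson,def > "$workdir/count.csv"

count_patterns "$workdir/pattern.txt" "$workdir/count.csv"
```

Because -o prints one line per match, this counts occurrences rather than lines: a line such as 1,abc,abc would add 2 to the count for abc.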


The alternative awk approach:

awk 'NR==FNR{p[$0]; next}{ for(i=1;i<=NF;i++){ if($i in p) {p[$i]++; break} }}
     END {for(i in p) print p[i],i}' pattern.txt FS="," count.csv

The output:

2 def
3 abc
1 ghi

  • p[$0] - accumulating patterns from the 1st input file (pattern.txt)

  • for(i=1;i<=NF;i++) - iterating through the fields of each line of the 2nd file (count.csv)

  • if($i in p) {p[$i]++; break} - incrementing the counter of the matched pattern; break stops after the first match, so each line is counted at most once
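In awk, the iteration order of for (i in p) is unspecified, which is why the output above lists def before abc. Below is a sketch of the same counting logic that remembers the order of pattern.txt, prints 0 (instead of an empty field) for patterns that never match, and strips a trailing carriage return from every record. The function name count_with_awk is hypothetical, and the sample files are recreated from the question.

```shell
#!/usr/bin/env bash
# Sketch: awk counting that keeps pattern.txt order and tolerates CRLF input.
# Hypothetical helper: count_with_awk PATTERN_FILE DATA_FILE
count_with_awk() {
    awk '
        { sub(/\r$/, "") }                                 # drop a DOS carriage return
        NR == FNR { order[++n] = $0; cnt[$0] = 0; next }   # 1st file: patterns
        {
            for (i = 1; i <= NF; i++)                      # 2nd file: comma-separated fields
                if ($i in cnt) { cnt[$i]++; break }        # count each line at most once
        }
        END { for (k = 1; k <= n; k++) print cnt[order[k]], order[k] }
    ' "$1" FS="," "$2"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' abc def ghi > "$workdir/pattern.txt"
printf '%s\n' 1234,abc,joseph 5678,ramson,abc 2231,sam,def \
    1123,abc,richard 2521,ghi,albert 7371,jackson,def > "$workdir/count.csv"

count_with_awk "$workdir/pattern.txt" "$workdir/count.csv"   # 3 abc / 2 def / 1 ghi
```

Keeping a separate order array costs one extra entry per pattern, which is negligible next to scanning count.csv.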


2 Comments

@RomanPerekhrest # grep -o -f pattern.txt count.csv | sort | uniq -c outputs: 1 abc, 1 def, 1 ghi
@laxmansharma, check again: it works fine for your input. Otherwise, the input you posted is not your actual input. Also, it should be run as a single standalone command, not inside a loop.

It is better to use awk for processing text files line by line:

awk -F, 'NR==FNR {wrd[$1]; next} $2 in wrd{wrd[$2]++} $3 in wrd{wrd[$3]++} 
END{for (w in wrd) print w, wrd[w]}' pattern.txt count.csv

def 2
abc 3
ghi 1

Reference: Effective AWK Programming

3 Comments

It gives output: ef, ghi and bc
You probably have DOS line endings in your files. Run dos2unix on both files before you run this awk command.
If dos2unix is unavailable then awk -v RS='\r\n' -F, 'NR==FNR {wrd[$1]; next} $2 in wrd{wrd[$2]++} $3 in wrd{wrd[$3]++} END{for (w in wrd) print w, wrd[w]}' pattern.txt count.csv will also work.
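If the DOS line-ending diagnosis is right, it would also be consistent with the question's 0 0 1 output, assuming the last line of pattern.txt has no line ending at all: awk's default field splitting does not treat a carriage return as whitespace, so the array would hold "abc\r" and "def\r" but a clean "ghi". Below is a minimal sketch that first checks for carriage returns and then runs the question's array-and-loop approach with them removed. It assumes bash 4 or later (for mapfile); the function names has_crlf and count_lines_per_pattern are hypothetical, and -w -F are additions so that, for example, abc would not match abcd.

```shell
#!/usr/bin/env bash
# Sketch: detect CRLF line endings, then run the question's loop without them.
# Assumes bash 4+ (mapfile). Function names are hypothetical.

has_crlf() { grep -q $'\r' "$1"; }   # true if any line contains a carriage return

count_lines_per_pattern() {
    local -a myArray
    local p
    # tr removes the carriage returns; mapfile -t removes the newlines.
    mapfile -t myArray < <(tr -d '\r' < "$1")
    for p in "${myArray[@]}"; do
        [[ -n $p ]] || continue          # skip blank lines
        grep -c -w -F -- "$p" "$2"       # number of lines of $2 containing $p
    done
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'abc\r\ndef\r\nghi' > "$workdir/pattern.txt"   # assumed DOS-style file
printf '%s\n' 1234,abc,joseph 5678,ramson,abc 2231,sam,def \
    1123,abc,richard 2521,ghi,albert 7371,jackson,def > "$workdir/count.csv"

has_crlf "$workdir/pattern.txt" && echo "pattern.txt has DOS line endings"
count_lines_per_pattern "$workdir/pattern.txt" "$workdir/count.csv"   # 3, 2, 1
```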

You could also skip the array and just loop over the patterns:

while read -r pattern; do
    [[ -n $pattern ]] && grep -c "$pattern" count.csv
done < pattern.txt

grep -c outputs just the number of matching lines for each pattern.
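The comment below reports that ghi is never counted. One likely cause, assuming pattern.txt has no newline after its last line: read returns a non-zero status at end of file even when it has filled pattern with that final partial line, so the while loop exits before processing it. Below is a sketch of the usual workaround, which also strips a trailing carriage return for the DOS line-ending case discussed under the previous answer; the function name count_each_pattern is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the while-read loop, hardened for a missing final newline and CRLF.
# Hypothetical helper: count_each_pattern PATTERN_FILE DATA_FILE
count_each_pattern() {
    local pattern
    # read fails on an unterminated last line but still fills $pattern,
    # so "|| [[ -n $pattern ]]" lets that last pattern through as well.
    while IFS= read -r pattern || [[ -n $pattern ]]; do
        pattern=${pattern%$'\r'}          # drop a DOS carriage return
        [[ -n $pattern ]] || continue     # skip blank lines
        grep -c -- "$pattern" "$2"        # lines of $2 matching $pattern
    done < "$1"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'abc\r\ndef\r\nghi' > "$workdir/pattern.txt"   # CRLF, no final newline
printf '%s\n' 1234,abc,joseph 5678,ramson,abc 2231,sam,def \
    1123,abc,richard 2521,ghi,albert 7371,jackson,def > "$workdir/count.csv"

count_each_pattern "$workdir/pattern.txt" "$workdir/count.csv"   # 3, 2, 1
```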

1 Comment

It is not giving the count for the third pattern (ghi); I'm only getting counts for abc and def. Further, I'm getting strange output: it gives the count of abc as 1, but when I put all the abc fields of the 3 rows in the third column (pattern.txt file), it gives the correct output.

Try using this command instead:

mapfile -t myArray < pattern.txt
for pattern in ${myArray[*]}; do
  echo $(grep -o "$pattern" count.csv | wc -l)
done

Output:
3
2
1

mapfile stores each line of pattern.txt as an element of myArray.
The for loop iterates over each pattern in myArray and prints the number of occurrences of that pattern in count.csv.
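grep -o prints each match on its own line, so grep -o ... | wc -l counts occurrences, while grep -c (used in the question) counts matching lines. The two agree on the sample count.csv because no line contains the same pattern twice. Below is a small sketch showing where they differ; the line 9999,abc,abc and the function names occurrences and matching_lines are hypothetical, added only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: occurrences (grep -o | wc -l) versus matching lines (grep -c).
occurrences()    { grep -o -- "$1" "$2" | wc -l | tr -d ' '; }  # one line per match
matching_lines() { grep -c -- "$1" "$2"; }                      # one count per file

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
# Hypothetical data: the second line contains "abc" twice.
printf '%s\n' 1234,abc,joseph 9999,abc,abc > "$workdir/count.csv"

echo "occurrences:    $(occurrences abc "$workdir/count.csv")"     # 3
echo "matching lines: $(matching_lines abc "$workdir/count.csv")"  # 2
```

The tr -d ' ' strips the leading padding that some wc implementations print, which the original answer handles by wrapping the pipeline in an unquoted echo $(...).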

3 Comments

It gives output '0'
I'm getting strange output. It gives the count of abc as 1, but when I put all the abc fields of the 3 rows in the third column (pattern.txt file), it gives the correct output.
@laxmansharma Not quite sure what you mean, but it works both when your patterns are on one line (abc def ghi) and when they are on separate lines, as in your example pattern.txt.
