Bash shell: Count occurrences of pattern (in one file) listed in arrays (array elements loaded from different file)

Question

Hi, I have loaded the patterns from pattern.txt into an array, and now I would like to get the grep count of each array element in a second file (named count.csv).

pattern.txt

abc
def
ghi

count.csv

1234,abc,joseph
5678,ramson,abc
2231,sam,def
1123,abc,richard
2521,ghi,albert
7371,jackson,def

The bash script is given below:

declare -a myArray

myArray=( $(awk '{print $1}' ./pattern.txt) )

for ((i=0; i < ${#myArray[*]}; i++))
do
    var1=$(grep -c "${myArray[i]}" count.csv)
    echo "$var1"
done

But when I run the script, instead of giving the output below

3
2
1

it gives this output:

0
0
1

That is, it only gives the correct count for the last array element.
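One plausible explanation, assuming pattern.txt was saved with Windows (CRLF) line endings and no newline after its last line (the DOS line-ending possibility is raised in anubhava's comments further down): every pattern except the last then ends in an invisible carriage return. The sketch below recreates that situation in a temporary directory with the sample data from the question, reproduces the 0/0/1 output with the original loop, and then strips the carriage returns with tr before building the array.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the 0/0/1 symptom, ASSUMING pattern.txt has CRLF line
# endings and no final newline. Sample data is taken from the question.
cd "$(mktemp -d)" || exit 1

printf 'abc\r\ndef\r\nghi' > pattern.txt          # CRLF endings, no final newline
printf '%s\n' 1234,abc,joseph 5678,ramson,abc 2231,sam,def \
              1123,abc,richard 2521,ghi,albert 7371,jackson,def > count.csv

# The loop from the question: the elements become "abc\r", "def\r" and "ghi",
# and the first two never occur in count.csv.
declare -a myArray
myArray=( $(awk '{print $1}' ./pattern.txt) )
broken=()
for ((i=0; i < ${#myArray[@]}; i++)); do
    broken+=( "$(grep -c "${myArray[i]}" count.csv)" )
done
echo "${broken[@]}"        # 0 0 1

# Deleting carriage returns before building the array gives the expected counts.
myArray=( $(tr -d '\r' < pattern.txt) )
fixed=()
for p in "${myArray[@]}"; do
    fixed+=( "$(grep -c -- "$p" count.csv)" )
done
echo "${fixed[@]}"         # 3 2 1
```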

RomanPerekhrest · Accepted Answer · 2017-05-23 19:02:18Z

3

grep + sort + uniq pipeline solution:

grep -o -w -f pattern.txt count.csv | sort | uniq -c

The output:

  3 abc
  2 def
  1 ghi

grep options:

-f - obtain pattern(s) from file
-o - print only the matched parts of matching lines
-w - select only those lines containing matches that form whole words
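A self-contained sketch of the pipeline on the question's sample data, plus one hypothetical extra row (8888,abcdef,zed) chosen to show what -w changes. The trailing awk stage is only there to turn uniq -c's padded "COUNT WORD" lines into "WORD COUNT" so the two results are easy to compare.

```shell
#!/usr/bin/env bash
# Sketch: grep | sort | uniq -c with and without -w.
# The row "8888,abcdef,zed" is a HYPOTHETICAL addition to the question's data.
cd "$(mktemp -d)" || exit 1

printf '%s\n' abc def ghi > pattern.txt
printf '%s\n' 1234,abc,joseph 5678,ramson,abc 2231,sam,def \
              1123,abc,richard 2521,ghi,albert 7371,jackson,def \
              8888,abcdef,zed > count.csv

# awk only reorders uniq's output into "WORD COUNT".
with_w=$(grep -o -w -f pattern.txt count.csv | sort | uniq -c | awk '{print $2, $1}')
without_w=$(grep -o -f pattern.txt count.csv | sort | uniq -c | awk '{print $2, $1}')

echo "$with_w"      # abc 3, def 2, ghi 1: "abcdef" is one word, so it is skipped
echo "$without_w"   # abc 4, def 3, ghi 1: "abc" and "def" also match inside "abcdef"
```

Note that -o makes this count matches rather than lines, so a pattern appearing twice on one row is counted twice, and a pattern with no matches at all simply does not appear in the output.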

The alternative awk approach:

awk 'NR==FNR{p[$0]; next}{ for(i=1;i<=NF;i++){ if($i in p) {p[$i]++; break} }}
     END {for(i in p) print p[i],i}' pattern.txt FS="," count.csv

The output:

2 def
3 abc
1 ghi

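p[$0] - collects the patterns from the 1st input file (pattern.txt)
for(i=1;i<=NF;i++) - iterates through the fields of each line of the 2nd file (count.csv); the FS="," operand placed before count.csv makes the comma the field separator when count.csv is read
if($i in p) {p[$i]++; break} - increments the counter for the first field that matches a pattern, then moves on to the next line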
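Because for (i in p) visits array elements in an unspecified order, the awk output above is not in pattern.txt order, and a pattern that never matches prints with an empty count (p[$0] creates the element without a value). The sketch below is a small variation, not the answer's own code: it remembers the input order in a second array and initializes every counter to 0. The pattern "xyz" is a hypothetical addition used to show the zero case.

```shell
#!/usr/bin/env bash
# Sketch: ordered, zero-filled variant of the awk answer.
# "xyz" is a HYPOTHETICAL pattern that never matches.
cd "$(mktemp -d)" || exit 1

printf '%s\n' abc def ghi xyz > pattern.txt
printf '%s\n' 1234,abc,joseph 5678,ramson,abc 2231,sam,def \
              1123,abc,richard 2521,ghi,albert 7371,jackson,def > count.csv

result=$(awk '
    NR == FNR { order[++n] = $0; p[$0] = 0; next }                   # store patterns and their order
    { for (i = 1; i <= NF; i++) if ($i in p) { p[$i]++; break } }    # first matching field per line
    END { for (k = 1; k <= n; k++) print p[order[k]], order[k] }     # print in pattern.txt order
' pattern.txt FS="," count.csv)

echo "$result"
```

With the data above this prints 3 abc, 2 def, 1 ghi and 0 xyz, one per line, matching the order the question asked for.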

edited May 23, 2017 at 19:02

answered May 23, 2017 at 18:32

RomanPerekhrest



2 Comments

laxman sharma Over a year ago

@RomanPerekhrest
# grep -o -f pattern.txt count.csv | sort | uniq -c
1 abc
1 def
1 ghi

RomanPerekhrest Over a year ago

@laxmansharma, check again: it works fine for your input. If it does not, then the input you posted is not your actual input. Also, it should be run as a single standalone command, without any loop.

anubhava · Accepted Answer · 2017-05-23 18:07:31Z

0

It is better to use awk for processing text files line by line:

awk -F, 'NR==FNR {wrd[$1]; next} $2 in wrd{wrd[$2]++} $3 in wrd{wrd[$3]++} 
END{for (w in wrd) print w, wrd[w]}' pattern.txt count.csv

def 2
abc 3
ghi 1

Reference: Effective AWK Programming

answered May 23, 2017 at 18:07

anubhava


3 Comments

laxman sharma Over a year ago

It gives output: ef, ghi and bc

anubhava Over a year ago

You probably have DOS line endings in your files. Run dos2unix on both files before you run this awk command.

anubhava Over a year ago

If dos2unix is unavailable, then

awk -v RS='\r\n' -F, 'NR==FNR {wrd[$1]; next} $2 in wrd{wrd[$2]++} $3 in wrd{wrd[$3]++} END{for (w in wrd) print w, wrd[w]}' pattern.txt count.csv

will also work.
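A multi-character RS such as '\r\n' works in GNU awk and mawk, but POSIX leaves the behavior unspecified when RS is longer than one character. A sketch of a more portable alternative, assuming only a POSIX awk, strips a trailing carriage return from every record with sub() before any other rule runs; modifying $0 makes awk re-split the fields. Both sample files are written with CRLF endings here to simulate files saved on Windows, and sort is added only because for (w in wrd) has no defined order.

```shell
#!/usr/bin/env bash
# Sketch: handle CRLF input without dos2unix or a multi-character RS.
# Both files are ASSUMED to use CRLF line endings.
cd "$(mktemp -d)" || exit 1

printf 'abc\r\ndef\r\nghi\r\n' > pattern.txt
printf '%s\r\n' 1234,abc,joseph 5678,ramson,abc 2231,sam,def \
                1123,abc,richard 2521,ghi,albert 7371,jackson,def > count.csv

result=$(awk -F, '
    { sub(/\r$/, "") }                  # drop a trailing CR; fields are re-split
    NR == FNR { wrd[$1]; next }         # patterns from pattern.txt
    $2 in wrd { wrd[$2]++ }
    $3 in wrd { wrd[$3]++ }
    END { for (w in wrd) print w, wrd[w] }
' pattern.txt count.csv | sort)

echo "$result"
```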

Olli K · Accepted Answer · 2017-05-23 18:27:18Z

0

You could also skip the array and just loop over the patterns:

while read -r pattern; do
    [[ -n $pattern ]] && grep -c "$pattern" count.csv
done < pattern.txt

grep -c prints only the number of matching lines (a line containing two matches is counted once).

answered May 23, 2017 at 18:27

Olli K


1 Comment

laxman sharma Over a year ago

It does not give the count for the third pattern (ghi); I only get counts for abc and def. I am also getting strange output: it gives a count of 1 for abc, but when I put the abc field of all 3 rows in the third column (pattern.txt file), it gives the correct output.
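A missing last pattern is what read does when the final line of the file has no trailing newline: read returns a failure status even though it filled the variable, so the loop body is skipped. A sketch of the loop hardened against that and against CRLF endings, assuming pattern.txt was saved on Windows without a final newline (an assumption, not something confirmed in the thread):

```shell
#!/usr/bin/env bash
# Sketch: the while-read loop, made robust to a missing final newline and to
# CRLF line endings (both ASSUMED here to match the reported symptoms).
cd "$(mktemp -d)" || exit 1

printf 'abc\r\ndef\r\nghi' > pattern.txt
printf '%s\n' 1234,abc,joseph 5678,ramson,abc 2231,sam,def \
              1123,abc,richard 2521,ghi,albert 7371,jackson,def > count.csv

counts=()
# "|| [[ -n $pattern ]]" still runs the body for a last line without a newline.
while IFS= read -r pattern || [[ -n $pattern ]]; do
    pattern=${pattern%$'\r'}          # remove a trailing carriage return, if present
    [[ -n $pattern ]] || continue     # skip blank lines
    counts+=( "$(grep -c -- "$pattern" count.csv)" )
done < pattern.txt

echo "${counts[@]}"   # 3 2 1
```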

Girrafish · Accepted Answer · 2017-05-23 18:56:33Z

0

Try using this command instead:

mapfile -t myArray < pattern.txt
for pattern in ${myArray[*]}; do
  echo $(grep -o $pattern count.csv| wc -l)
done

Output:
3
2
1

mapfile stores each line of pattern.txt as an element of myArray.
The for loop iterates through each pattern in myArray and prints the number of occurrences of that pattern in count.csv (grep -o prints each match on its own line, and wc -l counts those lines).
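Counting with grep -o | wc -l is not the same as grep -c: the former counts individual matches, the latter counts matching lines. On the question's data the two agree because no row contains the same pattern twice. A minimal sketch using a hypothetical one-row file where they differ:

```shell
#!/usr/bin/env bash
# Sketch: grep -c (lines) versus grep -o | wc -l (matches).
# dup.csv is a HYPOTHETICAL file in which "abc" appears twice on one row.
cd "$(mktemp -d)" || exit 1

printf '%s\n' 9999,abc,abc > dup.csv

lines=$(grep -c abc dup.csv)              # number of matching lines
matches=$(grep -o abc dup.csv | wc -l)    # number of individual matches
matches=$((matches))                      # drop any padding some wc versions add

echo "lines=$lines matches=$matches"      # lines=1 matches=2
```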

edited May 23, 2017 at 18:56

answered May 23, 2017 at 18:07

Girrafish


3 Comments

laxman sharma Over a year ago

It gives output '0'

laxman sharma Over a year ago

I am getting strange output: it gives a count of 1 for abc, but when I put the abc field of all 3 rows in the third column (pattern.txt file), it gives the correct output.

Girrafish Over a year ago

@laxmansharma I'm not quite sure what you mean, but it works both when your patterns are on one line (abc def ghi) and when they are on separate lines, as in your example pattern.txt.
