I have about 3000 files in a folder. My files have data as given below:

VISITERM_0 VISITERM_20 VISITERM_35 ..... and so on

The files do not all contain the same terms as in the example above. The numbers range from 0 to 99.

I want to find out how many files in the folder have each of the VISITERMS. For example, if VISITERM_0 is present in 300 files in the folder, then I need it to print

VISITERM_0  300

Similarly, if 1000 files contain VISITERM_1, I need it to print VISITERM_1 1000

So I want to print each VISITERM and the number of files that contain it, from VISITERM_0 through VISITERM_99.

I used the following grep command:

 grep VISITERM_0 * -l | wc -l

However, this handles only a single term, and I want to loop over VISITERM_0 through VISITERM_99. Please help!

  • It's unclear what you ask. Please reformulate your question... Commented Mar 2, 2015 at 23:33

2 Answers

#!/bin/bash
# ^^- the above is important; #!/bin/sh would allow only POSIX syntax

# use a C-style for loop, which is a bash extension
for ((i=0; i<100; i++)); do
  # Calculate number of matching files. -w (supported by GNU and BSD grep)
  # matches whole words only, so VISITERM_1 does not also match VISITERM_10.
  num_matches=$(find . -type f -exec grep -lw -e "VISITERM_$i" '{}' + | wc -l)
  # ...and print the result.
  printf 'VISITERM_%d\t%d\n' "$i" "$num_matches"
done

2 Comments

...then change your loop to start at 100 and proceed to 0, in the exact same way you would do so in C.
One more question: how can I sort the entire output in descending order of file count? For instance, if the results are VISITERM_0 300, VISITERM_1 150, VISITERM_2 400, I need them ordered as VISITERM_2 400, VISITERM_0 300, VISITERM_1 150.
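
For the sorting follow-up, one option is to pipe the loop's output through sort, keyed numerically on the second column in reverse. This is a minimal sketch, assuming the "name, tab, count" output format of the script above. The function names count_visiterms and sorted_visiterms are made up for illustration, and -w is assumed to be available (it is in GNU and BSD grep).

```bash
#!/bin/bash
# Hypothetical helper: prints "VISITERM_<n><TAB><file count>" for n = 0..99,
# counting regular files under the directory given as $1 (default: .).
count_visiterms() {
  local dir=${1:-.} i num_matches
  for ((i=0; i<100; i++)); do
    # -w matches whole words, so VISITERM_1 does not also match VISITERM_10.
    num_matches=$(find "$dir" -type f -exec grep -lw -e "VISITERM_$i" '{}' + | wc -l)
    printf 'VISITERM_%d\t%d\n' "$i" "$num_matches"
  done
}

# Hypothetical helper: same output, ordered by count, largest first.
# -k2,2 restricts the sort key to the second field; n = numeric, r = reverse.
sorted_visiterms() {
  count_visiterms "$1" | sort -k2,2nr
}
```

Calling sorted_visiterms with no argument works on the current directory. Terms with equal counts come out in whatever order sort picks for ties.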

Here is a GNU awk solution (GNU awk is needed because RS contains more than one character) that should do it:

awk -v RS=" |\n" '{n=split($1,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' *

Example:

cat file1
VISITERM_0 VISITERM_320 VISITERM_35

cat file2
VISITERM_0 VISITERM_20 VISITERM_32
VISITERM_20 VISITERM_42 VISITERM_11

Gives:

awk -v RS=" |\n" '{n=split($1,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' file*
VISITERM_0 2
VISITERM_11 1
VISITERM_20 2
VISITERM_32 1
VISITERM_35 1
VISITERM_42 1

How it works:

awk -v RS=" |\n" '              # Set the record separator to a space or a newline
    {n=split($1,a,"VISITERM_")  # Split the record on "VISITERM_" and store the number of pieces in "n"
    if (n==2 && a[2]<100)       # If "n" is 2 (the record contains "VISITERM_") and the number is less than 100
        b[a[2]]++}              # Count the occurrences of each number in array "b"
END {for (i in b)               # Walk through array "b"
    print "VISITERM_"i,b[i]}    # Print the counts
' file*                         # Read the files

PS
If everything is on a single line, change this to RS=" ". It should then work with most awk implementations.
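
Note that the one-liner above increments its counter for every occurrence, so a term repeated within one file is counted more than once (VISITERM_20 in the example). Since the question asks for the number of files, a variant that counts each term at most once per file may be closer to what is wanted. The following is a sketch, assuming whitespace-separated terms. The function name count_files_per_term and the array names seen and count are illustrative. It uses the default record separator, so it should not need GNU awk.

```bash
# Hypothetical helper: for each VISITERM_<n> with 0 <= n <= 99, print the
# number of files (given as arguments) that contain it at least once.
count_files_per_term() {
  awk '
    {
      for (f = 1; f <= NF; f++) {
        # Only consider fields of the exact form VISITERM_<digits>.
        if ($f ~ /^VISITERM_[0-9]+$/) {
          num = substr($f, 10) + 0   # numeric part after the 9-character prefix
          # seen[] is keyed by file name and number, so a term repeated
          # within one file is counted only once for that file.
          if (num < 100 && !((FILENAME, num) in seen)) {
            seen[FILENAME, num] = 1
            count[num]++
          }
        }
      }
    }
    END {
      # Loop numerically so the output runs from 0 to 99; skip unseen terms.
      for (n = 0; n < 100; n++)
        if (n in count) print "VISITERM_" n, count[n]
    }
  ' "$@"
}
```

With file1 and file2 from the example, count_files_per_term file* reports VISITERM_20 1 instead of VISITERM_20 2. The other lines are unchanged.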
