I'm trying to print the lines in a file that match a certain pattern, along with the number of lines that matched. For example, if my file were

test1 type1 random1

test2 type2 bird

dog cat random

I want to find the lines that have "random" and the number of lines. Ideally, the output would be something like

test1 type1 random1

dog cat random

2

I know how to use grep to do either of these tasks individually, but if I'm working with a large file, I'd prefer to not read the file twice. I'd also like to stay away from making an additional temp file to store the results of grep.

Is there a command and/or a simple function I can write to achieve these results?

5 Answers

2
awk 'BEGIN{total=0} {if(/random/) {total+=1; print $0;}}END{print total}' input_file

2 Comments

Or with perl: perl -nle 'if (/random/) { $count++; print } END { print $count || 0 }'
You could omit the BEGIN block; the variable total will be initialized to zero when it is first referenced. You could also put the match /random/ outside the {…action…}, and use a ++ increment.
2

Nope.

$ cat t.txt
foo: bar
foo: quux
bar: baz
$ awk -v regex='bar' '$0 ~ regex { count++; print } END {print count}' t.txt
foo: bar
bar: baz
2

2
awk '/random/{count++;print}END{print count}' file

If match found, increment the counter and print. Print the count at the end.
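Since the question also asks for a simple function, here is a minimal sketch of how the one-liner could be wrapped. The name match_and_count and the awk variables pat and n are made up for illustration. Printing n+0 instead of n makes awk print 0 rather than an empty line when nothing matched. Note that awk applies escape-sequence processing to values passed with -v, so backslashes in the pattern may need doubling.

```shell
# Hypothetical helper: print the matching lines, then the number of matches.
# Usage: match_and_count PATTERN [FILE...]   (reads stdin if no FILE is given)
match_and_count() {
    pattern=$1
    shift
    # n+0 forces a numeric result, so "0" is printed when nothing matched.
    awk -v pat="$pattern" '$0 ~ pat { n++; print } END { print n+0 }' "$@"
}
```

With the sample file from the question, match_and_count random input_file should print the two matching lines followed by 2, reading the file only once.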

1 Comment

This is nice and compact.
1

I like the awk solutions here, but as always, there's more than one way to skin a cat. If you number the output lines with nl it's easy to see how many matches you got.

grep stuff from files | nl

Getting exactly the output you specified in the question is a simple matter of postprocessing (though I would not bother). Pipe to a simple sed script to remove the line number, then print the latest removed number at the end.

grep stuff from files |
nl |
sed -n 'h                    # Keep a copy in hold space
     s/^ *[1-9][0-9]*\t//p   # Print without number
     $!b                     # Unless at last line, we're done
     x                       # Retrieve from hold space
     s/\t.*//p'              # Print only line number

(If your sed dialect does not recognize \t as a literal tab, or cannot cope with comments on the same line, you'll need to adapt this. In most shells, you can type a literal tab with ctrl-V tab.)

2 Comments

Um, because you want to see the actual matches, and don't want to rerun a potentially expensive search?
1

The awk variant is more efficient for this problem, but if you would rather use grep than awk, here is a grep + wc variant:

$ grep -F random random.log | tee /dev/tty | wc -l
test1 type1 random1
dog cat random
2
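One thing to keep in mind is that tee /dev/tty writes the matches straight to the terminal, so they cannot be piped or redirected together with the count. If the reason for reaching for grep -F is fixed-string matching, a rough single-process equivalent can be sketched with awk's index() function. The function name and the NEEDLE environment variable below are made up for illustration.

```shell
# Hypothetical helper: fixed-string matching (similar to grep -F) in one awk process.
# Usage: fixed_match_and_count STRING [FILE...]   (reads stdin if no FILE is given)
fixed_match_and_count() {
    needle=$1
    shift
    # The string is passed through the environment so that awk does not apply
    # escape-sequence processing to it, as it would with -v.
    NEEDLE=$needle awk 'index($0, ENVIRON["NEEDLE"]) { n++; print } END { print n+0 }' "$@"
}
```

Here both the matching lines and the count go to standard output, so the result can be captured or redirected as a whole.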
