
I want to write a bash script that prints the least repeated line of standard input, together with the number of times it occurs.

I wrote this code:

#!/bin/bash
var=1000
while read line
do
    tmp=$(grep -c $line)
    if [ $tmp -lt $var ]
    then
        var=$tmp
        out=$line
    fi
done
var="$var $out"
echo $var

But when I use a test input like this one, for example:

id1
id2
id3
id1
square
id1
id2
id3
id1
circle
id2
id2

the program enters the loop only once, so it gives the wrong output:

3 id1

when the correct output should be:

1 square

This line

tmp=$(grep -c $line)

seems to be breaking the loop, but I can't figure out why. Is there a way to avoid using grep in my code, or some other way to fix my script?

  • Why is circle your expected output? It is neither the last repeating nor the last unique line in your example. Commented Apr 30, 2016 at 10:26
  • It should be the least repeating, not the last repeating ;) Still, your answer below helped me a lot ;) Commented Apr 30, 2016 at 10:55
  • So do you mean the first unique line, then? You have multiple unique lines; they are all the least repeating. Commented Apr 30, 2016 at 10:58
  • No, I guess my English wasn't good enough to make this clear. If there is a unique line in stdin, it should print it. Say we have one line containing the word "square", two lines containing "circle" and three lines containing "triangle". It should print "square" because it appears only once in the file (the fewest times). Commented Apr 30, 2016 at 11:11
  • That much is clear, but if there are three of each, do you only want the first one? Commented Apr 30, 2016 at 11:13

2 Answers


The problem in your code is that this grep

    tmp=$(grep -c $line)

will read from stdin and consume all the remaining lines on the very first iteration of the while loop. That is, read puts the first line into $line, and then grep searches for that string in the rest of stdin. Once grep has read to the end, the next read finds nothing left, so the loop exits after a single pass.
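The effect is easy to reproduce. The sketch below (the function name demo_stdin_consumed is made up for illustration) feeds three lines into a loop shaped like the one in the question. It prints "read: a" followed by "1", and the body never runs a second time because grep has already read the other two lines.

```shell
#!/bin/bash
# Minimal reproduction: a command in the loop body inherits the loop's stdin
# and reads everything that `read` has not consumed yet.
demo_stdin_consumed() {
    local line
    while IFS= read -r line; do
        echo "read: $line"
        # No file argument, so grep reads the rest of stdin ("b" and "a").
        grep -c -- "$line"
    done
}

# Three input lines, but the loop body runs only once.
printf '%s\n' a b a | demo_stdin_consumed
```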

You could fix your code by using a temporary file, e.g.:

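#!/bin/bash
# Save stdin to a temporary file so it can be read more than once.
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
cat > "$tmpfile"
min=0
# The loop reads the distinct lines from the process substitution, not from stdin.
while IFS= read -r line; do
    # -F: fixed string, -x: whole-line match, so "id1" does not also match "id10".
    count=$(grep -cxF -- "$line" "$tmpfile")
    if (( min == 0 || count < min )); then
        min=$count
        out="$min $line"
    fi
done < <(sort -u "$tmpfile")
echo "$out"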

But this is, of course, a rather poor solution: it needs a temporary file and rereads that file once for every distinct line. A better approach is something like:

#!/bin/bash
sort | uniq -c | sort -n | head -n 1
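If several lines tie for the lowest count (as square and circle do in the test input), head keeps whichever one sort -n happens to place first. A possible variation, sketched below with a hypothetical function name all_least_repeated, keeps every tied line and strips the padding that uniq -c puts in front of the count.

```shell
#!/bin/bash
# all_least_repeated (illustrative name): print every line tied for the
# lowest count, as "COUNT LINE".
all_least_repeated() {
    sort | uniq -c | sort -n |
        awk 'NR == 1 { min = $1 }      # first line after sort -n has the minimum
             $1 == min {
                 sub(/^ *[0-9]+ /, "")  # drop uniq -c padding, count, and separator
                 print min, $0
             }'
}

# Prints both lines that occur once in the test input: circle and square.
printf '%s\n' id1 id2 id3 id1 square id1 id2 id3 id1 circle id2 id2 | all_least_repeated
```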

1 Comment

Thank you for your answer :)

The grep command reads the remainder of standard input. You will need to copy the input to a temp file if you want to both grep it and do something else with it.

If what you want is the last repeated line, a much simpler solution is

uniq -d | tail -n 1
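One caveat with uniq -d: uniq only compares each line with the one directly before it, so on unsorted input it misses duplicates separated by other lines (in the question's test input, only the final pair of id2 lines is adjacent). A small sketch of the difference, using a hypothetical helper sorted_duplicates:

```shell
#!/bin/bash
# sorted_duplicates (name chosen for this example): sort first so that equal
# lines become adjacent and uniq -d can report them.
sorted_duplicates() {
    sort | uniq -d
}

printf '%s\n' a b a | uniq -d            # no output: the two "a" lines are not adjacent
printf '%s\n' a b a | sorted_duplicates  # prints: a
```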

More generally, running grep on each line in a loop over a file is an antipattern, which often suggests moving to Awk or sed instead if you can't find a simple pipeline of standard tools that accomplishes your goal.
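Following that advice, here is a single-pass Awk sketch that needs neither grep nor a temporary file. It prints the count and the least frequent line; when several lines tie, it picks the one that appears first in the input. That tie-breaking rule is an assumption, since the comments leave it open, and the function name least_repeated is illustrative.

```shell
#!/bin/bash
# least_repeated (illustrative name): print "COUNT LINE" for the least
# frequent line of stdin; ties go to the line that appeared first (assumption).
least_repeated() {
    awk '
        # Remember the order in which distinct lines first appear.
        !($0 in count) { order[++n] = $0 }
        { count[$0]++ }
        END {
            for (i = 1; i <= n; i++) {
                line = order[i]
                if (i == 1 || count[line] < min) {
                    min = count[line]
                    best = line
                }
            }
            if (n > 0) print min, best
        }
    '
}

printf '%s\n' id1 id2 id3 id1 square id1 id2 id3 id1 circle id2 id2 | least_repeated  # prints: 1 square
```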

1 Comment

Thanks, you helped me a lot!
