0

I'm having some rather unusual problems using grep in a bash script. Below is an example of the bash script code that I'm using that exhibits the behaviour:

UNIQ_SCAN_INIT_POINT=1
cat "$FILE_BASENAME_LIST" | uniq -d >> $UNIQ_LIST
sed '/^$/d' $UNIQ_LIST >> $UNIQ_LIST_FINAL
UNIQ_LINE_COUNT=`wc -l $UNIQ_LIST_FINAL | cut -d \  -f 1`
while [ -n "`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`" ]; do
    CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`
    CURRENT_DUPECHK_FILE=$FILE_DUPEMATCH-$CURRENT_LINE 
    grep $CURRENT_LINE $FILE_LOCTN_LIST >> $CURRENT_DUPECHK_FILE
    MATCH=`grep -c $CURRENT_LINE $FILE_BASENAME_LIST`
    CMD_ECHO="$CURRENT_LINE matched $MATCH times," cmd_line_echo
    echo "$CURRENT_DUPECHK_FILE" >> $FILE_DUPEMATCH_FILELIST
    let UNIQ_SCAN_INIT_POINT=UNIQ_SCAN_INIT_POINT+1
done

On numerous occasions, grepping for the current line in the file location list has written nothing to the current dupechk file, even though the file location list definitely contains matches for that line (running the same command in a terminal works without issues).

I've searched the internet to see whether anyone else has seen similar behaviour, and so far all I have found is that it has something to do with buffered and unbuffered output from other commands that run before the grep command in the Bash script.

However, no one seems to have found a solution, so I'm asking whether you have ever come across this, and whether you have any ideas, tips or solutions for this problem.

Regards

Paul

6
  • What are you precisely trying to do? We might be able to suggest a better way to go about it. Commented Feb 24, 2010 at 16:15
  • I have a very long bash script that locates duplicate files in a given directory. This part of the script finds the duplicates and prints them to a file. It is worth noting I'm doing this for a piece of university coursework and the awk command isn't allowed. Commented Feb 24, 2010 at 16:19
  • awk command isn't allowed but sed/grep is? lol Commented Feb 24, 2010 at 16:38
  • You've used cat three times and all three are unnecessary. Commented Feb 24, 2010 at 16:39
  • Realised that last night, I put it down to less than full experience using Linux commands! Commented Feb 25, 2010 at 9:49

4 Answers

1

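The 'problem' is the standard I/O library. When a program writes to a terminal, its standard output is line-buffered, but when it writes to a pipe or a file the output is fully buffered, so it may not appear until the buffer fills or the program exits.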

Try changing

CURRENT_LINE=`cat $UNIQ_LIST_FINAL | sed "$UNIQ_SCAN_INIT_POINT"'q;d'`

to

CURRENT_LINE=`sed "$UNIQ_SCAN_INIT_POINT"'q;d' $UNIQ_LIST_FINAL`
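
For illustration, here is a minimal sketch of how the whole loop from the question could be written without cat or repeated sed calls. It is not the asker's script: the temporary directory, the file names and the sample data are all made up for the example. It also quotes every expansion and passes -F and -- to grep, because an unquoted $CURRENT_LINE is split on whitespace, so a name containing a space would be treated as a pattern plus extra file arguments, which may be another reason for empty output.

```shell
#!/usr/bin/env bash
# Sketch only: paths and sample data below are hypothetical.
workdir=$(mktemp -d)
basename_list="$workdir/basenames.txt"
location_list="$workdir/locations.txt"

printf '%s\n' a.txt a.txt b.txt c.txt c.txt > "$basename_list"
printf '%s\n' /x/a.txt /y/a.txt /x/b.txt /x/c.txt /z/c.txt > "$location_list"

# uniq -d only sees adjacent repeats, so sort first; read each name once.
sort "$basename_list" | uniq -d | while IFS= read -r name; do
    [ -n "$name" ] || continue          # skip blank lines
    # -F: treat the name as a fixed string, -x: match the whole line,
    # --: stop option parsing in case a name starts with a dash.
    matches=$(grep -c -F -x -- "$name" "$basename_list")
    grep -F -- "/$name" "$location_list" > "$workdir/dupes-$name"
    echo "$name matched $matches times"
done > "$workdir/report.txt"

cat "$workdir/report.txt"
```

Reading the list with while read avoids re-running sed for every line number, and each grep result goes to its own file just as in the original loop.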

1 Comment

Lifesaver. I understand now that you explained it, but would never have even thought of that otherwise, thanks!
0

Are there any directories with spaces in their names in $FILE_LOCTN_LIST? If there are, those spaces will need to be escaped somehow. Some combination of find and xargs can usually deal with that for you, especially xargs -0.
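
A small sketch of what that could look like, assuming GNU find, xargs and md5sum are available. The directory and file names are invented for the example:

```shell
#!/usr/bin/env bash
# Sketch only: the directory layout below is hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/dir with spaces"
echo "hello" > "$workdir/dir with spaces/one file.txt"
echo "hello" > "$workdir/plain.txt"

# -print0 and -0 separate names with NUL bytes instead of whitespace,
# so paths containing spaces reach md5sum as single arguments.
checksums=$(find "$workdir" -type f -print0 | xargs -0 md5sum)
printf '%s\n' "$checksums"
```

Without -print0 and -0, xargs would split "dir with spaces/one file.txt" into several arguments and md5sum would complain about files that do not exist.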

2 Comments

I am currently using this command to compile the $FILE_LOCTN_LIST: echo $SCAN_DIRNAME | xargs -I {/} find {/} -type f > $FILE_LOCTN_LIST I think xargs -I performs similarly to xargs -0 ?
Ok, so it isn't going to be escaping if it's the name of a single file.
0

A small bash script using md5sum and sort that detects duplicate files in the current directory:

# Remember the previous checksum; identical checksums are adjacent after sort
CURRENT=""
md5sum * |
  sort |
  while read -r md5sum filename
  do
    [[ "$CURRENT" == "$md5sum" ]] && echo "$filename is duplicate"
    CURRENT=$md5sum
  done
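
One caveat if you want to use the result after the loop: in bash, a while loop at the end of a pipeline normally runs in a subshell, so variables set inside it are gone once the loop finishes. A sketch of one way around that, writing the sorted checksums to a temporary file and redirecting it into the loop. The sample files and the dupe_count variable are assumptions made for the example:

```shell
#!/usr/bin/env bash
# Sketch only: sample files and variable names are hypothetical.
workdir=$(mktemp -d)
sums=$(mktemp)
echo "hello world" > "$workdir/file"
cp "$workdir/file" "$workdir/file1"
echo "blah" > "$workdir/file2"

md5sum "$workdir"/* | sort > "$sums"

previous=""
dupe_count=0
# Redirecting from a file keeps the loop in the current shell,
# so dupe_count is still set after "done".
while read -r sum path; do
    if [ "$sum" = "$previous" ]; then
        echo "$path is duplicate"
        dupe_count=$((dupe_count + 1))
    fi
    previous=$sum
done < "$sums"

echo "duplicates found: $dupe_count"
```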


0

You tagged this linux, so I assume you have tools like GNU find, md5sum, uniq, sort, etc. Here's a simple example to find duplicate files:

$ echo "hello world">file
$ md5sum file
6f5902ac237024bdd0c176cb93063dc4  file
$ cp file file1
$ md5sum file1
6f5902ac237024bdd0c176cb93063dc4  file1
$ echo "blah" > file2
$ md5sum file2
0d599f0ec05c3bda8c3b8a68c32a1b47  file2
$ find . -type f -exec md5sum "{}" \; | sort | uniq -w32 -D
6f5902ac237024bdd0c176cb93063dc4  ./file
6f5902ac237024bdd0c176cb93063dc4  ./file1

