So I'm writing a script that takes the output of a grep as an array and then iterates a filter over it to write the results to a file. I'm testing it on my own site; the wget works as expected and generates a list of URLs in spider.queue. The grep command also works for filtering by a keyword, but when I put it inside a while loop and use an if statement to check whether an entry already exists, I get this error:
./spider.sh: 19: ./spider.sh: Syntax error: "(" unexpected (expecting "done")
Which would lead me to believe it's a syntax problem with one of the loops.
#!/bin/sh
# Usage - ./spider.sh searchterm www.website.com
## Parameters
search=$1
URL=$2
## Spider WGET
wget -r -e robots=off --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" http://$URL 2>&1 | grep '^--' 2>&1 | awk '{ print $3 }' | grep -v '\.\(css\|js\|png\|gif\|jpg\|JPG\)$' >> spider.queue
## Keyword filter with grep
while true
do
    PROFILES=($(grep -l -r "$search" $URL))
    for x in ${PROFILES[*]}
    do
        if grep -q $x crawler.queue; then
            echo "Already Exists"
        else
            $x >> crawler.queue
        fi
    done
done
echo "$x" >> crawler.queue?
wget ... "http://$URL" and grep -q "$x" crawler.queue
while true will run forever. Perhaps there was something wrong with the string that you replaced with true.
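The error format (script name, line number, then Syntax error: "(" unexpected) looks like what dash prints, and dash is what /bin/sh points to on Debian and Ubuntu. If that is the case here, the array assignment PROFILES=(...) is the culprit, since arrays are a bash feature and not part of POSIX sh. Changing the shebang to #!/bin/bash should get past the syntax error. Alternatively, here is a minimal sketch that avoids arrays altogether and also applies the quoting and echo fixes from the comments above. The function name add_matches and its three arguments are made up for illustration, and it assumes no file name contains a newline.

```shell
#!/bin/sh
# Sketch only: add each file that contains the search term to a queue file,
# skipping entries that are already listed.
# add_matches is a hypothetical helper, not part of the original script.
add_matches() {
    search=$1   # keyword to look for
    dir=$2      # directory to search (e.g. the one wget mirrored into)
    queue=$3    # queue file, e.g. crawler.queue

    # Make sure the queue exists so grep does not complain on the first run.
    touch "$queue"

    # grep -l prints one matching file name per line; read them one by one
    # instead of storing them in an array, which plain sh does not support.
    grep -l -r -- "$search" "$dir" | while IFS= read -r file
    do
        # -F fixed string, -x whole line must match, -q no output
        if grep -Fxq -- "$file" "$queue"; then
            echo "Already exists: $file"
        else
            echo "$file" >> "$queue"
        fi
    done
}
```

Calling it as add_matches "$search" "$URL" crawler.queue after the wget line would replace the whole while true block. It runs once and returns, so there is no infinite loop to break out of.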