I have a list of URLs in file urls.txt.

If possible, I want to get all URLs that give a 404 error when I attempt to fetch them with curl, and copy them to a new file.

For example, my file urls.txt contains:

mysite.com/page1
mysite.com/page2
mysite.com/page3
mysite.com/page4
mysite.com/page5
...
mysite.com/page100
...
mysite.com/page1000

I want to try to fetch each one and, if the fetch fails with a 404 error, store the failing URL in a new file.

1 Answer

This may not be the best approach, but try this:

Create a file named urlcheck.sh and make it executable:

touch urlcheck.sh
chmod +x urlcheck.sh

Paste the following script into urlcheck.sh:

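#!/bin/bash
# Seconds to wait for each request before giving up
TIMEOUT=3

# Make sure the output file exists even if no URL returns 404
touch output404.txt

while IFS= read -r line; do
    # HEAD request; -s hides progress output, -o /dev/null discards the
    # headers, and -w prints only the status code (000 if the request fails)
    OUT_URL=$(curl -s -o /dev/null -I -m "$TIMEOUT" -w '%{http_code}' "$line")
    if [ "$OUT_URL" = "404" ]; then
        echo "$line" >> output404.txt
        echo "$line written to output404.txt"
    else
        echo "$line     $OUT_URL"
    fi
done < "$1"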

Save the file.

To run the script:

./urlcheck.sh urls.txt

Then check the file output404.txt generated by the script.

Please note that each line must contain a URL that curl can read, such as https://unix.stackexchange.com/.

You can change the timeout (in seconds) by editing the line TIMEOUT=3.
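
As a variation on the script above, here is a minimal sketch that separates the status lookup from the filtering, so the list of 404 URLs goes to standard output and can be redirected to any file. The function names http_status and filter_404 are made up for this example, and it assumes curl is installed and that a HEAD request is enough to get the right status from your server (some servers answer HEAD differently from GET).

```shell
#!/bin/bash
# Sketch: print only the URLs from a list that answer with HTTP 404.
# http_status and filter_404 are hypothetical helper names.

# Print the HTTP status code for one URL (000 if the request fails).
# -s silences progress, -o /dev/null discards the headers, -I sends a
# HEAD request, -m caps the total time, -w prints the status code.
http_status() {
    curl -s -o /dev/null -I -m "${TIMEOUT:-3}" -w '%{http_code}' "$1"
}

# Read URLs from the file named in $1 and print those whose status is 404.
filter_404() {
    local url
    # The "|| [ -n ... ]" part keeps a last line that has no trailing newline.
    while IFS= read -r url || [ -n "$url" ]; do
        [ -z "$url" ] && continue    # skip blank lines
        if [ "$(http_status "$url")" = "404" ]; then
            printf '%s\n' "$url"
        fi
    done < "$1"
}
```

To use it, add a final line such as filter_404 "$1" > output404.txt to the script, or source the file and call filter_404 urls.txt > output404.txt from your shell. The timeout defaults to 3 seconds and can be overridden by setting TIMEOUT before the call.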

  • Thank you, sir, but output404.txt is not generated after running the script. Commented May 9, 2020 at 20:14
  • Then create it first with touch output404.txt. I have modified the script; please check and let me know if it works. Commented May 9, 2020 at 20:18
  • The generated file output404.txt is still empty. Commented May 9, 2020 at 20:27
  • Sorry, my bad, I forgot to rename 404.txt to output404.txt. Fixed now, try it. Commented May 9, 2020 at 20:30
  • Glad it's working. You can still improve the performance if you edit the curl command. Commented May 9, 2020 at 21:03