
I am trying to nail down the error handling for the requests module in Python so that I am notified whenever a URL is unavailable, e.g. through HTTPError, ConnectionError, Timeout, etc.

The issue I am having is that I seem to be getting status responses of 200 even for FAKE URLs.

I have trawled through S.O. and various other web sources and tried many different ways of achieving the same goal, but have so far come up empty.

I have boiled the code down to the bare minimum to simplify things.

import requests

urls = ['http://fake-website.com', 
        'http://another-fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com']

for url in urls:
    r = requests.get(url,timeout=1)
    try:
        r.raise_for_status()
    except:
        pass
    if r.status_code != 200:
        print ("Website Error: ", url, r)
    else:
        print ("Website Good: ", url, r)

I expected the first 3 URLs in the list to be classed as 'Website Error:' as they are URLs that I have just made up. The final URL in the list is quite obviously real, so it should be the only one listed as 'Website Good:'.

What actually happens is that the first URL is handled correctly, as it gives a response code of 503. The next two URLs, however, do not produce a status code at all according to https://httpstatus.io/, which only displays ERROR with "Cannot find URI. another-fake-website.com another-fake-website.com:80".

So I expected all but the last URL in the list to be shown as 'Website Error:'.

OUTPUT

When running the script on a Raspberry Pi:

Python 2.7.9 (default, Sep 26 2018, 05:58:52) 
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
('Website Error: ', 'http://fake-website.com', <Response [503]>)
('Website Good: ', 'http://another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://yet-another-fake-website.com', <Response [200]>)
('Website Good: ', 'http://google.com', <Response [200]>)
>>>

If I enter all 4 URLs into https://httpstatus.io/, I get this result (HTTPSTATUS Screen Grab):

It shows a 503, a 200 and two URLs that do not have a status code but just display Error.

UPDATE

So I thought that I would check this on Windows using PowerShell and followed this example: https://stackoverflow.com/a/52762602/5251044

This is the output:

c:\Testing>powershell -executionpolicy bypass -File .\AnyName.ps1
0 - http://fake-website.com
200 - http://another-fake-website.com
200 - http://yet-another-fake-website.com
200 - http://google.com

As you can see, I am no further forward.

UPDATE 2

Having had further discussions with Fozoro and tried various options with no fix in sight, I thought that I would try this code using urllib2 instead of requests.

Here is the changed code:

from urllib2 import urlopen
import socket

urls = ['http://another-fake-website.com',
        'http://fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com',
        'dskjhkjdhskjh.com',
        'doioieowwros.com']

for url in urls:

    try:
        r  = urlopen(url, timeout = 5)
        r.getcode()
    except:
        pass
    if r.getcode() != 200:
        print ("Website Error: ", url, r.getcode())
    else:
        print ("Website Good: ", url, r.getcode())

Unfortunately, the resulting output is still not correct, although it differs slightly from the output of the previous code:

Python 2.7.9 (default, Sep 26 2018, 05:58:52) 
[GCC 4.9.2] on linux2
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>> 
('Website Good: ', 'http://another-fake-website.com', 200)
('Website Good: ', 'http://fake-website.com', 200)
('Website Good: ', 'http://yet-another-fake-website.com', 200)
('Website Good: ', 'http://google.com', 200)
('Website Good: ', 'dskjhkjdhskjh.com', 200)
('Website Good: ', 'doioieowwros.com', 200)
>>> 

This time it shows a 200 response for every URL, which is very peculiar.

2 Answers

You should put r = requests.get(url,timeout=1) inside of the try: block. So your code needs to look like this:

import requests

urls = ['http://fake-website.com', 
        'http://another-fake-website.com',
        'http://yet-another-fake-website.com',
        'http://google.com']

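for url in urls:
    try:
        r = requests.get(url, timeout=1)
    except requests.exceptions.RequestException as e:
        # No response at all (DNS failure, refused connection, timeout).
        # Report it here and skip the status check, so that r is never
        # undefined or left over from the previous URL.
        print("Website Error: ", url, e)
        continue
    if r.status_code != 200:
        print("Website Error: ", url, r)
    else:
        print("Website Good: ", url, r)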

Output:

Website Error:  http://fake-website.com <Response [503]>
Website Error:  http://another-fake-website.com <Response [503]>
Website Error:  http://yet-another-fake-website.com <Response [503]>
Website Good:  http://google.com <Response [200]>
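
The urllib2 attempt in UPDATE 2 of the question can be treated along the same lines. There, whenever urlopen raises, the bare except: pass leaves r bound to the response from the previous URL, so the old 200 is printed again; the two entries without http:// always raise ValueError, which explains at least those two "Good" lines. Below is a minimal sketch, assuming Python 3 (where urllib2 became urllib.request). check_url and its opener parameter are hypothetical names introduced for illustration; opener exists only so the function can be exercised without network access.

```python
import socket
import urllib.error
import urllib.request


def check_url(url, opener=urllib.request.urlopen, timeout=5):
    """Return (ok, detail) for one URL; nothing is carried over between calls."""
    try:
        response = opener(url, timeout=timeout)
    except urllib.error.HTTPError as error:
        # The server answered, but with a 4xx/5xx status.
        return False, "HTTP {}".format(error.code)
    except urllib.error.URLError as error:
        # No usable answer: DNS failure, refused connection, connect timeout.
        return False, "unreachable: {}".format(error.reason)
    except socket.timeout:
        # Raised directly when reading the reply takes too long.
        return False, "timed out"
    except ValueError as error:
        # Malformed URL, e.g. 'dskjhkjdhskjh.com' without a scheme.
        return False, "bad URL: {}".format(error)
    status = response.getcode()
    response.close()
    return status == 200, "HTTP {}".format(status)
```

Calling it as `for url in urls: print(url, check_url(url))` reports each URL on its own, because every outcome is returned from inside the function and no response object survives into the next iteration.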

I hope this helps!


8 Comments

Thanks for replying, but I have tried that and only get the same output as before.
I've just added the output that I'm getting with this code. Aren't you getting the same one? If so, isn't it what you want? @1cm69
Oddly, I am not getting that output. This is what has had me stumped all day. The output you have posted is exactly what I expect but I get... (see added OUTPUT section in my original post)
The output that you've posted in your question looks the same as mine.
No, yours shows the first 3 URLs as 503 and the last as 200. Mine shows the first URL as 503 but all the rest as 200.

For me, the cause turned out to be a page served by my ISP saying that the URL is invalid. It is that page that returns the 200, not the fake site.

This can be verified by printing the content of the returned page with requests.get('http://fakesite').text.
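
The same check can be made at the DNS level, before any HTTP request is sent. The sketch below looks up a random host name that almost certainly does not exist; a normal resolver fails, whereas a resolver that rewrites failed lookups returns the address of its landing page. It assumes the landing page is served from a single IPv4 address, which may not hold for every ISP. The function names and the resolve parameter are hypothetical and only introduced here for illustration; resolve is there so the logic can be tried without network access.

```python
import socket
import uuid


def resolve_or_none(hostname, resolve=socket.gethostbyname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return resolve(hostname)
    except socket.gaierror:
        return None


def hijack_address(resolve=socket.gethostbyname):
    """Look up a random, almost certainly unregistered name.

    Returns None with a well-behaved resolver, or the landing-page
    address if failed lookups are being rewritten.
    """
    probe = "nx-{}.com".format(uuid.uuid4().hex)
    return resolve_or_none(probe, resolve)


def really_resolves(hostname, resolve=socket.gethostbyname):
    """True if hostname resolves to something other than the landing page."""
    address = resolve_or_none(hostname, resolve)
    return address is not None and address != hijack_address(resolve)
```

Note that these functions take a host name such as 'another-fake-website.com', not a full URL with the http:// prefix.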
