100

I’m looking for a quick way to get an HTTP response code from a URL (i.e. 200, 404, etc). I’m not sure which library to use.

8 Answers

132

Update: use the excellent requests library. Note that we use a HEAD request, which should complete more quickly than a full GET or POST request.

import requests
try:
    r = requests.head("https://stackoverflow.com")
    print(r.status_code)
    # prints the integer status code
except requests.ConnectionError:
    print("failed to connect")

Find more status codes at https://developer.mozilla.org/en-US/docs/Web/HTTP/Status
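If installing requests isn't an option, the standard library alone can make the same HEAD check in Python 3; a minimal sketch (the function name head_status is my own):

```python
import urllib.request
import urllib.error

def head_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request to url.

    HTTP errors (404, 500, ...) still carry a status code, so we return
    e.code for those; only network-level failures return None.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code
    except urllib.error.URLError:
        return None
```

Note that urlopen follows redirects by default, so you will not see 3xx codes this way.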


6 Comments

requests is much better than urllib2. For a link like dianping.com/promo/208721#mod=4, urllib2 gives me a 404 while requests gives a 200, just as I get from a browser.
httpstatusrappers.com...awesome!! My code is on that Lil Jon status, son!
This is the best solution. Much better than any of the others.
@WKPlus for the record, now requests gives 403 for your link, although it's still working in browser.
@Gourneau Ha! That wasn't what I intended with my comment; I think it was perfectly fine. In this context, people should try to understand why it "just works" in the browser but returns a 403 in code, when in actuality the same thing is happening in both places.
66

Here's a solution that uses httplib instead.

import httplib

def get_status_code(host, path="/"):
    """ This function retreives the status code of a website by requesting
        HEAD data from the host. This means that it only requests the headers.
        If the host cannot be reached or something else goes wrong, it returns
        None instead.
    """
    try:
        conn = httplib.HTTPConnection(host)
        conn.request("HEAD", path)
        return conn.getresponse().status
    except StandardError:
        return None


print get_status_code("stackoverflow.com") # prints 200
print get_status_code("stackoverflow.com", "/nonexistant") # prints 404
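The code above is Python 2; httplib was renamed to http.client in Python 3, and StandardError no longer exists. A rough Python 3 port of the same idea (catching OSError and http.client's own exceptions instead):

```python
import http.client

def get_status_code(host, path="/"):
    """Return the status code from a HEAD request to host, or None if the
    host cannot be reached or something else goes wrong."""
    try:
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request("HEAD", path)
        status = conn.getresponse().status
        conn.close()
        return status
    except (OSError, http.client.HTTPException):
        return None
```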

8 Comments

+1 for HEAD request — no need to retrieve the entire entity for a status check.
Although you really should restrict that except block to at least StandardError so that you don't incorrectly catch things like KeyboardInterrupt.
I was wondering if HEAD requests are reliable. Because websites might not have (properly) implemented the HEAD method, which could result in status codes like 404, 501 or 500. Or am I being paranoid?
How would one make this follow 301s ?
@Blaise If a website doesn't allow HEAD requests then performing a HEAD request should result in a 405 error. For an example of this, try running curl -I http://www.amazon.com/.
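Following up on the 405 discussion in the comments: if you suspect a server rejects HEAD, you can fall back to GET when you see 405. A hedged Python 3 sketch using http.client (the function name head_or_get_status is my own):

```python
import http.client

def head_or_get_status(host, path="/"):
    """Try a HEAD request first; if the server answers 405 Method Not
    Allowed, retry the same path with GET on a fresh connection."""
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("HEAD", path)
    status = conn.getresponse().status
    conn.close()
    if status == 405:  # server does not allow HEAD
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request("GET", path)
        status = conn.getresponse().status
        conn.close()
    return status
```

The GET fallback downloads the response body, so it is slower; it only runs when HEAD is actually refused.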
26

You should use urllib2, like this:

import urllib2
for url in ["http://entrian.com/", "http://entrian.com/does-not-exist/"]:
    try:
        connection = urllib2.urlopen(url)
        print connection.getcode()
        connection.close()
    except urllib2.HTTPError, e:
        print e.getcode()

# Prints:
# 200 [from the try block]
# 404 [from the except block]

2 Comments

This is not a valid solution because urllib2 will follow redirects, so you will not get any 3xx responses.
@sorin: That depends - you might well want to follow redirects. Perhaps you want to ask the question "If I were to visit this URL with a browser, would it show content or give an error?" In that case, if I changed http://entrian.com/ to http://entrian.com/blog in my example, the resulting 200 would be correct even though it involved a redirect to http://entrian.com/blog/ (note the trailing slash).
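On the redirect point above: in Python 3 you can stop urllib from following redirects with a custom HTTPRedirectHandler, so 3xx codes surface as HTTPError instead of being silently followed. A sketch (the class and function names are my own):

```python
import urllib.request
import urllib.error

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None refuses the redirect, so the 3xx response is
        # raised as an HTTPError by the default error handler.
        return None

_opener = urllib.request.build_opener(NoRedirect)

def status_without_redirects(url):
    """Return the status code of url without following any redirects."""
    try:
        with _opener.open(url) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code
```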
9

For those using Python 3, here's another way to find the response code:

import urllib.request

def getResponseCode(url):
    conn = urllib.request.urlopen(url)
    return conn.getcode()

1 Comment

This will raise an HTTPError for status codes like 404, 500, etc.
3

The urllib2.HTTPError exception does not contain a getcode() method. Use the code attribute instead.

1 Comment

It does for me, using Python 2.6.
2

Addressing @Niklas R's comment to @nickanor's answer:

from urllib.error import HTTPError
import urllib.request

def getResponseCode(url):
    try:
        conn = urllib.request.urlopen(url)
        return conn.getcode()
    except HTTPError as e:
        return e.code


1

It depends on multiple factors, but try these methods:

import requests

def url_code_status(url):
    try:
        response = requests.head(url, allow_redirects=False)
        return response.status_code
    except Exception as e:
        print(f'[ERROR]: {e}')

or:

import http.client as httplib
import urllib.parse

def url_code_status(url):
    try:
        protocol, host, path, query, fragment = urllib.parse.urlsplit(url)
        path = path or "/"  # an empty path would produce a malformed request line
        if protocol == "http":
            conntype = httplib.HTTPConnection
        elif protocol == "https":
            conntype = httplib.HTTPSConnection
        else:
            raise ValueError("unsupported protocol: " + protocol)
        conn = conntype(host)
        conn.request("HEAD", path)
        resp = conn.getresponse()
        conn.close()
        return resp.status
    except Exception as e:
        print(f'[ERROR]: {e}')

Benchmark results for 100 URLs:

  • First method: 20.90 seconds
  • Second method: 23.15 seconds


0

Here's an httplib solution that behaves like urllib2. You can give it a URL and it just works; no need to mess about splitting your URLs into hostname and path. This function already does that.

import httplib
import re
import socket

def get_link_status(url):
  """
    Gets the HTTP status of the url or returns an error associated with it.  Always returns a string.
  """
  https=False
  port=80  # default if no scheme or explicit port is present
  url=re.sub(r'(.*)#.*$',r'\1',url)  # strip any fragment
  url=url.split('/',3)
  if len(url) > 3:
    path='/'+url[3]
  else:
    path='/'
  if url[0] == 'http:':
    port=80
  elif url[0] == 'https:':
    port=443
    https=True
  if ':' in url[2]:
    host=url[2].split(':')[0]
    port=int(url[2].split(':')[1])
  else:
    host=url[2]
  try:
    headers={'User-Agent':'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0',
             'Host':host
             }
    if https:
      conn=httplib.HTTPSConnection(host=host,port=port,timeout=10)
    else:
      conn=httplib.HTTPConnection(host=host,port=port,timeout=10)
    conn.request(method="HEAD",url=path,headers=headers)
    response=str(conn.getresponse().status)
    conn.close()
  except socket.gaierror,e:
    response="Socket Error (%d): %s" % (e[0],e[1])
  except StandardError,e:
    if hasattr(e,'getcode') and e.getcode():
      response=str(e.getcode())
    elif hasattr(e, 'message') and len(e.message) > 0:
      response=str(e.message)
    elif hasattr(e, 'msg') and len(e.msg) > 0:
      response=str(e.msg)
    elif isinstance(e, str):
      response=e
    else:
      response="Exception occurred without a good error message.  Manually check the URL to see the status.  If it is believed this URL is 100% good then file an issue for a potential bug."
  return response

1 Comment

Not sure why this was downvoted without feedback. It works with HTTP and HTTPS URLs. It uses the HEAD method of HTTP.
