How to get HTML content of 404 error page using Python?

Question

I am using Python to fetch HTML from multiple pages of a site. I found that urllib raises an exception when a URL does not exist. How do I retrieve the HTML of the site's custom 404 error page (the page that says something like "Page is not found.")?

Current code:

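from urllib.request import Request, urlopen

# This runs inside a loop over the page URLs; URL is the current page.
try:
    req = Request(URL, headers={'User-Agent': 'Mozilla/5.0'})
    client = urlopen(req)

    # Download the HTML data
    page_html = client.read()

    # Close the connection
    client.close()
except Exception:
    # urlopen raises here for a 404, so page_html is never set
    print("The following URL was not found. Program terminated.\n" + URL)
    break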

See HTTPError. It has a .read() method which returns the response content.
– t.m.adam, commented Nov 4, 2018 at 10:07

Derwent · Accepted Answer · 2018-11-04 02:03:34Z

2

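Have you tried the requests library?

Install it with pip:

pip install requests

Then use it like this. requests.get does not raise an exception for a 404 response, so the body of the error page is available like any other response: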

import requests

response = requests.get('https://stackoverflow.com/nonexistent_path')
print(response.status_code) # 404
print(response.text) # Prints the raw HTML response


Malady · Accepted Answer · 2022-06-22 03:18:04Z

0

This preserves the comment that also answers the question, since it is what I was looking for: a way to do this without going outside urllib.

From t.m.adam's comment (Nov 4, 2018 at 10:07):

See HTTPError. It has a .read() method which returns the response content.
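
As a minimal sketch of what that comment describes, assuming Python 3's urllib: catch urllib.error.HTTPError and call .read() on the exception to get the body the server sent with the error. The helper name fetch_html is hypothetical and only used for illustration; the User-Agent header is copied from the question.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_html(url):
    """Return (status_code, body_bytes) for url, including error pages.

    fetch_html is a hypothetical helper name, not part of urllib.
    """
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    try:
        with urlopen(req) as response:
            return response.status, response.read()
    except HTTPError as err:
        # HTTPError doubles as a response object: err.code is the HTTP
        # status and err.read() returns the body sent with the error,
        # for example the site's custom 404 page.
        return err.code, err.read()
```

Calling status, page_html = fetch_html(URL) inside the loop would then give the error page's HTML whenever status is 404, instead of ending the loop. The body is returned as bytes, so it may still need decoding (for example with .decode('utf-8'), if that is the page's encoding). Failures that are not HTTP responses, such as an unreachable host, raise URLError and are not handled by this sketch.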
