
I am using the Python requests library to get the headers of HTML pages and use them to determine the encoding. But for some links, requests fails to get the headers. In such cases I would like to fall back to the encoding "utf-8". How do I handle such cases? How do I handle the error raised by requests.head?

Here is my code:

import requests

r = requests.head(link)  # how do I handle the error if this fails?
charset = r.encoding
if not charset:
    charset = "utf-8"

The error I get when requests fails to get the header:

  File "parsexml.py", line 78, in parsefile
    r = requests.head(link)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 74, in head
    return request('head', url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/api.py", line 40, in request
    return s.request(method=method, url=url, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 229, in request
    r.send(prefetch=prefetch)
  File "/usr/lib/python2.7/dist-packages/requests/models.py", line 605, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.standardzilla.com', port=80): Max retries exceeded with url: /2008/08/01/diaries-of-a-freelancer-day-thirty-seven/
  • The server you are trying to connect to doesn't respond at all; I don't think this has anything to do with your HEAD request, really. Commented Feb 18, 2014 at 10:59
  • The domain name www.standardzilla.com doesn't exist. Commented Feb 18, 2014 at 11:00
  • That's exactly my question: how do I handle such cases? Commented Feb 18, 2014 at 11:01
  • Exception handling; catch the exception and move on. See the posted answer. But that's not really requests-specific, let alone anything to do with testing for character sets. :-) Commented Feb 18, 2014 at 11:02

1 Answer


Put your code in a try/except block that catches requests.exceptions.ConnectionError, like this:

import requests

try:
    r = requests.head(link)
    charset = r.encoding  # taken from the Content-Type header; may be None
    if not charset:
        charset = "utf-8"
except requests.exceptions.ConnectionError:
    print('Unable to access ' + link)
    charset = "utf-8"  # fall back when the request itself fails
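A slightly more defensive variant of the code above, offered as a sketch: the hypothetical helper `get_charset` (its name, the `default` argument and the 10-second timeout are illustrative assumptions) catches `requests.exceptions.RequestException`, the base class of `ConnectionError`, `Timeout` and the other requests errors, plus `ValueError`, which some older requests versions raise for malformed URLs. Every failure path falls back to `"utf-8"`, as the question asks.

```python
import requests


def get_charset(link, default="utf-8", timeout=10):
    """Return the charset requests reports for `link`, or `default`.

    Hypothetical helper: the name, the default and the timeout are
    illustrative choices, not part of the requests API.
    """
    try:
        r = requests.head(link, timeout=timeout, allow_redirects=True)
    except (requests.exceptions.RequestException, ValueError) as exc:
        # RequestException covers ConnectionError, Timeout, InvalidURL, ...
        # ValueError covers malformed URLs in some older requests versions.
        print('Unable to access %s: %s' % (link, exc))
        return default
    # r.encoding is derived from the Content-Type header and may be None.
    return r.encoding or default
```

Usage: `charset = get_charset(link)`. Passing a `timeout` matters because without one, `requests.head` can wait indefinitely on a server that accepts the connection but never answers.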


  • @Lanc Great. If that's the case, you can mark the answer as accepted.
  • I'm now getting this error; can you please help me with it?
      File "parsexml.py", line 79, in parsefile
        r = requests.head(link,timeout=100,allow_redirects=True)
      File "/usr/lib/python2.7/dist-packages/requests/api.py", line 74, in head
        return request('head', url, **kwargs)
      File "/usr/lib/python2.7/dist-packages/requests/api.py", line 40, in request
        return s.request(method=method, url=url, **kwargs)
      File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 229, in request
        r.send(prefetch=prefetch)
      File "/usr/lib/python2.7/dist-packages/requests/models.py", line 624, in send
        self._build_response(r)
      File "/usr/lib/python2.7/dist-packages/requests/models.py", line 301, in _build_response
        request.send()
      File "/usr/lib/python2.7/dist-packages/requests/models.py", line 468, in send
        url = self.full_url
      File "/usr/lib/python2.7/dist-packages/requests/models.py", line 411, in full_url
        url = requote_uri(url)
      File "/usr/lib/python2.7/dist-packages/requests/utils.py", line 448, in requote_uri
        return quote(unquote_unreserved(uri), safe="!#$%&'()*+,/:;=?@[]~")
      File "/usr/lib/python2.7/dist-packages/requests/utils.py", line 429, in unquote_unreserved
        c = chr(int(h, 16))
    ValueError: invalid literal for int() with base 16: '&e'
  • @Lanc It's hard to tell; it looks like you've added more code.
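Judging from the last frames of the ValueError traceback in the comments above, this old requests version splits the URL on `%` and parses the two characters after each one as hex, so a URL containing `%&e` (a literal `%` that is not a valid escape) makes `int('&e', 16)` fail. Besides catching `ValueError` as well as `ConnectionError`, one option is to detect or repair such URLs before calling requests. This is a sketch, not requests' own behavior: `has_bad_percent_escape` and `fix_percent_escapes` are hypothetical helpers, and re-encoding a stray `%` as `%25` only makes sense if the `%` was meant literally.

```python
import re

# A '%' that is NOT followed by exactly two hex digits, such as the
# '%&e' implied by the ValueError above.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_bad_percent_escape(url):
    """Return True if url contains a malformed percent-escape (hypothetical helper)."""
    return _BAD_ESCAPE.search(url) is not None


def fix_percent_escapes(url):
    """Re-encode every stray '%' as '%25', leaving valid escapes like '%20' alone."""
    return _BAD_ESCAPE.sub("%25", url)
```

For example, `fix_percent_escapes("http://example.com/a%&e")` returns `"http://example.com/a%25&e"`, while `"http://example.com/a%20b"` comes back unchanged.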
