Python Program with urllib Module

Question

Folks, the program below is meant to find the IP address shown on the page http://whatismyipaddress.com/

import urllib2
import re

response = urllib2.urlopen('http://whatismyipaddress.com/')

p = response.readlines()
for line in p:
    ip = re.findall(r'(\d+.\d+.\d+.\d+)',line)
    print ip

But I cannot troubleshoot it, because it fails with the error below:

Traceback (most recent call last):
  File "Test.py", line 5, in <module>
  response = urllib2.urlopen('http://whatismyipaddress.com/')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
  return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 437, in open
  response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 550, in http_response
  'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 475, in error
  return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
  result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 558, in http_error_default
  raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)

urllib2.HTTPError: HTTP Error 403: Forbidden

Does anyone know what change is required to get rid of the error and produce the expected output?

They are checking the "User-Agent" header

– John La Rooy, Commented Aug 13, 2015 at 6:27

chris-sc · Accepted Answer · 2015-08-13 06:27:07Z

3

HTTP status code 403 (Forbidden) means the server understood your request but refuses to serve it. In this case, I think the cause is the user agent of your request (the default one sent by urllib2).

You can change the user agent:

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
response = opener.open('http://www.whatismyipaddress.com/')

Then your query will work.

But there is no guarantee that this will keep working. The site could decide to block automated queries.
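For readers on Python 3, where urllib2 was folded into urllib.request, the same idea can be sketched by attaching the header to a Request object. This is only a minimal sketch: the helper name build_request is hypothetical, and the actual network call is left commented out because the site may still refuse automated clients.

```python
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("http://whatismyipaddress.com/")

# urllib.request stores header names capitalized, i.e. as "User-agent".
print(req.get_header("User-agent"))

# Uncomment to actually send the request (requires network access):
# with urllib.request.urlopen(req, timeout=10) as response:
#     html = response.read().decode("utf-8", errors="replace")
```

Building the Request separately from sending it makes it easy to check which headers will go out before any traffic reaches the server.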


4 Comments

John La Rooy Over a year ago

Seems they are using a blacklist. 'wget' is blocked, but 'w' gets through :)

Maverick Over a year ago

@chris it worked and produced output. Can you explain what exactly the first and second lines of your code do?

Maverick Over a year ago

@chris,
p = response.readlines()
for line in p:
    IP = re.finditer(r'(\d+.\d+.\d+.\d+)',line)
    print IP

Maverick Over a year ago

@chris I added the lines above to the code you gave, but it still does not give the required output. Instead it prints <callable-iterator object at 0x102a92d50>
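The <callable-iterator object ...> output is expected: re.finditer returns an iterator of match objects rather than a list, so printing it shows the iterator itself. A minimal sketch of how the matches could be read instead, written for Python 3 and run against a made-up line of HTML (the sample text and the address 203.0.113.7 are assumptions for illustration, not the real page content). It also escapes the dots so that each one matches only a literal dot:

```python
import re

# Escaped dots match a literal "." only; an unescaped "." matches any character.
IP_PATTERN = re.compile(r"\d+\.\d+\.\d+\.\d+")

# Hypothetical line of HTML standing in for one line of the real page.
sample_line = '<span id="ip">Your IP: 203.0.113.7</span>'

# finditer yields match objects; call .group() to get the matched text.
for match in IP_PATTERN.finditer(sample_line):
    print(match.group())

# findall returns the matched strings directly as a list.
addresses = IP_PATTERN.findall(sample_line)
print(addresses)
```

Note that this pattern is loose: it accepts any four dot-separated runs of digits, so it does not check that each number is in the 0 to 255 range.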

Community · Accepted Answer · 2017-05-23 11:58:00Z

0

Try this

>>> import urllib2
>>> import re
>>> site= 'http://whatismyipaddress.com/'
>>> hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
...        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
...        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
...        'Accept-Encoding': 'none',
...        'Accept-Language': 'en-US,en;q=0.8',
...        'Connection': 'keep-alive'}
>>> req = urllib2.Request(site, headers=hdr)
>>> response = urllib2.urlopen(req)
>>> p = response.readlines()
>>> for line in p:
...     ip = re.findall(r'(\d+\.\d+\.\d+\.\d+)',line)
...     print ip

urllib2-httperror-http-error-403-forbidden

edited May 23, 2017 at 11:58

answered Aug 13, 2015 at 6:31

chandu

Chrim · Accepted Answer · 2015-08-14 03:53:46Z

0

You could try the requests package instead of urllib2; it is much easier to use:

import requests
url = 'http://whereismyip.com'
header = {'User-Agent': 'curl/7.21.3'}
# Pass headers by keyword: the second positional argument of get() is params.
r = requests.get(url, headers=header)

You can use curl's user agent string as the User-Agent.

edited Aug 14, 2015 at 3:53

answered Aug 13, 2015 at 6:31

2 Comments

Maverick Over a year ago

I installed the requests module and tried to run the code below, but it gives no response:
import requests
r = requests.get('whatismyipaddress.com/')
print r.text

Chrim Over a year ago

@Maverick the URL must be a valid URL that includes the protocol, in this case 'http://'. Try url = 'http://whatismyipaddress.com/' and then r = requests.get(url). If you need a custom header, pass it to the get method by keyword: headers = {'user-agent': 'Mozilla/5.0'} and then r = requests.get(url, headers=headers)
