
I'm using html2text in Python to get the raw text (tags included) of an HTML page fetched from a URL, but I'm getting an error.

My code -

import html2text
import urllib2

proxy = urllib2.ProxyHandler({'http': 'http://<proxy>:<pass>@<ip>:<port>'})
auth = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)
html = urllib2.urlopen("http://www.ndtv.com/india-news/this-stunt-for-a-facebook-like-got-the-hyderabad-youth-arrested-740851").read()
print html2text.html2text(html)

The error -

Traceback (most recent call last):
  File "t.py", line 8, in <module>
    html = urllib2.urlopen("http://www.ndtv.com/india-news/this-stunt-for-a-facebook-like-got-the-hyderabad-youth-arrested-740851").read()
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 110] Connection timed out>

Can anyone explain what I'm doing wrong?

  • This doesn't have anything to do with html2text; it's an error in the URL fetch. Can you load that URL through a browser? Can you just try it again? Network errors like this are often intermittent. Commented Feb 19, 2015 at 16:01
  • Yep, it's working fine in the browser. Any other suggestions? Commented Feb 19, 2015 at 16:05
  • urllib2.urlopen already gives you the page text; I don't know about that error. Commented Feb 19, 2015 at 17:50
  • The error means that your script waited a long time but the server didn't say anything; see the timeout/retry sketch after these comments. Commented Feb 19, 2015 at 17:51
  • You need to improve your spelling and capitalization. I got banned for it once. Commented Feb 20, 2015 at 13:30
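
As the comments note, the failure is a network timeout inside urlopen, not anything in html2text. A rough sketch of one way to handle that, passing an explicit timeout and retrying on URLError (the timeout value and retry count here are arbitrary assumptions):

import time
import urllib2

def fetch(url, attempts=3, timeout=30):
    # Retry a few times, since timeouts like this are often intermittent
    for attempt in range(attempts):
        try:
            return urllib2.urlopen(url, timeout=timeout).read()
        except urllib2.URLError as e:
            print "attempt %d failed: %s" % (attempt + 1, e)
            time.sleep(2)  # brief pause before retrying
    raise RuntimeError("could not fetch %s" % url)

If every attempt times out, the problem is most likely the proxy or the network path rather than the code.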

1 Answer


If you don't require SSL, this script in Python 2.7.x should work:

import urllib
url = "http://stackoverflow.com"
f = urllib.urlopen(url)
print f.read()

In Python 3.x, use urllib.request instead of urllib.

That's because urllib2 is Python 2 only; in Python 3 its functionality was merged into urllib (urlopen lives in urllib.request).

http:// is required.
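
For reference, a minimal Python 3 equivalent of the snippet above, using only the standard library (the URL and timeout are just illustrative):

import urllib.request

url = "http://stackoverflow.com"
f = urllib.request.urlopen(url, timeout=10)  # timeout so it doesn't hang forever
print(f.read().decode("utf-8", errors="replace"))  # the response body is bytes in Python 3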

EDIT: In 2020, you should use the third-party requests module, which can be installed with pip.

import requests
print(requests.get("http://stackoverflow.com").text)
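
Since the original script goes through an authenticated proxy, note that requests accepts a proxies mapping as well; a sketch using the same placeholder credentials as the question (swap in real values):

import requests

# Placeholder proxy URL, mirroring the urllib2.ProxyHandler setup in the question
proxies = {"http": "http://<proxy>:<pass>@<ip>:<port>"}

resp = requests.get("http://stackoverflow.com", proxies=proxies, timeout=10)
resp.raise_for_status()  # raise an exception on HTTP error status codes
print(resp.text)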

1 Comment

Sorry, but it didn't help; it gave the same error. Do you have any other solution?
