I am using mechanize to parse the HTML of a website, but for this site I get a strange result.

from mechanize import Browser
br = Browser()
r = br.open("http://www.heavenplaza.com")
result = r.read()

The result is something I cannot understand. You can see it here: http://paste2.org/p/1556077

Does anyone know a way to get that website's HTML, with mechanize or urllib?

Thanks

  • Please post the result in the answer rather than in a pastebin. Especially when the result is one-line long! Commented Aug 1, 2011 at 13:47

2 Answers

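import urllib2, StringIO, gzip

# The server sends a gzip-compressed body, so wrap the raw bytes in a
# file-like object and let GzipFile decompress them.
f = urllib2.urlopen("http://www.heavenplaza.com")
data = StringIO.StringIO(f.read())
gzipper = gzip.GzipFile(fileobj=data)
print gzipper.read()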
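
The snippet above targets Python 2. As a rough Python 3 sketch of the same idea, the helper below decompresses a response body only when it starts with the gzip magic bytes, so it also passes plain HTML through unchanged. The name `maybe_gunzip` is hypothetical and chosen only for illustration; it is not part of mechanize or urllib.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(body):
    """Return body decompressed if it looks like gzip, otherwise unchanged.

    Hypothetical helper for illustration only.
    """
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


# Usage sketch (needs network access, so it is left commented out):
# from urllib.request import urlopen
# html = maybe_gunzip(urlopen("http://www.heavenplaza.com").read())
```

Sniffing the first two bytes, rather than trusting the response headers, is a deliberate choice here: it still works when a server sends compressed data without being asked to.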

I quickly checked the script in the console, and the site was returning crap. You probably need to spoof your HTTP User-Agent so that the site does not think you are a robot.

http://www.google.com works
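
For reference, here is a minimal Python 3 sketch of setting a custom User-Agent with the standard library's `urllib.request.Request`. The helper name `build_request` and the User-Agent string are assumptions made up for this example, and whether a different User-Agent changes this particular site's response is not guaranteed.

```python
from urllib.request import Request

# Hypothetical browser-like User-Agent string, used only for illustration.
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent):
    """Build a Request that carries a custom User-Agent header."""
    req = Request(url)
    req.add_header("User-Agent", user_agent)
    return req


req = build_request("http://www.heavenplaza.com", USER_AGENT)

# To actually send it (needs network access):
# from urllib.request import urlopen
# body = urlopen(req).read()
```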

2 Comments

This is my User-Agent: br.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.17) Gecko/20110420 Firefox/3.6.17')] and it does not work either.
Based on the answer above, the site does not correctly honour the Accept-Encoding header for gzip.
