Python: timeout when reading a URL

Question

I called the following code to visit a url and tried to print the content on that page:

import urllib2
f = urllib2.urlopen("https://www.reaxys.com/reaxys/secured/customset.do?performed=true&action=get_preparations&searchParam=1287039&workflowId=1338317532514&workflowStep=1&clientDateTime=2012-05-29%2015:17")
page = f.read()
print page
f.close()

I'm not sure if the url is accessible everywhere, so the content on that page might not be accessible to everyone.

This page sets a limit on how long a user can stay on it; after that time, a popup appears saying the user has reached the timeout.

Here's the problem I ran into: when I typed the URL into a browser, the page opened just fine. But when I printed what Python read from that URL, I got the page that should only appear once the timeout has been reached.

I don't know what's wrong. Is it Python or the website? How can I make Python read the actual content of that page?

Thanks in advance.

Karmel · Accepted Answer · 2012-05-29 20:51:21Z

It appears to be related to cookies being set by the website. If I visit the URL

https://www.reaxys.com/reaxys/secured/customset.do?performed=true&action=get_preparations&searchParam=1287039&workflowId=1338317532514&workflowStep=1

in my browser, I get the same timeout error. If I refresh, the site loads fine. But if I clear my cookies from the site and retry, I get the timeout again. So I suspect that the site is executing some process that adds a timestamp and checks it before the page is visible, and defaults to a timeout if for some reason the cookie can't be set (as would be the case with a visit from within a Python script).

I would suggest investigating the cookies being set in depth (start with the JavaScript on that page, which seems to handle some of the timeout logic), and then try setting cookies from the scraping process as per: http://www.testingreflections.com/node/view/5919 , http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/ , or the like.
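As a rough illustration of keeping cookies between requests, here is a minimal sketch using only the standard library. It is written for Python 3 (urllib.request and http.cookiejar); in Python 2 the same classes live in urllib2 and cookielib. The function names are made up for this example, and it assumes the cookie is set through an HTTP response header. If the site sets it from JavaScript, as the timeout logic on that page suggests it might, this alone will not be enough.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar); the opener stores cookies in jar and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def describe_cookies(jar):
    """Return 'name=value (domain)' strings for every cookie currently in the jar."""
    return ["%s=%s (%s)" % (c.name, c.value, c.domain) for c in jar]


def fetch_after_cookie_visit(opener, url, timeout=30):
    """Request url twice with the same opener and return the second body.

    The first request only gives the server a chance to set cookies; the
    second sends them back, mimicking the browser refresh described above.
    Whether two requests are enough for a given site is an assumption.
    """
    with opener.open(url, timeout=timeout) as first:
        first.read()
    with opener.open(url, timeout=timeout) as second:
        return second.read()
```

To try it, call opener, jar = build_cookie_opener(), pass the opener and the page URL to fetch_after_cookie_visit, and then print describe_cookies(jar) to see which cookies the server actually set. An empty list after the first visit would suggest the cookie is being created by JavaScript rather than by the HTTP response.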

(This is in no way intended to condone the scraping of an Elsevier site, as they may come after you and eat your young :) )
