I'm trying to retrieve the following URL: http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004.

import urllib2
response = urllib2.urlopen('http://www.winkworth.co.uk/rent/property/terraced-house-to-rent-in-mill-road--/WOT140129')
response.read()

However, I'm getting an empty string. When I request the same URL in a browser or with cURL, it works fine. Any ideas what's going on?

  • Is urlopen asynchronous? If so, maybe it isn't finished downloading when you try to read it? Commented Jan 23, 2015 at 21:01
  • @BlackVegetable: no, urlopen is synchronous. The server is broken: it returns nothing when no Accept header is present. Commented Jan 23, 2015 at 21:06

1 Answer

I got a response when using the requests library but not when using urllib2, so I experimented with HTTP request headers.

As it turns out, the server expects an Accept header. urllib2 doesn't send one, whereas requests and cURL send */*.

Send one with urllib2 as well:

import urllib2

url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'
# Pass an explicit Accept header; urllib2 does not add one by default.
req = urllib2.Request(url, headers={'accept': '*/*'})
response = urllib2.urlopen(req)

Demo:

>>> import urllib2
>>> url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'
>>> len(urllib2.urlopen(url).read())
0
>>> request = urllib2.Request(url, headers={'accept': '*/*'})
>>> len(urllib2.urlopen(request).read())
37197
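
For readers on Python 3, where urllib2 was split into urllib.request and urllib.error, the same fix might look roughly like the sketch below. The helper name build_request is hypothetical and not part of the original answer. The snippet only inspects the Request objects, so it runs without network access; the actual fetch is left commented out.

```python
import urllib.request


# Hypothetical helper for illustration; not part of the original answer.
def build_request(url, accept="*/*"):
    """Return a Request that carries an explicit Accept header."""
    return urllib.request.Request(url, headers={"Accept": accept})


url = ("http://www.winkworth.co.uk/sale/property/"
       "flat-for-sale-in-masefield-court-london-n5/HIH140004")

plain = urllib.request.Request(url)  # no Accept header set on the request
fixed = build_request(url)           # Accept: */*

# Inspect the headers stored on each Request object; nothing is sent yet.
print(plain.has_header("Accept"))  # False
print(fixed.get_header("Accept"))  # */*

# To actually fetch the page (requires network access):
# with urllib.request.urlopen(fixed) as response:
#     body = response.read()
```

Checking the Request object before sending it is a cheap way to confirm which headers were set explicitly, although headers added later by the HTTP handler (such as User-Agent) will not show up at this stage.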

The server is at fault here; RFC 2616 (section 14.1) states:

If no Accept header field is present, then it is assumed that the client accepts all media types.
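
As a rough illustration of that rule from the server's side, a handler could fall back to */* whenever the header is missing. This is a minimal sketch, not the actual server's code; the function name effective_accept is hypothetical, and it assumes the request headers are available as a plain mapping of names to values.

```python
# Hypothetical helper for illustration; assumes headers is a mapping
# of header names to values, with names in any letter case.
def effective_accept(headers):
    """Return the Accept value, defaulting to */* when it is absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "accept":
            return value
    # No Accept header: assume the client accepts all media types.
    return "*/*"


print(effective_accept({"User-Agent": "Python-urllib/2.7"}))  # */*
print(effective_accept({"accept": "text/html"}))              # text/html
```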
