Python urllib2 returning an empty string

Question

I'm trying to retrieve the following URL: http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004.

import urllib2
response = urllib2.urlopen('http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004')
response.read()

However I'm getting an empty string. When I try it through the browser or with cURL it works fine. Any ideas what's going on?

Is urlopen asynchronous? If so, maybe it hasn't finished downloading when you try to read it?
– BlackVegetable, Commented Jan 23, 2015 at 21:01
@BlackVegetable: nope, urlopen is synchronous. The server is broken: it returns nothing when no Accept header is present.
– Martijn Pieters, Commented Jan 23, 2015 at 21:06

Community · Accepted Answer · 2021-10-07 05:49:19Z


I got a response when using the requests library but not when using urllib2, so I experimented with HTTP request headers.

As it turns out, the server expects an Accept header. urllib2 doesn't send one, whereas requests and cURL both send Accept: */*.

Send one with urllib2 as well:

url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'
req = urllib2.Request(url, headers={'accept': '*/*'})
response = urllib2.urlopen(req)
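
On Python 3, urllib2 no longer exists; its functionality lives in urllib.request. The following is a minimal sketch of the same fix for Python 3, not part of the original answer. The helper names build_request and fetch are made up for illustration.

```python
import urllib.request


def build_request(url):
    # Hypothetical helper: attach an explicit Accept header, since
    # urllib.request (like urllib2) does not send one by default.
    return urllib.request.Request(url, headers={"Accept": "*/*"})


def fetch(url):
    # Hypothetical helper: open the prepared request and return the raw body.
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```

Calling fetch(url) with the URL above should then behave like the urllib2 version, assuming the server still responds the way it did when this was written.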

Demo:

>>> import urllib2
>>> url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'
>>> len(urllib2.urlopen(url).read())
0
>>> request = urllib2.Request(url, headers={'accept': '*/*'})
>>> len(urllib2.urlopen(request).read())
37197

The server is at fault here; RFC 2616 (section 14.1) states:

If no Accept header field is present, then it is assumed that the client accepts all media types.
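
To see the failure mode without depending on the real site, it can be reproduced locally. The sketch below is Python 3 and purely illustrative: PickyHandler is a made-up stand-in that mimics the behaviour described above (empty body when no Accept header arrives), not the actual server's code, and compare_lengths is a hypothetical helper.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PickyHandler(BaseHTTPRequestHandler):
    """Stand-in server: returns an empty body unless an Accept header is sent."""

    def do_GET(self):
        body = b"<html>listing</html>" if self.headers.get("Accept") else b""
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def compare_lengths():
    # Port 0 lets the OS pick a free port.
    server = HTTPServer(("127.0.0.1", 0), PickyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Ignore proxy environment variables so the request really hits localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = "http://127.0.0.1:%d/" % server.server_address[1]
        # No Accept header: the stand-in server answers with an empty body.
        with opener.open(url) as resp:
            without_accept = len(resp.read())
        # Explicit Accept header: the body comes back.
        req = urllib.request.Request(url, headers={"Accept": "*/*"})
        with opener.open(req) as resp:
            with_accept = len(resp.read())
        return without_accept, with_accept
    finally:
        server.shutdown()
        server.server_close()
```

The first value returned by compare_lengths() is the body length without the header, the second is the length with it, mirroring the two calls in the demo above.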

edited Oct 7, 2021 at 5:49

CommunityBot


answered Jan 23, 2015 at 21:06

Martijn Pieters

