
I'm fetching a batch of URLs using the Python requests module. I first want to read only their headers, to get the final URL and the size of the response. Then I fetch the actual content for any that pass muster.

So I use stream=True to delay fetching the content. This generally works fine.

But I'm occasionally encountering a URL that doesn't respond, so I added timeout=3.

But those requests never time out; they just hang. If I remove stream=True, the timeout works correctly. Is there some reason stream and timeout shouldn't work together? Removing stream=True forces me to download all the content.

Doing this:

import requests
url = 'http://bit.ly/1pQH0o2'
x = requests.get(url) # hangs
x = requests.get(url, stream=True) # hangs
x = requests.get(url, stream=True, timeout=1) # hangs
x = requests.get(url, timeout=3) # times out correctly after 3 seconds
  • Which requests version are you using? Commented Jan 18, 2015 at 21:55
  • Using requests 2.5.1, x = requests.get(url) works just fine for me. Commented Jan 18, 2015 at 21:58
  • Ah, I'm using 2.2.1; I'll upgrade. Thanks. Commented Jan 18, 2015 at 21:59

3 Answers


There was a relevant GitHub issue:

The fix was included in requests 2.3.0.

I tested it with the latest version, and it worked for me.
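Since the fix landed in a specific release, a quick sanity check of the installed version can save time before debugging further (a minimal sketch; `requests.__version__` is the standard version attribute):

```python
import requests

# The stream+timeout hang was reportedly fixed in requests 2.3.0,
# so check the installed version before debugging further.
print(requests.__version__)
```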



Do you close your responses? Unclosed and partially read responses can hold multiple connections to the same resource open, and the site may have a connection limit per IP.
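A minimal sketch of the headers-then-maybe-body pattern with explicit cleanup; the `fetch_headers` helper and its URL handling are illustrative, not from the answer. Newer requests versions let you use the response as a context manager, which releases the connection even when the body is never read:

```python
import requests

def fetch_headers(url, timeout=3):
    # Illustrative helper: read only the headers of a streamed response
    # and close it so the connection is released back to the pool.
    with requests.get(url, stream=True, timeout=timeout) as r:
        # r.url is the final URL after redirects; Content-Length may be absent.
        return r.url, r.headers.get("Content-Length")
```

On older requests versions without context-manager support on responses, call `r.close()` in a `finally` block instead.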

1 Comment

I have not been, but I will. Thanks.

I have a solution inspired by this comment: ...requests/issues/1803#...30869031

import requests
from concurrent.futures import ThreadPoolExecutor


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(requests.get, url, stream=True)
    response = future.result(timeout=3)  # 3 secs of timeout

The same method is useful for setting a timeout limit when you want to iterate over response.iter_content:

generator = response.iter_content(chunk_size=1_000_000)  # 1 MB
chunks = []
with ThreadPoolExecutor(max_workers=1) as executor:
    while True:
        try:
            future = executor.submit(next, generator)
            chunk = future.result(timeout=10)  # 10 secs of timeout
        except StopIteration:
            break
        chunks.append(chunk)
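As an alternative to the thread-pool wrapper, requests also accepts a `(connect, read)` timeout tuple; the read timeout bounds how long the socket may stay silent between bytes, so it also covers each `iter_content` step. A sketch, with the helper name and chunk size as illustrative choices:

```python
import requests

def download_in_chunks(url, chunk_size=1_000_000):
    # Illustrative helper: 3.05 s to connect, at most 10 s of silence
    # between socket reads. A stalled server raises a requests timeout
    # or connection exception during iteration instead of hanging.
    chunks = []
    with requests.get(url, stream=True, timeout=(3.05, 10)) as r:
        for chunk in r.iter_content(chunk_size=chunk_size):
            chunks.append(chunk)
    return b"".join(chunks)
```

This keeps the download in the calling thread, at the cost of the timeout applying per socket read rather than per chunk of `chunk_size` bytes.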

