
I'm fetching a batch of URLs using the Python requests module. I first want to read only their headers, to get the final URL and the size of the response. Then I fetch the actual content for any that pass muster.

So I use stream=True to delay fetching the content. This generally works fine.

But I'm occasionally encountering a URL that doesn't respond, so I added timeout=3.

But those requests never time out; they just hang. If I remove stream=True, the timeout works correctly. Is there some reason stream and timeout shouldn't work together? Removing stream=True forces me to download all the content.

Doing this:

import requests
url = 'http://bit.ly/1pQH0o2'
x = requests.get(url) # hangs
x = requests.get(url, stream=True) # hangs
x = requests.get(url, stream=True, timeout=1) # hangs
x = requests.get(url, timeout=3) # times out correctly after 3 seconds
  • Which requests version are you using? Commented Jan 18, 2015 at 21:55
  • Using requests 2.5.1, x = requests.get(url) works just fine for me. Commented Jan 18, 2015 at 21:58
  • Ah, I'm using 2.2.1; I'll upgrade. Thanks. Commented Jan 18, 2015 at 21:59

3 Answers


There was a relevant GitHub issue:

The fix was included in requests 2.3.0.

I tested it with the latest version, and it worked for me.
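Since the fix landed in a specific release, a quick sanity check of the installed version can save time before debugging further (a minimal sketch; `requests.__version__` is the standard version attribute):

```python
import requests

# The stream+timeout hang was reportedly fixed in requests 2.3.0,
# so check the installed version before debugging further.
print(requests.__version__)
```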



Do you close your responses? Unclosed and partially read responses can hold multiple connections to the same resource open, and the site may have a connection limit per IP.
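A minimal sketch of the headers-then-maybe-body pattern with explicit cleanup; the `fetch_headers` helper and its URL handling are illustrative, not from the answer. Newer requests versions let you use the response as a context manager, which releases the connection even when the body is never read:

```python
import requests

def fetch_headers(url, timeout=3):
    # Illustrative helper: read only the headers of a streamed response
    # and close it so the connection is released back to the pool.
    with requests.get(url, stream=True, timeout=timeout) as r:
        # r.url is the final URL after redirects; Content-Length may be absent.
        return r.url, r.headers.get("Content-Length")
```

On older requests versions without context-manager support on responses, call `r.close()` in a `finally` block instead.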

1 Comment

I have not been, but I will. Thanks.

I have a solution inspired by this comment: ...requests/issues/1803#...30869031

import requests
from concurrent.futures import ThreadPoolExecutor


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(requests.get, url, stream=True)
    response = future.result(timeout=3)  # 3 secs of timeout

The same method is useful for setting a timeout limit when you want to iterate over response.iter_content:

generator = response.iter_content(chunk_size=1_000_000)  # 1 MB
chunks = []
with ThreadPoolExecutor(max_workers=1) as executor:
    while True:
        try:
            future = executor.submit(next, generator)
            chunk = future.result(timeout=10)  # 10 secs of timeout
        except StopIteration:
            break
        chunks.append(chunk)
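As an alternative to the thread-pool wrapper, requests also accepts a `(connect, read)` timeout tuple; the read timeout bounds how long the socket may stay silent between bytes, so it also covers each `iter_content` step. A sketch, with the helper name and chunk size as illustrative choices:

```python
import requests

def download_in_chunks(url, chunk_size=1_000_000):
    # Illustrative helper: 3.05 s to connect, at most 10 s of silence
    # between socket reads. A stalled server raises a requests timeout
    # or connection exception during iteration instead of hanging.
    chunks = []
    with requests.get(url, stream=True, timeout=(3.05, 10)) as r:
        for chunk in r.iter_content(chunk_size=chunk_size):
            chunks.append(chunk)
    return b"".join(chunks)
```

This keeps the download in the calling thread, at the cost of the timeout applying per socket read rather than per chunk of `chunk_size` bytes.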

