
When I download a file using a single request, I do the following:

import requests

session = requests.Session()
params = {'fd': 1, 'count': 1024, 'auth': 'auth_token'}
r = session.get('https://httpbin.org/bytes/9', params=params)

print(r.content)
# b'\xb3_\\l\xe2\xbf/:\x07'

How can I do multiple requests without waiting for an answer?

The server's API docs say:

You can push multiple requests over single connection without waiting for answer, to improve performance. The server will process the requests in the order they are received and you are guaranteed to receive answers in the same order. It is important however to send all requests with "Connection: keep-alive", otherwise the API server will close the connection without processing the pending requests.

They are talking about one thread sending multiple requests without waiting for the answers. I suppose this is called HTTP pipelining.
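
On the wire, pipelining just means writing the next request to the socket before reading the previous response, and the answers come back in order. Here is a rough sketch of what I mean, using only the standard library (the httpbin.org endpoint is the one from my snippet above; reading with a single recv() is a simplification, real code would have to parse each response):

import socket
import ssl

host = 'httpbin.org'
req = b'GET /bytes/9 HTTP/1.1\r\nHost: httpbin.org\r\nConnection: keep-alive\r\n\r\n'

ctx = ssl.create_default_context()
with socket.create_connection((host, 443)) as raw_sock:
    with ctx.wrap_socket(raw_sock, server_hostname=host) as s:
        s.sendall(req + req)        # two requests sent back to back, no waiting
        print(s.recv(65536)[:200])  # responses arrive in the same order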

How can I do this with the Python Requests library?

A similar answer suggests making parallel calls, which is not what my question is about. It also says: "requests does pool connections, keeping the TCP connection open". How can I make use of that?
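
As far as I understand, pooling is different from what I want: a Session reuses the TCP/TLS connection between sequential requests (keep-alive), but each request still waits for the previous response. A minimal sketch of the reuse, for reference (same httpbin.org endpoint as above):

import requests

session = requests.Session()  # urllib3 pools and reuses connections under the hood
for _ in range(3):
    # The connection stays open between iterations (keep-alive),
    # but this is reuse, not pipelining: each get() blocks until
    # the previous response has been fully received.
    r = session.get('https://httpbin.org/bytes/9')
    print(len(r.content))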

If it's not possible with requests, can I use any other synchronous library?

  • You can use the aiohttp library. Commented Jul 25, 2020 at 12:30
  • I suppose they are talking about HTTP pipelining. Commented Jul 25, 2020 at 12:31
  • Does this answer your question? Pipelining POST requests with python-requests Commented Jul 25, 2020 at 12:33
  • @Klaus D. Thanks. But this answer was written 5 years ago. Are you sure that there is no solution for this? Commented Jul 25, 2020 at 12:37
  • Sorry, my answer there still stands; a draft HTTP pipelining pull request for urllib3 was closed 2 years ago, so there doesn't appear to be an option in sight if you want to use requests. Commented Jul 25, 2020 at 15:41

1 Answer


You can fetch several pages over a single connection without waiting for each answer, and without threads. The code below exploits HTTP pipelining by resetting the state (a private variable!) of HTTPSConnection to trick it into sending the next request ahead of time.

from http.client import HTTPSConnection, _CS_IDLE
from urllib.parse import urlparse, urlunparse


def pipeline(host, pages, max_out_bound=4, debuglevel=0):
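    # pages: request paths on `host`; max_out_bound caps how many requests
    # may be in flight (sent but not yet answered) at the same time.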
    page_count = len(pages)
    conn = HTTPSConnection(host)
    conn.set_debuglevel(debuglevel)
    responses = [None] * page_count
    finished = [False] * page_count
    content = [None] * page_count
    headers = {'Host': host, 'Content-Length': 0, 'Connection': 'Keep-Alive'}

    while not all(finished):
        # Send
        out_bound = 0
        for i, page in enumerate(pages):
            if out_bound >= max_out_bound:
                break
            elif page and not finished[i] and responses[i] is None:
                if debuglevel > 0:
                    print('Sending request for %r...' % (page,))
                conn._HTTPConnection__state = _CS_IDLE  # private variable!
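                # Forcing the state back to idle lets request() send again
                # even though the previous response has not been read yet.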
                conn.request("GET", page, None, headers)
                responses[i] = conn.response_class(conn.sock, method=conn._method)
                out_bound += 1
        # Try to read a response
        for i, resp in enumerate(responses):
            if resp is None:
                continue
            if debuglevel > 0:
                print('Retrieving %r...' % (pages[i],))
            out_bound -= 1
            skip_read = False
            resp.begin()
            if debuglevel > 0:
                print('    %d %s' % (resp.status, resp.reason))
            if 200 <= resp.status < 300:
                # Ok
                content[i] = resp.read()
                cookie = resp.getheader('Set-Cookie')
                if cookie is not None:
                    headers['Cookie'] = cookie
                skip_read = True
                finished[i] = True
                responses[i] = None
            elif 300 <= resp.status < 400:
                # Redirect
                loc = resp.getheader('Location')
                responses[i] = None
                parsed = loc and urlparse(loc)
                if not parsed:
                    # Missing or empty location header
                    content[i] = (resp.status, resp.reason)
                    finished[i] = True
                elif parsed.netloc != '' and parsed.netloc != host:
                    # Redirect to another host
                    content[i] = (resp.status, resp.reason, loc)
                    finished[i] = True
                else:
                    path = urlunparse(parsed._replace(scheme='', netloc='', fragment=''))
                    if debuglevel > 0:
                        print('  Updated %r to %r' % (pages[i], path))
                    pages[i] = path
            elif resp.status >= 400:
                # Failed
                content[i] = (resp.status, resp.reason)
                finished[i] = True
                responses[i] = None
            if resp.will_close:
                # Connection (will be) closed, need to resend
                conn.close()
                if debuglevel > 0:
                    print('  Connection closed')
                for j, f in enumerate(finished):
                    if not f and responses[j] is not None:
                        if debuglevel > 0:
                            print('  Discarding out-bound request for %r' % (pages[j],))
                        responses[j] = None
                break
            elif not skip_read:
                resp.read()  # read any data
            if any(not f and responses[j] is None for j, f in enumerate(finished)):
                # Send another pending request
                break
        else:
            break  # All responses are None?
    return content


if __name__ == '__main__':
    domain = 'en.wikipedia.org'
    pages = ['/wiki/HTTP_pipelining', '/wiki/HTTP', '/wiki/HTTP_persistent_connection']
    data = pipeline(domain, pages, max_out_bound=3, debuglevel=1)
    for i, page in enumerate(data):
        print()
        print('==== Page %r ====' % (pages[i],))
        print(page[:512])
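
To point this at an API like the one in the question, you could build the query strings into the paths yourself. A hypothetical adaptation (fd, count and auth are the parameters from the question; adjust the host and paths to the real API):

from urllib.parse import urlencode

params = {'fd': 1, 'count': 1024, 'auth': 'auth_token'}
paths = ['/bytes/9?%s' % urlencode(params)] * 3
data = pipeline('httpbin.org', paths, max_out_bound=3)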

3 Comments

Thank you! Can I use an existing connection that was created with requests.Session() instead of creating a new connection with conn = HTTPSConnection(host)?
No, you can only use HTTPSConnection (or HTTPConnection), because this code relies on its private state variable. And you should definitely check whether it suits your needs.
I thought that requests.Session creates an HTTPSConnection under the hood. Is that wrong?
