Is there a way to speed up my code using the multiprocessing interface? The problem is that this interface uses a map function, which works with only one function, but my code has three functions. I tried to combine my functions into one, but had no success. My script reads site URLs from a file and performs three functions on each of them. The for loop makes it very slow, because I have a lot of URLs.

import requests

def Login(url): #Log in     
    payload = {
        'UserName_Text'     : 'user',
        'UserPW_Password'   : 'pass',
        'submit_ButtonOK'   : 'return buttonClick;'  
      }

    try:
        p = session.post(url+'/login.jsp', data = payload, timeout=10)
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        print "site is DOWN! :", url[8:]
        session.cookies.clear()
        session.close() 
    else:
        print 'OK: ', p.url

def Timer(url): # Measure request time
    try:
        timer = requests.get(url+'/login.jsp', timeout=10).elapsed.total_seconds()
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        print 'Request time: None'
        print '-----------------------------------------------------------------'
    else:
        print 'Request time:', round(timer, 2), 'sec'

def Logout(url): # Log out
    try:
        logout = requests.get(url+'/logout.jsp', params={'submit_ButtonOK' : 'true'}, cookies = session.cookies)
    except(requests.exceptions.ConnectionError):
        pass
    else:
        print 'Logout '#, logout.url
        print '-----------------------------------------------------------------'
        session.cookies.clear()
        session.close()

for line in open('text.txt').read().splitlines():
    session = requests.session()
    Login(line)
    Timer(line)
    Logout(line)

2 Answers


Yes, you can use multiprocessing.

from multiprocessing import Pool

import requests

# Note: Login, Timer and Logout take the session as their first
# argument here, instead of relying on a global session.
def f(line):
    session = requests.session()
    Login(session, line)
    Timer(session, line)
    Logout(session, line)

if __name__ == '__main__':
    urls = open('text.txt').read().splitlines()
    p = Pool(5)
    print(p.map(f, urls))

The requests session cannot be global and shared between workers; every worker should use its own session.
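
For illustration, a minimal sketch of how the question's Login could be adapted to take the session as a parameter (Timer and Logout would change the same way):

import requests

def Login(session, url):  # the session is now passed in instead of being global
    payload = {
        'UserName_Text'     : 'user',
        'UserPW_Password'   : 'pass',
        'submit_ButtonOK'   : 'return buttonClick;'
    }
    try:
        p = session.post(url + '/login.jsp', data=payload, timeout=10)
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        print('site is DOWN! : ' + url[8:])
        session.cookies.clear()
        session.close()
    else:
        print('OK: ' + p.url)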

You write that you already "tried to combine my functions into one, but had no success". What exactly didn't work?


4 Comments

Well, you did this more elegantly in your example.
My script gets stuck. So what should I insert where the '[...]' is in your code? This doesn't work: for line in open('text.txt').read().splitlines(): print p.map(f, line)
@BogdanBratkiv I've updated the answer: urls = open('text.txt').read().splitlines(), then use p.map(f, urls)
@BogdanBratkiv: you could use multiprocessing.dummy to use threads instead of processes, and/or gevent if you want green "threads" with the same code (ignore the text, just look at the code). Note: it is trivial to switch between processes (for CPU-intensive tasks), threads (for I/O-based tasks), and greenlets (single OS thread, async I/O) in this case; a sketch follows below.
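
A sketch of that switch, reusing f from the answer above: multiprocessing.dummy exposes the same Pool API backed by threads, so only the import changes.

from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool, but thread-backed

if __name__ == '__main__':
    urls = open('text.txt').read().splitlines()
    p = Pool(5)     # 5 worker threads instead of 5 processes
    p.map(f, urls)  # f is the per-URL function from the answer above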

There are many ways to accomplish your task, but multiprocessing is not needed at this level; it will just add complexity, imho.

Take a look at gevent, greenlets and monkey patching, instead!

Once your code is ready, you can wrap your main function in a gevent loop, and if you have applied the monkey patches, the gevent framework will run N jobs concurrently (you can create a pool of jobs, set concurrency limits, etc.; a pool sketch follows).
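
For the pool part specifically, a minimal sketch of a gevent job pool with a concurrency limit; the handle function here is a hypothetical stand-in for the Login/Timer/Logout sequence from the question:

from gevent import monkey
monkey.patch_all()  # patch the stdlib first so sockets cooperate with greenlets

import requests
from gevent.pool import Pool

def handle(url):  # hypothetical per-URL job; replace with the real work
    print(requests.get(url + '/login.jsp', timeout=10).status_code)

pool = Pool(5)  # at most 5 greenlets run concurrently
for url in open('text.txt').read().splitlines():
    pool.spawn(handle, url)
pool.join()  # wait for all spawned jobs to finish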

This example should help:

#!/usr/bin/python
# Copyright (c) 2009 Denis Bilenko. See LICENSE for details.

"""Spawn multiple workers and wait for them to complete"""
from __future__ import print_function
import sys

urls = ['http://www.google.com', 'http://www.yandex.ru', 'http://www.python.org']

import gevent
from gevent import monkey

# patches stdlib (including socket and ssl modules) to cooperate with other greenlets
monkey.patch_all()


if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib2 import urlopen


def print_head(url):
    print('Starting %s' % url)
    data = urlopen(url).read()
    print('%s: %s bytes: %r' % (url, len(data), data[:50]))

jobs = [gevent.spawn(print_head, url) for url in urls]

gevent.wait(jobs)

You can find more examples in the gevent documentation and in the GitHub repository this example comes from.

P.S. Greenlets will work with requests as well; you don't need to change your code.

4 Comments

So multiprocessing is complex, but gevent is not? ok :)
Imho yes; back when I wasn't aware of many things, I found gevent much easier to understand than managing multiprocessing pools. Gevent did most of the job for me, but that's just a personal opinion :-)
Are you sure that gevent is compatible with the requests library? I remember that at least some past requests versions failed to work as-is with gevent.
I confirm that gevent 1.0.1 works with requests 2.2.1 on Python 2.7, once a blunt monkey.patch_all() is applied. I also remember those problems from about a year ago. @J.F.Sebastian
