Is there a way to speed up my code using the multiprocessing interface? The problem is that this interface uses a map function, which works with only one function, but my code has three functions. I tried to combine my functions into one, but had no success. My script reads site URLs from a file and performs three functions on each of them. The for loop makes it very slow, because I have a lot of URLs.

import requests

def Login(url): #Log in     
    payload = {
        'UserName_Text'     : 'user',
        'UserPW_Password'   : 'pass',
        'submit_ButtonOK'   : 'return buttonClick;'  
      }

    try:
        p = session.post(url+'/login.jsp', data = payload, timeout=10)
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        print "site is DOWN! :", url[8:]
        session.cookies.clear()
        session.close() 
    else:
        print 'OK: ', p.url

def Timer(url): # Measure request time
    try:
        timer = requests.get(url+'/login.jsp', timeout=10).elapsed.total_seconds()
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        print 'Request time: None'
        print '-----------------------------------------------------------------'
    else:
        print 'Request time:', round(timer, 2), 'sec'

def Logout(url): # Log out
    try:
        logout = requests.get(url+'/logout.jsp', params={'submit_ButtonOK' : 'true'}, cookies = session.cookies)
    except(requests.exceptions.ConnectionError):
        pass
    else:
        print 'Logout '#, logout.url
        print '-----------------------------------------------------------------'
        session.cookies.clear()
        session.close()

for line in open('text.txt').read().splitlines():
    session = requests.session()
    Login(line)
    Timer(line)
    Logout(line)

2 Answers


Yes, you can use multiprocessing.

from multiprocessing import Pool

import requests

# Note: Login, Timer and Logout take the session as their first
# argument here, instead of relying on a global session.
def f(line):
    session = requests.session()
    Login(session, line)
    Timer(session, line)
    Logout(session, line)

if __name__ == '__main__':
    urls = open('text.txt').read().splitlines()
    p = Pool(5)
    print(p.map(f, urls))

The requests session cannot be global and shared between workers; every worker should use its own session.
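
For illustration, a minimal sketch of how the question's Login could be adapted to take the session as a parameter (Timer and Logout would change the same way):

import requests

def Login(session, url):  # the session is now passed in instead of being global
    payload = {
        'UserName_Text'     : 'user',
        'UserPW_Password'   : 'pass',
        'submit_ButtonOK'   : 'return buttonClick;'
    }
    try:
        p = session.post(url + '/login.jsp', data=payload, timeout=10)
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout):
        print('site is DOWN! : ' + url[8:])
        session.cookies.clear()
        session.close()
    else:
        print('OK: ' + p.url)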

You write that you already "tried to combine my functions into one, but had no success". What exactly didn't work?


4 Comments

Well, you did this more elegantly in your example.
My script gets stuck. So what should I insert where the '[...]' is in your code? This doesn't work: for line in open('text.txt').read().splitlines(): print p.map(f, line)
@BogdanBratkiv I've updated the answer: urls = open('text.txt').read().splitlines(), then use p.map(f, urls)
@BogdanBratkiv: you could use multiprocessing.dummy to use threads instead of processes, and/or gevent if you want green "threads" with the same code (ignore the text, just look at the code). Note: it is trivial to switch between processes (for CPU-intensive tasks), threads (for I/O-based tasks), and greenlets (single OS thread, async I/O) in this case; a sketch follows below.
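
A sketch of that switch, reusing f from the answer above: multiprocessing.dummy exposes the same Pool API backed by threads, so only the import changes.

from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool, but thread-backed

if __name__ == '__main__':
    urls = open('text.txt').read().splitlines()
    p = Pool(5)     # 5 worker threads instead of 5 processes
    p.map(f, urls)  # f is the per-URL function from the answer above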

There are many ways to accomplish your task, but multiprocessing is not needed at this level; it will just add complexity, imho.

Take a look at gevent, greenlets and monkey patching, instead!

Once your code is ready, you can wrap your main function in a gevent loop, and if you have applied the monkey patches, the gevent framework will run N jobs concurrently (you can create a pool of jobs, set concurrency limits, etc.; a pool sketch follows).
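
For the pool part specifically, a minimal sketch of a gevent job pool with a concurrency limit; the handle function here is a hypothetical stand-in for the Login/Timer/Logout sequence from the question:

from gevent import monkey
monkey.patch_all()  # patch the stdlib first so sockets cooperate with greenlets

import requests
from gevent.pool import Pool

def handle(url):  # hypothetical per-URL job; replace with the real work
    print(requests.get(url + '/login.jsp', timeout=10).status_code)

pool = Pool(5)  # at most 5 greenlets run concurrently
for url in open('text.txt').read().splitlines():
    pool.spawn(handle, url)
pool.join()  # wait for all spawned jobs to finish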

This example should help:

#!/usr/bin/python
# Copyright (c) 2009 Denis Bilenko. See LICENSE for details.

"""Spawn multiple workers and wait for them to complete"""
from __future__ import print_function
import sys

urls = ['http://www.google.com', 'http://www.yandex.ru', 'http://www.python.org']

import gevent
from gevent import monkey

# patches stdlib (including socket and ssl modules) to cooperate with other greenlets
monkey.patch_all()


if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    from urllib2 import urlopen


def print_head(url):
    print('Starting %s' % url)
    data = urlopen(url).read()
    print('%s: %s bytes: %r' % (url, len(data), data[:50]))

jobs = [gevent.spawn(print_head, url) for url in urls]

gevent.wait(jobs)

You can find more examples in the gevent documentation and in the GitHub repository this example comes from.

P.S. Greenlets will work with requests as well; you don't need to change your code.

4 Comments

So multiprocessing is complex, but gevent is not? ok :)
Imho yes; back when I wasn't aware of many things, I found gevent much easier to understand than managing multiprocessing pools. Gevent did most of the job for me, but that's just a personal opinion :-)
Are you sure that gevent is compatible with the requests library? I remember that at least some past requests versions failed to work as-is with gevent.
I confirm that gevent 1.0.1 works with requests 2.2.1 on Python 2.7, once a blunt monkey.patch_all() is applied. I also remember those problems from about a year ago. @J.F.Sebastian
