
I have another simple question that I really need some help with. I have a list of websites and I want to go through them all with requests. However, to make this faster, I want to use multiprocessing. How can I do this?

Example:

import requests
from threading import Thread

list_ex = ["https://www.google.com", "https://www.bing.com", "https://www.yahoo.com"]

def list_request():
    for item in list_ex:
        ex = requests.get(item)  # fetch each URL in turn
        print(ex.text)

How can I do this with multiprocessing, since I have 100+ sites? :)

  • Multithreading in Python won't give you better performance... it could even make performance worse, because Python doesn't really run threads in parallel; it pauses one thread to run another and then comes back to the first. Commented May 15, 2020 at 2:58
  • Well, in this case it would: running this on multiple threads would perform better than a single thread, since more list values are processed at a time. Commented May 15, 2020 at 2:59
  • Did you check this out? stackoverflow.com/questions/38280094/… Commented May 15, 2020 at 2:59
  • Do you mean multiprocessing? Commented May 15, 2020 at 2:59
  • Either multiprocessing or threading. Commented May 15, 2020 at 3:00

2 Answers

You can use Dask for multiprocessing. It will use all your system cores, while traditional Python uses only one core.

Dask Official · Dask Wikipedia
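
A minimal sketch with dask.delayed, assuming Dask is installed (pip install dask); read_url here is just an illustrative helper, not part of the original answer:

import requests
from dask import delayed, compute

def read_url(url):
    # Fetch one URL and return the response body as text.
    return requests.get(url).text

urls = ["https://www.google.com", "https://www.bing.com", "https://www.yahoo.com"]

# delayed() defers each call and records it in a task graph; compute()
# then runs the whole graph. scheduler="processes" spreads the work
# across multiple cores (the default scheduler for delayed uses threads).
tasks = [delayed(read_url)(url) for url in urls]
texts = compute(*tasks, scheduler="processes")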


1 Comment

Using Dask is probably overkill in this case, though. For simple parallelization, the standard library provides multithreading and multiprocessing facilities.

Multithreading is an option here, because fetching a URL is I/O-bound, not CPU-bound. Thus, a single process can have multiple threads running requests.get in parallel.

import requests
from multiprocessing.pool import ThreadPool

def read_url(url):
    # Fetch one URL and return the response body as text.
    return requests.get(url).text

urls = ["https://www.google.com", "https://www.bing.com", "https://www.yahoo.com"]

# map() distributes the URLs across the pool's worker threads and
# returns the results in input order.
with ThreadPool() as pool:
    texts = pool.map(read_url, urls)

You can use multiprocessing by replacing ThreadPool with Pool. For the three provided URLs, I get similar runtimes with Pool and ThreadPool, and both are faster than running read_url sequentially in a loop.
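
A minimal sketch of that swap: note that with Pool, the worker function must be defined at module top level so it can be pickled, and on platforms that spawn worker processes the pool should be created under an if __name__ == "__main__": guard.

import requests
from multiprocessing import Pool

def read_url(url):
    # Fetch one URL and return the response body as text.
    return requests.get(url).text

urls = ["https://www.google.com", "https://www.bing.com", "https://www.yahoo.com"]

if __name__ == "__main__":
    # Each URL is fetched in a separate worker process; results come
    # back in the same order as the input list.
    with Pool() as pool:
        texts = pool.map(read_url, urls)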
