I have the following code:
    with ThreadPoolExecutor(max_workers=num_of_pages) as executor:
        futh = [executor.submit(self.getdata2, page, hed, data, apifolder, additional)
                for page in pages]
        for fut in as_completed(futh):  # renamed from `data` to avoid shadowing the request payload
            datarALL = datarALL + fut.result()
    return datarALL
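For context, here is a minimal self-contained sketch of the same pattern, with a hypothetical `fetch` function standing in for `getdata2`. Note that `max_workers` itself already caps how many submitted tasks run at once, even if all 250 futures are created up front:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(page):
    # placeholder for the real GET request done in getdata2
    return [f"result-{page}"]

pages = range(250)
results = []
# max_workers bounds concurrency: at most 10 fetches run at a time,
# even though all 250 tasks are queued immediately
with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch, page) for page in pages]
    for fut in as_completed(futures):
        results += fut.result()
```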
num_of_pages isn't fixed, but it's usually around 250.
The getdata2 function issues a GET request and returns that page's results.
The problem is that all 250 pages (threads) are created together, which means 250 GET requests fired at the same time. This overloads the server, so I get a lot of retries due to delayed server responses, which abort the GET call and retry it. I want to avoid that.
I thought of creating some sort of lock that prevents a thread/page from issuing its GET request while more than 10 requests are active; in that case it would wait until a slot frees up.
Something like:
    executing_now = []

    def getdata2(...):
        ...
        while len(executing_now) > 10:
            sleep(10)
        executing_now.append(page)
        response = requests.get(url, data=data, headers=hed, verify=False)
        ...
        executing_now.remove(page)
        return ...
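A thread-safe version of this idea can use `threading.BoundedSemaphore` from the standard library, which does the waiting without a busy loop. A minimal sketch, with a short `sleep` standing in for the real GET request (the `active`/`peak` counters are only there to demonstrate that concurrency stays capped):

```python
import threading
import time

# at most 10 threads may hold a slot at once
request_slots = threading.BoundedSemaphore(10)

# bookkeeping just to observe the concurrency cap
active = 0
peak = 0
lock = threading.Lock()

def getdata2_sketch(page):
    global active, peak
    with request_slots:           # blocks until one of the 10 slots is free
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)          # stands in for requests.get(...)
        with lock:
            active -= 1
    return page

threads = [threading.Thread(target=getdata2_sketch, args=(p,)) for p in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the semaphore is acquired and released via `with`, a slot is freed even if the request raises, which avoids the leak your `append`/`remove` list would have on an exception.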
Is there an existing mechanism for this in Python? It requires the threads to check shared state, and I want to avoid multithreading problems such as deadlocks.
Basically, I want to wrap the GET call with a limit on how many threads can execute it at the same time.