I have edited this code from here:

import asyncio
import time
from aiohttp import ClientPayloadError
from aiohttp import ClientSession

COUNTER = 1

async def fetch(url, session):

    async with session.get(url) as response:
        delay = response.headers.get("DELAY")
        date = response.headers.get("DATE")
        global COUNTER
        COUNTER +=1
        print("{}. {}:{} with delay {}".format(str(COUNTER), date, response.url, delay))
        try:
            return await response.text()
        except ClientPayloadError:
            print("ERROR: {}".format(url))


async def bound_fetch(sem, url, session):
    # Getter function with semaphore.
    async with sem:
        return await fetch(url, session)


async def run():
    urls = [build_url(id) for id in load_ids()]
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(1000)

    # Create client session that will ensure we dont open new connection
    # per each request.
    async with ClientSession(conn_timeout=10000, read_timeout=10000) as session:
        for url in urls:
            # pass Semaphore and session to every GET request
            task = asyncio.ensure_future(bound_fetch(sem, url, session))
            tasks.append(task)

        responses = asyncio.gather(*tasks)
        await responses

def build_url(id):
    url = 'http://api.metagenomics.anl.gov/annotation/sequence/{}?source=Refseq'.format(id)
    return url

def load_ids():
    #in the "real" code I will load a file here and extract the ids
    return """
    mgm4558908.3
    mgm4484962.3
    mgm4734169.3
    mgm4558911.3
    mgm4484983.3
    mgm4558918.3
    """.strip().split()


# time.clock() was removed in Python 3.8; perf_counter() measures wall time
start = time.perf_counter()
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run())
loop.run_until_complete(future)
run_time = (time.perf_counter() - start)/60
print("this took: {} minutes".format(run_time))

I know I can print the response data using print(await response.text()). However, I'm not familiar with asynchronous code, so I can't figure out how and where I should open a file and write to it. I suppose some sort of threading is involved, which could cause problems when writing to the same file at the same time (I'm familiar with multiprocessing).

1 Answer

Async is not multiprocessing or threading. In your case, you can try something like this:

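# Must run inside a coroutine, and assumes run() is rewritten as an
# async generator that yields each response body as it arrives.
with open(file, "w") as f:
    async for s in run():
        f.write(s)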

Also, you can try using aiofiles or curio for async file I/O.
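To make the point about threading concrete, here is a minimal sketch of writing each response to one file as soon as it arrives. It is not the asker's exact code: fake_fetch is a hypothetical stand-in for the aiohttp request, and the URLs and file name are made up for illustration. All coroutines run on a single thread, and a plain f.write() contains no await, so no other coroutine can run in the middle of a write.

```python
import asyncio
import os
import tempfile


async def fake_fetch(url):
    # Hypothetical stand-in for the real aiohttp request.
    await asyncio.sleep(0.01)
    return "body of {}\n".format(url)


async def fetch_and_write(urls, path):
    tasks = [asyncio.ensure_future(fake_fetch(u)) for u in urls]
    written = 0
    with open(path, "w") as f:
        # as_completed hands back each result as soon as it is ready,
        # so a body can be written and dropped instead of collected.
        for next_done in asyncio.as_completed(tasks):
            body = await next_done
            # No await inside write(): writes cannot interleave.
            f.write(body)
            written += 1
    return written


def demo():
    # Made-up URLs; nothing is actually requested.
    urls = ["http://example.invalid/{}".format(i) for i in range(5)]
    path = os.path.join(tempfile.mkdtemp(), "responses.txt")
    count = asyncio.run(fetch_and_write(urls, path))
    with open(path) as f:
        lines = f.read().splitlines()
    return count, lines


if __name__ == "__main__":
    print(demo())
```

The order of the lines in the file follows completion order, not the order of the URL list. If the order matters, write the URL or id alongside each body.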

3 Comments

Do you mean replacing the line asyncio.ensure_future(run()) with for s in asyncio.ensure_future(run())?
I mean use this code instead of asyncio.ensure_future(run()).
Aah I get it! Now I write the responses to a file, what code can I remove now to avoid that it saves everything in RAM? That was the whole point for me to write it to a file ;)
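
On the RAM question, one possible approach (a sketch under assumptions, not a tested drop-in) is to write inside the fetching coroutine and return only a small value, so that gather() never holds the bodies. Here fake_chunks is a hypothetical stand-in for a streamed response; with aiohttp the equivalent loop would be over response.content.iter_chunked(n). The URLs, the limit of 2, and the one-file-per-URL layout are illustrative choices.

```python
import asyncio
import os
import tempfile


async def fake_chunks(url, n_chunks=3):
    # Hypothetical stand-in for a streamed response body.
    for i in range(n_chunks):
        await asyncio.sleep(0)
        yield "{} chunk {}\n".format(url, i)


async def fetch_to_file(sem, url, out_dir):
    # One file per URL: there is an await between chunks, so chunks
    # from different responses would mix in a single shared file.
    path = os.path.join(out_dir, url.rsplit("/", 1)[-1] + ".txt")
    async with sem:
        with open(path, "w") as f:
            async for chunk in fake_chunks(url):
                f.write(chunk)
    # Return only the path, never the body, so nothing large is kept.
    return path


async def run(urls, out_dir, limit=2):
    sem = asyncio.Semaphore(limit)
    jobs = (fetch_to_file(sem, u, out_dir) for u in urls)
    return await asyncio.gather(*jobs)


def demo():
    # Made-up URLs; nothing is actually requested.
    urls = ["http://example.invalid/id{}".format(i) for i in range(3)]
    out_dir = tempfile.mkdtemp()
    paths = asyncio.run(run(urls, out_dir))
    contents = []
    for p in paths:
        with open(p) as f:
            contents.append(f.read().splitlines())
    return paths, contents


if __name__ == "__main__":
    print(demo())
```

In the original code this would mean dropping the return of response.text() from fetch and no longer keeping the result of gather(); only the paths (or nothing at all) need to come back.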
