Write data to file in combination with asynchronous http requests

Question

I have edited this code from here:

import asyncio
import time
from aiohttp import ClientPayloadError
from aiohttp import ClientSession

COUNTER = 1

async def fetch(url, session):

    async with session.get(url) as response:
        delay = response.headers.get("DELAY")
        date = response.headers.get("DATE")
        global COUNTER
        COUNTER +=1
        print("{}. {}:{} with delay {}".format(str(COUNTER), date, response.url, delay))
        try:
            return await response.text()
        except ClientPayloadError:
            print("ERROR: ".format(url))


async def bound_fetch(sem, url, session):
    # Getter function with semaphore.
    async with sem:
        await fetch(url, session)


async def run():
    urls = [build_url(id) for id in load_ids()]
    tasks = []
    # create instance of Semaphore
    sem = asyncio.Semaphore(1000)

    # Create client session that will ensure we dont open new connection
    # per each request.
    async with ClientSession(conn_timeout=10000, read_timeout=10000) as session:
        for url in urls:
           #pass Semaphore and session to every GET request
            task = asyncio.ensure_future(bound_fetch(sem, url, session))
            tasks.append(task)

        responses = asyncio.gather(*tasks)
        await responses

def build_url(id):
    url = 'http://api.metagenomics.anl.gov/annotation/sequence/{}?source=Refseq'.format(id)
    return url

def load_ids():
    #in the "real" code I will load a file here and extract the ids
    return """
    mgm4558908.3
    mgm4484962.3
    mgm4734169.3
    mgm4558911.3
    mgm4484983.3
    mgm4558918.3
    """.strip().split()


start = time.clock()
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run())
loop.run_until_complete(future)
run_time = (start - time.clock())/60
print("this took: {} minutes".format(run_time))

I know I can print the response data using: print(await response.text()) However I'm not into the asynchronous codes and therefore I can't figure out how and where I should open a file and write to it. Because I suppose there is some sort of threading which could cause problems when writing to the same file at the same time (I'm familiar with multiprocessing).

amarynets · Accepted Answer · 2017-09-27 19:29:04Z

1

async is not multiprocessing or threading In your case, you can try smt like this:

with open(file, "w"):
    async for s in run():
        f.write(s)

Also, you can try use aiofiles or curio for file AI/O

answered Sep 27, 2017 at 19:29

amarynets

1,82511 silver badges28 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

CodeNoob Over a year ago

Do you mean replacing this line: asyncio.ensure_future(run()) so for s in asyncio.ensure_future(run()) ?

amarynets Over a year ago

I mean to use this code instead asyncio.ensure_future(run())

CodeNoob Over a year ago

Aah I get it! Now I write the responses to a file, what code can I remove now to avoid that it saves everything in RAM? That was the whole point for me to write it to a file ;)

Collectives™ on Stack Overflow

Write data to file in combination with asynchronous http requests

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related