I have 3 Python scripts:

  1. grab.py: Takes as input an Instagram account name and outputs a text file containing all of its followers.
  2. scrape.py: Takes as input the output from grab.py and outputs details of each account (follower count, post count, etc.) in CSV form.
  3. analyze.py: A basic machine learning model that uses the results of scrape.py to perform an analysis on the accounts.

The 3 scripts work as expected individually. The next step is to create an API endpoint which will take an account name as a request parameter, and then trigger the above 3 scripts for the received account. The final analysis results will be stored in a database.
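
To make the chaining step concrete, here is a minimal sketch of what I have in mind, assuming each script can be run from the command line; the file names, CLI arguments, and database schema are placeholders, not the scripts' actual interfaces:

import sqlite3
import subprocess

def run_pipeline(account_name):
    """Run grab.py -> scrape.py -> analyze.py for one account."""
    followers_file = f"{account_name}_followers.txt"
    details_csv = f"{account_name}_details.csv"

    # Each step blocks until the previous one finishes;
    # check=True raises CalledProcessError on a non-zero exit.
    subprocess.run(["python", "grab.py", account_name, followers_file], check=True)
    subprocess.run(["python", "scrape.py", followers_file, details_csv], check=True)
    result = subprocess.run(
        ["python", "analyze.py", details_csv],
        check=True, capture_output=True, text=True,
    )

    # Store the final analysis output in a database (SQLite as an example).
    with sqlite3.connect("results.db") as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS analysis (account TEXT, result TEXT)")
        conn.execute("INSERT INTO analysis VALUES (?, ?)", (account_name, result.stdout))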

The endpoint also needs to have a queueing mechanism to store account names received. The queue will be polled, and if account names are available, they will be processed sequentially.
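
And for the endpoint plus queue, roughly this — a minimal sketch using Flask and a standard-library queue (the /analyze route and parameter name are placeholders, and run_pipeline is the function sketched above):

import queue
import threading

from flask import Flask, jsonify, request

app = Flask(__name__)
account_queue = queue.Queue()

@app.route("/analyze", methods=["POST"])
def enqueue_account():
    # The endpoint only enqueues; the worker thread does the slow work.
    account_name = request.args.get("account")
    if not account_name:
        return jsonify(error="missing 'account' parameter"), 400
    account_queue.put(account_name)
    return jsonify(status="queued", account=account_name), 202

def worker():
    # queue.get() blocks until an item is available, so the loop
    # does not busy-wait even though it runs forever.
    while True:
        account_name = account_queue.get()
        try:
            run_pipeline(account_name)  # the chaining sketch above
        finally:
            account_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    app.run()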

My API development experience is limited, so I am not sure of the best approach to this problem. My questions are:

  1. Should my API endpoint be written in Python? If yes, is the Flask framework a viable option? If not, what other options do I have?
  2. Is there some sort of pipeline I can use to seamlessly integrate the 3 scripts?
  3. Is maintaining the queue in memory and polling it from a separate thread running an infinite while loop a good idea? Is there a better way to accomplish this?
  • Although a perfectly valid question, I fear the possible answers are all opinion-based, which makes it off-topic for this Q&A site (opinion-based answers are frowned upon here because they tend to invite flame wars and trolling). That said, I use Python all the time for APIs (Django REST Framework), and people tend to use an asynchronous task queue like Celery to handle this kind of problem (see the sketch after these comments). If the tasks are dependent on each other you may be interested in the Airflow project. Commented Mar 13, 2019 at 10:28
  • 1) IMO Flask would be a viable and quick option. 2) No idea. 3) Since queues can be used in a blocking fashion, this seems like an okay idea. One drawback is that the queue is, as you said, in memory and will be lost once the program ends. You could use a database like SQLite or a file instead. Commented Mar 13, 2019 at 10:29
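
For reference, a minimal sketch of the Celery approach mentioned in the first comment, assuming the logic of each script can be imported as a function; the broker URL, task bodies, and file names are placeholders:

from celery import Celery, chain

# The broker URL is a placeholder; Redis and RabbitMQ are common choices.
app = Celery("pipeline", broker="redis://localhost:6379/0")

@app.task
def grab(account_name):
    # would invoke grab.py's logic here
    return f"{account_name}_followers.txt"

@app.task
def scrape(followers_file):
    # would invoke scrape.py's logic here
    return followers_file.replace("_followers.txt", "_details.csv")

@app.task
def analyze(details_csv):
    # would invoke analyze.py's logic and write the results to the database
    return "done"

# chain() wires the tasks so each return value feeds the next task;
# calling the chain sends it to the broker for a Celery worker to run.
result = chain(grab.s("some_account"), scrape.s(), analyze.s())()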

1 Answer


To get info from an API and save it, I would recommend using asyncio to do something like this:

import asyncio
import json
import time

import aiofiles as aiof
import aiohttp

FILENAME = "foo.txt"

async def fetch(session, url):
    # Fetch one profile and append its JSON payload to the output file.
    async with session.get(url) as response:
        data = await response.json()
        async with aiof.open(FILENAME, "a") as out:
            await out.write(json.dumps(data) + "\n")
            await out.flush()

async def main():
    instagram_ids = []  # profile ids
    current = time.time()
    url = "INSTAGRAM_API_URL/{}"  # placeholder; {} is filled with the profile id
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(fetch(session, url.format(profile_id)))
                 for profile_id in instagram_ids]
        await asyncio.gather(*tasks)
    print(time.time() - current)

asyncio.run(main())

since most of the time when dealing with an API is spent waiting for results; running the requests concurrently means those waits overlap instead of adding up.
