I have 3 Python scripts:

  1. grab.py: Takes as input an Instagram account name and outputs a text file containing all of its followers.
  2. scrape.py: Takes as input the output from grab.py and outputs details of each account (follower count, post count, etc.) in CSV form.
  3. analyze.py: A basic machine learning model that uses the results of scrape.py to perform an analysis on the accounts.

The 3 scripts work as expected individually. The next step is to create an API endpoint which will take an account name as a request parameter, and then trigger the above 3 scripts for the received account. The final analysis results will be stored in a database.
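
To make the chaining step concrete, here is a minimal sketch of what I have in mind, assuming each script can be run from the command line; the file names, CLI arguments, and database schema are placeholders, not the scripts' actual interfaces:

import sqlite3
import subprocess

def run_pipeline(account_name):
    """Run grab.py -> scrape.py -> analyze.py for one account."""
    followers_file = f"{account_name}_followers.txt"
    details_csv = f"{account_name}_details.csv"

    # Each step blocks until the previous one finishes;
    # check=True raises CalledProcessError on a non-zero exit.
    subprocess.run(["python", "grab.py", account_name, followers_file], check=True)
    subprocess.run(["python", "scrape.py", followers_file, details_csv], check=True)
    result = subprocess.run(
        ["python", "analyze.py", details_csv],
        check=True, capture_output=True, text=True,
    )

    # Store the final analysis output in a database (SQLite as an example).
    with sqlite3.connect("results.db") as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS analysis (account TEXT, result TEXT)")
        conn.execute("INSERT INTO analysis VALUES (?, ?)", (account_name, result.stdout))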

The endpoint also needs to have a queueing mechanism to store account names received. The queue will be polled, and if account names are available, they will be processed sequentially.
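
And for the endpoint plus queue, roughly this — a minimal sketch using Flask and a standard-library queue (the /analyze route and parameter name are placeholders, and run_pipeline is the function sketched above):

import queue
import threading

from flask import Flask, jsonify, request

app = Flask(__name__)
account_queue = queue.Queue()

@app.route("/analyze", methods=["POST"])
def enqueue_account():
    # The endpoint only enqueues; the worker thread does the slow work.
    account_name = request.args.get("account")
    if not account_name:
        return jsonify(error="missing 'account' parameter"), 400
    account_queue.put(account_name)
    return jsonify(status="queued", account=account_name), 202

def worker():
    # queue.get() blocks until an item is available, so the loop
    # does not busy-wait even though it runs forever.
    while True:
        account_name = account_queue.get()
        try:
            run_pipeline(account_name)  # the chaining sketch above
        finally:
            account_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    app.run()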

My API development experience is limited, so I am not sure of the best approach to this problem. My questions are:

  1. Should my API endpoint be written in Python? If yes, is the Flask framework a viable option? If not, what other options do I have?
  2. Is there some sort of pipeline I can use to seamlessly integrate the 3 scripts?
  3. Is maintaining the queue in memory and polling it from a separate thread running an infinite while loop a good idea? Is there a better way to accomplish this?
  • Although a perfectly valid question, I fear the possible answers are all opinion-based, which makes it off-topic for this Q&A site (opinion-based answers are frowned upon here because they tend to invite flame wars and trolling). That said, I use Python all the time for APIs (Django REST Framework), and people tend to use an asynchronous task queue like Celery to handle this kind of problem (see the sketch after these comments). If the tasks are dependent on each other you may be interested in the Airflow project. Commented Mar 13, 2019 at 10:28
  • 1) IMO Flask would be a viable and quick option. 2) No idea. 3) Since queues can be used in a blocking fashion, this seems like an okay idea. One drawback is that the queue is, as you said, in memory and will be lost once the program ends. You could use a database like SQLite or a file instead. Commented Mar 13, 2019 at 10:29
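
For reference, a minimal sketch of the Celery approach mentioned in the first comment, assuming the logic of each script can be imported as a function; the broker URL, task bodies, and file names are placeholders:

from celery import Celery, chain

# The broker URL is a placeholder; Redis and RabbitMQ are common choices.
app = Celery("pipeline", broker="redis://localhost:6379/0")

@app.task
def grab(account_name):
    # would invoke grab.py's logic here
    return f"{account_name}_followers.txt"

@app.task
def scrape(followers_file):
    # would invoke scrape.py's logic here
    return followers_file.replace("_followers.txt", "_details.csv")

@app.task
def analyze(details_csv):
    # would invoke analyze.py's logic and write the results to the database
    return "done"

# chain() wires the tasks so each return value feeds the next task;
# calling the chain sends it to the broker for a Celery worker to run.
result = chain(grab.s("some_account"), scrape.s(), analyze.s())()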

1 Answer


To get info from an API and save it, I would recommend using asyncio to do something like this:

import asyncio
import json
import time

import aiofiles as aiof
import aiohttp

FILENAME = "foo.txt"

async def fetch(session, url):
    # Fetch one profile and append its JSON payload to the output file.
    async with session.get(url) as response:
        data = await response.json()
        async with aiof.open(FILENAME, "a") as out:
            await out.write(json.dumps(data) + "\n")
            await out.flush()

async def main():
    instagram_ids = []  # profile ids
    current = time.time()
    url = "INSTAGRAM_API_URL/{}"  # placeholder; {} is filled with the profile id
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(fetch(session, url.format(profile_id)))
                 for profile_id in instagram_ids]
        await asyncio.gather(*tasks)
    print(time.time() - current)

asyncio.run(main())

since most of the time when dealing with an API is spent waiting for results; running the requests concurrently means those waits overlap instead of adding up.
