As the title says, is it possible to write an asyncio event loop that slices a DataFrame by the unique values in a certain column and saves each slice to my drive? And, maybe more importantly, is it faster?

What I've tried is something like this:

import asyncio

async def a_split(dist, df):
    # keep only the rows for this district
    temp_df = df[df.district == dist]
    await temp_df.to_csv('{}.csv'.format(dist))

async def m_lp(df):
    for dist in df.district.unique().tolist():
        await a_split(dist, df)

loop = asyncio.get_event_loop()

loop.run_until_complete(m_lp(dfTotal))
loop.close()

But I'm getting the following error:

TypeError: object NoneType can't be used in 'await' expression

If it's not obvious from my attempt, I'm very new to asyncio and I'm not sure how it works. Apologies if this is a stupid question.

If asyncio is not a good tool for the job - is there a better one?

Edit:

Full traceback below:

    ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-2bc2373d2920> in <module>()
      2 loop = asyncio.get_event_loop()
      3 
----> 4 loop.run_until_complete(m_lp(dfTotal))
      5 loop.close()

C:\Users\5157213\AppData\Local\Continuum\Anaconda3\envs\python36\lib\asyncio\base_events.py in run_until_complete(self, future)
    464             raise RuntimeError('Event loop stopped before Future completed.')
    465 
--> 466         return future.result()
    467 
    468     def stop(self):

<ipython-input-20-9e91c0b1b06f> in m_lp(df)
      1 async def m_lp(df):
      2     for dist in df.district.unique().tolist():
----> 3         await a_split(dist,df)

<ipython-input-18-200b08417159> in a_split(dist, df)
      1 async def a_split(dist,df):
      2     temp = df[df.district == dist]
----> 3     await temp.to_csv('C:/Users/5157213/Desktop/Portfolio/{}.csv'.format(dist))

TypeError: object NoneType can't be used in 'await' expression
  • Please edit the question to include the full traceback. As it stands we can't tell which await that refers to. Commented Jul 17, 2017 at 22:23
  • Edited. It looks like it refers to the await next to the df.to_csv line, but neither of the awaited calls returns anything. Commented Jul 17, 2017 at 22:42

1 Answer

As far as I know, there is no asyncio support as such in pandas. I don't think a single-threaded, event-based architecture is the best tool here, given that you have dozens of other options for working with large data; for a large dataset, take a look at dask.
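For the splitting itself, a plain synchronous loop over `groupby` may already be enough. This is only a minimal sketch, not something I have benchmarked: the helper name `split_by_column`, the toy data, and the temporary output directory are all made up for illustration.

```python
import os
import tempfile

import pandas as pd


def split_by_column(df, column, out_dir):
    """Write one CSV per unique value of `column`; return the paths written."""
    paths = []
    # groupby makes a single pass over the frame and yields (value, sub-frame) pairs
    for value, group in df.groupby(column):
        path = os.path.join(out_dir, "{}.csv".format(value))
        group.to_csv(path, index=False)
        paths.append(path)
    return paths


# Toy stand-in for the dfTotal frame from the question (assumed shape)
df_total = pd.DataFrame(
    {"district": ["north", "south", "north"], "value": [1, 2, 3]}
)
out_dir = tempfile.mkdtemp()
written = split_by_column(df_total, "district", out_dir)
```

With the real data you would pass `dfTotal`, `"district"`, and an output directory of your choosing. Using `groupby` avoids re-filtering the whole frame once per district, which is what `df[df.district == dist]` does inside the loop.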

The error you get is because you tried to await DataFrame.to_csv, which does not return a Future (or any other awaitable object) but None.
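To illustrate, here is a small sketch using only the standard library. `blocking_save` is a hypothetical stand-in for `to_csv` (it blocks, then returns None), and `asyncio.run` / `asyncio.get_running_loop` assume Python 3.7 or newer, whereas the traceback in the question comes from 3.6. The second coroutine shows one way to get something awaitable: handing the blocking call to a thread pool via `loop.run_in_executor`.

```python
import asyncio
import time


def blocking_save(name):
    """Hypothetical stand-in for DataFrame.to_csv: blocks, then returns None."""
    time.sleep(0.01)
    return None


async def broken(name):
    # blocking_save runs to completion first; `await None` then raises TypeError
    await blocking_save(name)


async def fixed(name):
    loop = asyncio.get_running_loop()
    # run_in_executor returns an awaitable Future; None selects the default thread pool
    return await loop.run_in_executor(None, blocking_save, name)


def demo():
    try:
        asyncio.run(broken("a"))
    except TypeError as exc:
        error = type(exc).__name__
    else:
        error = None
    result = asyncio.run(fixed("a"))
    return error, result
```

Calling `demo()` shows that `broken` fails with a TypeError while `fixed` completes. With pandas the same pattern would presumably look like `await loop.run_in_executor(None, temp_df.to_csv, path)`. Note that this only moves the blocking write to a worker thread; whether it ends up faster for your data is something you would have to measure.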
