I can easily convert a string column to a date in pandas, as shown here:

df.date = pd.to_datetime(df.date, format="%m/%d/%Y")

There seems to be no equally easy way to do this in dask.

Here is the pandas example that works with dates:

import pandas as pd

url="http://web.mta.info/developers/data/nyct/turnstile/turnstile_170128.txt"
df=pd.read_csv(url)

df.info()

df.columns=['ca', 'unit', 'scp', 'station', 'inename', 'division', 'date', 'time', 'desc', 'entries', 'exits']

df.date = pd.to_datetime(df.date, format="%m/%d/%Y")

And here is the dask version, which loads the data but cannot convert the string column:

link = 'http://web.mta.info/developers/'

data = ['data/nyct/turnstile/turnstile_170128.txt',
        'data/nyct/turnstile/turnstile_170121.txt',
        'data/nyct/turnstile/turnstile_170114.txt',
        'data/nyct/turnstile/turnstile_170107.txt']

urls = [link + path for path in data]

import pandas as pd
import dask
import dask.dataframe as dd

ddfs = [dask.delayed(pd.read_csv)(url) for url in urls]

ddf = dd.from_delayed(ddfs)

ddf.columns=['ca', 'unit', 'scp', 'station', 'inename', 'division', 'date', 'time', 'desc', 'entries', 'exits']

How do I convert the string column to a date in dask?

1 Answer

Edit

This has since been added to Dask dataframe:

dd.to_datetime(...)

Previous answer

Do this with the parse_dates= keyword to pd.read_csv, so the dates are parsed while each file is read:

ddfs = [dask.delayed(pd.read_csv)(url, parse_dates=['DATE']) for url in urls]

Or you can even combine the DATE and TIME columns in your original data into a single column:

ddfs = [dask.delayed(pd.read_csv)(url, parse_dates={'DATETIME': ['DATE', 'TIME']}) for url in urls]
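
To try the read-time approach without downloading anything, here is a rough sketch on an in-memory CSV. The sample rows and the helper name `read_with_dates` are hypothetical; only the DATE and TIME header names are taken from the snippets above. As far as I know, newer pandas releases deprecate combining columns through a dict passed to parse_dates, so this sketch parses DATE while reading and then adds TIME as a timedelta.

```python
import io
import pandas as pd

# Hypothetical sample; only the DATE and TIME header names come from the answer.
SAMPLE_CSV = (
    "DATE,TIME,ENTRIES\n"
    "01/21/2017,03:00:00,100\n"
    "01/22/2017,07:00:00,250\n"
)

def read_with_dates(source):
    """Read one CSV, parsing DATE while reading and adding a DATETIME column."""
    df = pd.read_csv(source, parse_dates=["DATE"])
    # TIME is still text such as "03:00:00"; turn it into a timedelta
    # and add it to the already parsed date.
    df["DATETIME"] = df["DATE"] + pd.to_timedelta(df["TIME"])
    return df

df = read_with_dates(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

A helper like this could be wrapped the same way as pd.read_csv above, for example dask.delayed(read_with_dates)(url) for each url.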

Use map_partitions

If you have a dataframe with an object dtype column, you can always use map_partitions to apply a pandas function to every partition. You should also tell map_partitions the expected name and dtype of the output through the meta= keyword, so dask does not have to guess them.

ddf['date'] = ddf['date'].map_partitions(pd.to_datetime, format='%m/%d/%Y',
                                         meta=('date', 'M8[ns]'))

This is generally a good way to cover Pandas functionality for which there is no dask.dataframe API.
