
I am trying to read a CSV file from https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/challenge/traffic.csv

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/challenge/traffic.csv', parse_dates=['time'])

However, the time column is still stored as strings (object dtype):

df.dtypes 
[output]
ip        object
time      object
path      object
status     int64
size       int64
dtype: object

Interestingly, reading a similar CSV from a different URL works:

df = pd.read_csv('https://raw.githubusercontent.com/LinkedInLearning/data_cleaning_python_2883183/main/Ch04/solution/traffic.csv', parse_dates=['time'])

This call does convert the time column to datetime. Why does parse_dates fail for the first URL, and how can I fix it?

    It's a data cleaning exercise - so investigate the data after this initial pass - what's in the column now - which types are the values, what are the failing cases? Go forth and learn about data cleaning. Commented Jun 27, 2022 at 8:21

1 Answer


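There is a typo in one of the datetimes: its year is 1017, which lies outside the range that nanosecond-resolution timestamps can represent (roughly the years 1677 to 2262), so the column cannot be converted as a whole: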

1017-06-19 14:46:24

A possible solution is to convert the values that cannot be parsed to NaT:

df['time'] = pd.to_datetime(df['time'], errors='coerce')
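
To see which values are causing the problem before deciding how to handle them, something like the following sketch may help. It uses a small hypothetical sample in place of the real file, and the cutoff date earliest_plausible and the assumption that 1017 was meant to be 2017 are illustrative guesses, not something the data confirms.

```python
import pandas as pd

# Hypothetical sample standing in for the 'time' column of traffic.csv
sample = pd.DataFrame({
    "time": [
        "2017-06-19 14:46:24",
        "1017-06-19 14:46:24",  # the typo quoted above
        "2017-06-19 14:47:58",
    ]
})

# Parse leniently: values that cannot be represented become NaT
parsed = pd.to_datetime(sample["time"], errors="coerce")

# Assumed lower bound for a plausible web-traffic timestamp (illustrative)
earliest_plausible = pd.Timestamp("2000-01-01")

# Flag values that failed to parse or fall before the assumed bound
suspect = parsed.isna() | (parsed < earliest_plausible)
bad_values = sample.loc[suspect, "time"].tolist()
print(bad_values)

# Optional repair, assuming the intended year was 2017 (an assumption)
fixed = sample["time"].str.replace(r"^1017-", "2017-", regex=True)
sample["time"] = pd.to_datetime(fixed)
print(sample["time"].dtype)
```

Whether to repair the value or leave it as NaT depends on how sure you are about the intended year; coercing to NaT is the safer choice when you cannot tell.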
