
I've been using

pd.read_csv('file.csv',parse_dates=['date_time']) 

to parse dates, and then build a DatetimeIndex to extract year, month, and day from the date_time column. When parsing succeeds, 'date_time' should come back as datetime64, but something in the column keeps leaving it as 'object', so I get a ValueError when I try to build the DatetimeIndex. The data is too big for me to find the offending values by eye. How should I handle this so I can, say, change the anomalies to missing and get the date_time column parsed? Thanks.
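The usual way to handle this is `pd.to_datetime` with `errors='coerce'`, which turns any unparseable value into `NaT` (missing) instead of raising, so the column ends up as datetime64 regardless. A minimal sketch, using made-up sample data with one malformed value:

```python
import pandas as pd

# hypothetical sample: one malformed value among valid timestamps
df = pd.DataFrame({'date_time': ['2015-09-03 17:09:54',
                                 '1-255-255 00:00:00',
                                 '2015-06-07 15:53:02']})

# errors='coerce' converts unparseable values to NaT instead of raising
df['date_time'] = pd.to_datetime(df['date_time'], errors='coerce')

print(df['date_time'].dtype)         # datetime64[ns]
print(df['date_time'].isna().sum())  # 1 (the bad value became NaT)
```

After the coercion, `df[df['date_time'].isna()]` shows exactly which rows failed to parse.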

Update:

I did what EdChum suggested, although fairly manually. Here is my guess as to why the data is malformed: one date was supposed to be 2016, but it shows 2161 instead. Does anyone know why Python wouldn't parse the datetime in this case? How can I identify all rows like this and delete them?
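A year like 2161 is actually within pandas' Timestamp range, so it parses without error; it just isn't the value you expect. One way to flag rows like this is to coerce the column and then filter on an implausible year (the 2100 cutoff below is an arbitrary choice for illustration):

```python
import pandas as pd

df = pd.DataFrame({'date_time': ['2015-09-03 17:09:54',
                                 '2161-09-24 17:38:35',   # typo'd year
                                 '2015-06-07 15:53:02']})

parsed = pd.to_datetime(df['date_time'], errors='coerce')

# flag rows that failed to parse or carry an implausible year
bad = parsed.isna() | (parsed.dt.year > 2100)
print(df[bad])    # rows to inspect or drop
df = df[~bad]     # keep only the clean rows
```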

  • Add sample data, in case there is something unique about it. Check syntax for parse_dates. Commented Jun 7, 2016 at 20:03
  • I tried the method suggested in the thread but got the error below: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-255-255 00:00:00 Commented Jun 7, 2016 at 20:04
  • You can use the binary search approach to find the problematic record (take the 1st/2nd half, see which has problem, then split further). Commented Jun 7, 2016 at 20:04
  • I am quite new to Python; what is the binary search method? Commented Jun 7, 2016 at 20:06
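The binary-search idea from the comments can be sketched as follows: repeatedly parse half of the current window, and narrow toward whichever half raises. This is a hypothetical helper (it assumes a single bad value), not something from the thread:

```python
import pandas as pd

def find_bad_row(series):
    """Binary-search a Series for the position of a value pd.to_datetime rejects."""
    lo, hi = 0, len(series)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        try:
            pd.to_datetime(series.iloc[lo:mid], errors='raise')
            lo = mid   # first half parses cleanly; problem is in the second half
        except ValueError:
            hi = mid   # problem is in the first half
    return lo

s = pd.Series(['2015-09-03', '2015-09-24', '1-255-255', '2015-06-07'])
print(find_bad_row(s))   # 2, the position of the offending value
```

Each step halves the search window, so even a file with millions of rows needs only a few dozen parse attempts to locate the record.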

2 Answers


Try this:

import pandas as pd
df = pd.read_csv('test.csv.gz', compression='infer',
                 parse_dates=['date_time'], usecols=[0, 1, 3])
print(df.head())

#       id            date_time  posa_continent
#    0   0  2015-09-03 17:09:54               3
#    1   1  2015-09-24 17:38:35               3
#    2   2  2015-06-07 15:53:02               3
#    3   3  2015-09-14 14:49:10               3
#    4   4  2015-07-17 09:32:04               3

3 Comments

This is a neat way to load data directly from gz files, but it still doesn't quite solve my problem. Apologies, I cited date_time in the question (to stay general), but it is actually the srch_ci column (search check-in date) that had the issue. I played around with the data more and confirmed that one particular row (row id=312920) was causing all the trouble. So I deleted it and everything went fine. But thanks for the help!!!
@CWlearner, if it is one row, edit that row and add it back in.
Yeah, I think you are right. I am going to change its value to missing so it can be treated later on.
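Setting the one known-bad value to missing, as discussed above, can be done with `.loc` before parsing. A sketch on made-up data (row index 1 stands in for the real row id=312920; `srch_ci` is the column named in the comments):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'srch_ci': ['2015-09-03', '2161-09-24', '2015-06-07']})

# blank out the known-bad row instead of deleting it
df.loc[1, 'srch_ci'] = np.nan   # index 1 stands in for row id=312920

# NaN becomes NaT; errors='coerce' guards against any other stragglers
df['srch_ci'] = pd.to_datetime(df['srch_ci'], errors='coerce')
```

This keeps the row (and its other columns) in the frame, with only the date treated as missing.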

This will help you diagnose the problem. Please run this snippet and post the output of bad_rows.

import pandas as pd
import dateutil.parser

df = pd.read_csv('file.csv')
bad_rows = []
good_rows = []
for row, date in enumerate(df['date_time']):
    try:
        good_rows.append((row, dateutil.parser.parse(date)))
    except Exception as e:
        print(str(e))
        bad_rows.append((row, date))

5 Comments

I ran the code, but bad_rows seems to capture all the rows, while good_rows captures none.
edited. Please give at least a few examples of the date that could not be parsed and the corresponding Exception
@michael_j_ward, thanks for following up. I ran it again but got error again. It says "ValueError: I/O operation on closed file"
@Merlin it is the test file from kaggle expedia contest. here is the link to data: kaggle.com/c/expedia-hotel-recommendations/data
I will also try to add a few lines to show as example
