Date parse error in Python pandas while reading file

Question

Follow-on question to: Python pandas for reading in file with date

I am not able to parse the date on the dataframe below. The code is as follows:

df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', 
                 parse_dates={'datetime': [0,1,2]}, delim_whitespace=True,
                 date_parser=lambda x: pandas.datetime.strptime(x, '%Y %m %d'))

         OTH-000.opc
              XKN1=    0.500000E-01
    Y   M   D     PRCP     VWC1    
 2006   1   1      0.0  0.17608E+00
 2006   1   2      6.0  0.21377E+00
 2006   1   3      0.1  0.22291E+00
 2006   1   4      3.0  0.23460E+00
 2006   1   5      6.7  0.26076E+00

I get an error saying: lambda () takes exactly 1 argument (3 given)

Based on @EdChum's comment below, if I use this code:

df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', parse_dates={'datetime': [0,1,2]}, delim_whitespace=True)

then df.index has dtype object and is not a datetime index:

df.index
Index([u'2006 1 1',u'2006 1 2'....,u'nan nan nan'],dtype='object')

Finally the file is available here:

https://www.dropbox.com/s/0xgk2w4ed9mi4lx/test.txt?dl=0

Does this work: df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', parse_dates={'datetime': [0,1,2]}, delim_whitespace=True)? It works for me; the pandas parser seems capable enough to handle your date format.
– EdChum, Commented Apr 22, 2015 at 18:00
If I do that, then df.index has dtype object and is not a time series.
– user308827, Commented Apr 22, 2015 at 18:04
This definitely works for me using pandas 0.16.0, numpy 1.9.1 and Python 3.4.3 64-bit. What versions of pandas, numpy and Python are you using?
– EdChum, Commented Apr 22, 2015 at 18:07
Regarding the incorrect index dtype, could you try df.index = pd.to_datetime(df.index)? This shouldn't be necessary, but it should work for you.
– EdChum, Commented Apr 22, 2015 at 18:07
Glad I could help. Once you rule out software versions, you then have to suspect that the file has something funny about it.
– EdChum, Commented Apr 22, 2015 at 18:25

EdChum · Accepted Answer · 2015-04-22 18:24:09Z

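OK, I see the problem: your file had extraneous blank lines at the end. Unfortunately this confuses the parser, since it is splitting on whitespace, and it produces a final row of NaN values. The combined date column then contains the string 'nan nan nan', so the index cannot be converted to datetimes and the df looks like the following: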

Out[25]:
             PRCP     VWC1
datetime                  
2006 1 1      0.0  0.17608
2006 1 2      6.0  0.21377
2006 1 3      0.1  0.22291
2006 1 4      3.0  0.23460
2006 1 5      6.7  0.26076
nan nan nan   NaN      NaN

When I remove the blank lines it imports and parses the dates fine:

Out[26]:
            PRCP     VWC1
datetime                 
2006-01-01   0.0  0.17608
2006-01-02   6.0  0.21377
2006-01-03   0.1  0.22291
2006-01-04   3.0  0.23460
2006-01-05   6.7  0.26076

and the index is now a DatetimeIndex as desired:

In [27]:

df.index
Out[27]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-01-01, ..., 2006-01-05]
Length: 5, Freq: None, Timezone: None
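
If editing the file by hand is not an option, the same result can be reached defensively in code. The following is a minimal sketch, not the method used above: it assumes the column names Y, M and D from the sample, uses sep=r"\s+" in place of delim_whitespace (which, like date_parser, is deprecated in recent pandas releases), and builds the index after reading so that any empty trailing rows can be dropped first. The SAMPLE string and the read_dated_table helper are hypothetical stand-ins for the real file and are not part of pandas.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file in the question, with a trailing
# whitespace-only line and an empty line appended.
SAMPLE = (
    "         OTH-000.opc\n"
    "              XKN1=    0.500000E-01\n"
    "    Y   M   D     PRCP     VWC1\n"
    " 2006   1   1      0.0  0.17608E+00\n"
    " 2006   1   2      6.0  0.21377E+00\n"
    " 2006   1   3      0.1  0.22291E+00\n"
    " 2006   1   4      3.0  0.23460E+00\n"
    " 2006   1   5      6.7  0.26076E+00\n"
    "   \n"
    "\n"
)


def read_dated_table(source):
    """Read the whitespace-delimited table and index it by date."""
    # Skip the two header lines above the column names.
    df = pd.read_csv(source, skiprows=2, sep=r"\s+")

    # Drop any rows whose date parts are missing (e.g. from blank lines).
    df = df.dropna(subset=["Y", "M", "D"])

    # to_datetime can assemble dates from year/month/day columns.
    parts = df[["Y", "M", "D"]].astype(int)
    parts.columns = ["year", "month", "day"]
    df.index = pd.DatetimeIndex(pd.to_datetime(parts), name="datetime")

    return df.drop(columns=["Y", "M", "D"])


df = read_dated_table(io.StringIO(SAMPLE))
print(df)
print(type(df.index))
```

For a real file, pass its path instead of the io.StringIO object. Dropping rows before converting the date parts to integers matters here, because a row of NaN values would otherwise make the integer conversion fail.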
