Follow-up question to: Python pandas for reading in file with date

I am not able to parse the date on the dataframe below. The code is as follows:

df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', 
                 parse_dates={'datetime': [0,1,2]}, delim_whitespace=True,
                 date_parser=lambda x: pandas.datetime.strptime(x, '%Y %m %d'))

         OTH-000.opc
              XKN1=    0.500000E-01
    Y   M   D     PRCP     VWC1    
 2006   1   1      0.0  0.17608E+00
 2006   1   2      6.0  0.21377E+00
 2006   1   3      0.1  0.22291E+00
 2006   1   4      3.0  0.23460E+00
 2006   1   5      6.7  0.26076E+00

I get an error saying: lambda () takes exactly 1 argument (3 given)

Based on @EdChum's comment below, if I use this code:

df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', parse_dates={'datetime': [0,1,2]}, delim_whitespace=True)

then df.index comes back with dtype object rather than as a datetime index:

df.index
Index([u'2006 1 1',u'2006 1 2'....,u'nan nan nan'],dtype='object')

Finally the file is available here:

https://www.dropbox.com/s/0xgk2w4ed9mi4lx/test.txt?dl=0

  • Does this work: df = pandas.read_csv(file_name, skiprows = 2, index_col='datetime', parse_dates={'datetime': [0,1,2]}, delim_whitespace=True)? This works for me; it seems the pandas parser is capable enough to handle your date format. Commented Apr 22, 2015 at 18:00
  • If I do that, then df.index results in an object and not a time series. Commented Apr 22, 2015 at 18:04
  • This definitely works for me using pandas 0.16.0, numpy 1.9.1 and 64-bit Python 3.4.3. Which versions of pandas, numpy and Python are you using? Commented Apr 22, 2015 at 18:07
  • Regarding the incorrect index dtype, could you try df.index = pd.to_datetime(df.index)? This shouldn't be necessary, but it should work for you. Commented Apr 22, 2015 at 18:07
  • Glad I could help. Once you rule out software versions, you have to suspect that there is something funny about the file. Commented Apr 22, 2015 at 18:25

1 Answer

OK, I see the problem: your file has extraneous blank lines at the end. Unfortunately, these trip up the parser because it is splitting on whitespace, so the df ends up with a trailing row of NaNs and an unparsed index, like the following:

Out[25]:
             PRCP     VWC1
datetime                  
2006 1 1      0.0  0.17608
2006 1 2      6.0  0.21377
2006 1 3      0.1  0.22291
2006 1 4      3.0  0.23460
2006 1 5      6.7  0.26076
nan nan nan   NaN      NaN

When I remove the blank lines it imports and parses the dates fine:

Out[26]:
            PRCP     VWC1
datetime                 
2006-01-01   0.0  0.17608
2006-01-02   6.0  0.21377
2006-01-03   0.1  0.22291
2006-01-04   3.0  0.23460
2006-01-05   6.7  0.26076

and the index is now a DatetimeIndex, as desired:

In [27]:

df.index
Out[27]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2006-01-01, ..., 2006-01-05]
Length: 5, Freq: None, Timezone: None
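
If editing the file by hand is not an option, the same cleanup can probably be done in code. The following is only a minimal sketch, not what I ran above: it assumes the date columns are named Y, M and D as in the sample, it uses sep=r"\s+" in place of delim_whitespace, and read_daily_table and SAMPLE are hypothetical names made up for illustration. It drops any all-NaN rows left behind by trailing blank lines and then builds the index from the three date columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: two preamble lines, a header,
# two data rows, then trailing blank / whitespace-only lines.
SAMPLE = (
    "         OTH-000.opc\n"
    "              XKN1=    0.500000E-01\n"
    "    Y   M   D     PRCP     VWC1\n"
    " 2006   1   1      0.0  0.17608E+00\n"
    " 2006   1   2      6.0  0.21377E+00\n"
    "\n"
    "   \n"
)


def read_daily_table(source, skiprows=2):
    """Read a whitespace-delimited table with Y, M, D columns (assumed names)."""
    df = pd.read_csv(source, skiprows=skiprows, sep=r"\s+")

    # Discard rows that are entirely NaN, e.g. ones produced by trailing
    # blank lines; this is a no-op if the parser already skipped them.
    df = df.dropna(how="all")

    # pd.to_datetime can assemble dates from columns named year/month/day.
    parts = df[["Y", "M", "D"]].astype(int)
    parts.columns = ["year", "month", "day"]
    df.index = pd.DatetimeIndex(pd.to_datetime(parts), name="datetime")

    return df.drop(columns=["Y", "M", "D"])


df = read_daily_table(io.StringIO(SAMPLE))
print(type(df.index).__name__)
```

Passing file_name instead of the io.StringIO object should work the same way, since pd.read_csv accepts either a path or a file-like object.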