Picking dates from an imported CSV with pandas/python

Question

I have a .csv file with daily data, as follows:

some 19 more header rows
Werte
01.01.1971 07:00:00   ;     0.0
02.01.1971 07:00:00   ;     1.2
...and so on

which I import with:

RainD = pd.read_csv('filename.csv', skiprows=20, sep=';', dayfirst=True, parse_dates=True)

As a result, I get

In [416]: RainD
Out[416]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 14976 entries, 1971-01-01 07:00:00 to 2012-01-01 07:00:00
Data columns:
Werte:    14976  non-null values
dtypes: object(1)

So it's a DataFrame, but maybe a TimeSeries would be the right way? But how do I import it as such? The pandas documentation lists a dtype option for read_csv, but gives no information on what I can or should specify.

On the other hand, the DatetimeIndex: line suggests that pandas is well aware that it is dealing with dates here, yet it still makes it a DataFrame. And with that, something like RainD['1971'] just results in a KeyError: u'no item named 1971'.

I have the feeling that I am just missing something really obvious, since time series analysis seems to be THE thing pandas was made for.

Another first idea of mine was that pandas might get confused by the fact that the dates are written in the correct (i.e. dd.mm.yyyy ;) ) way, but RainD.head() shows me that it could digest that just fine.

Regards JC

Your index selection fails because you are trying to access an index label or column with the string '1971', which will not work. If you want to filter the df to find the index values where the year is 1971, then the following would work: df[df.index.year == 1971]
– EdChum, Commented Jan 19, 2015 at 13:29
You may be confusing the DataFrame indexing semantics with time series indexing, which is entirely different.
– EdChum, Commented Jan 19, 2015 at 13:31
Yes, I am still confusing a lot of things ;) … But for now, df[df.index.year == 1971] did reduce my confusion quite a lot! Thanks! But maybe one additional thing before I consider this answered: what, in this case, is the difference between a DataFrame and a TimeSeries? Or, asked another way: is this the correct way to do it, or rather a crude hack that will soon cause me to run into other issues?
– JC_CL, Commented Jan 19, 2015 at 13:45
You get a DataFrame because read_csv always returns a DataFrame. If you want it as a Series, you can select the one column with RainD['Werte'] (and by the way, a TimeSeries is not something special; it is just a no longer used alias for a Series with a DatetimeIndex).
– joris, Commented Jan 19, 2015 at 14:04
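
A minimal sketch of joris's suggestion, not taken from the thread itself: it rebuilds a tiny stand-in for the file (the 1972 row, the column name "timestamp" and the variable names are made up for illustration), parses the dates explicitly, and selects the single column as a Series. It also shows label-based selection by year through .loc, which is an assumption about what the asker wanted from RainD['1971'] and presumes a reasonably recent pandas.

```python
import io
import pandas as pd

# Hypothetical stand-in for the file contents after the extra header rows.
# The first two data rows come from the question; the 1972 row is invented.
raw = (
    "Werte\n"
    "01.01.1971 07:00:00   ;     0.0\n"
    "02.01.1971 07:00:00   ;     1.2\n"
    "01.01.1972 07:00:00   ;     3.4\n"
)

# Read both fields as named columns; "timestamp" is an illustrative name.
rain = pd.read_csv(
    io.StringIO(raw),
    sep=";",
    skiprows=1,             # skip the "Werte" header line
    header=None,
    names=["timestamp", "Werte"],
    skipinitialspace=True,  # drop the padding after the separator
)

# Parse the day-first timestamps explicitly and use them as the index.
rain["timestamp"] = pd.to_datetime(
    rain["timestamp"].str.strip(), format="%d.%m.%Y %H:%M:%S"
)
rain = rain.set_index("timestamp")

# Selecting the one column gives a Series with a DatetimeIndex.
werte = rain["Werte"]

# Partial string selection by year through .loc.
werte_1971 = werte.loc["1971"]
print(werte_1971)
```

With .loc the same year string also works on the DataFrame itself, e.g. rain.loc["1971"].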

JC_CL · Accepted Answer · 2015-01-19 14:28:45Z

EdChum's df[df.index.year == 1971] solved my issue.

I might have some other issues (i.e. an outdated version of pandas), but for now, I can continue working.
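
For reference, a small self-contained sketch of the boolean-mask filter mentioned above. The sample rows and the names rain, mask and rain_1971 are illustrative assumptions (the 1972 row is invented), not the asker's actual data or code.

```python
import pandas as pd

# Hypothetical sample frame with a DatetimeIndex and a single "Werte" column.
index = pd.to_datetime(
    ["01.01.1971 07:00:00", "02.01.1971 07:00:00", "01.01.1972 07:00:00"],
    format="%d.%m.%Y %H:%M:%S",
)
rain = pd.DataFrame({"Werte": [0.0, 1.2, 3.4]}, index=index)

# index.year is an array of integers, so the comparison yields a boolean mask.
mask = rain.index.year == 1971

# Indexing the frame with the mask keeps only the rows from 1971.
rain_1971 = rain[mask]
print(rain_1971)
```

The mask approach relies only on the index being a DatetimeIndex, so it does not depend on string-based row lookup being available.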


1 Comment

JC_CL Over a year ago

Will do, but I have to wait two days for that.
