I have a .csv file with daily data, as follows:

some 19 more header rows
Werte
01.01.1971 07:00:00   ;     0.0
02.01.1971 07:00:00   ;     1.2
...and so on

which I import with:

RainD=pd.read_csv('filename.csv',skiprows=20,sep=';',dayfirst=True,parse_dates=True)

As a result, I get

In [416]: RainD
Out[416]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 14976 entries, 1971-01-01 07:00:00 to 2012-01-01 07:00:00
Data columns:
Werte:    14976  non-null values
dtypes: object(1)

So it's a DataFrame, but maybe a TimeSeries would be the right way? How do I import it as such? The pandas documentation lists a dtype option in read_csv, but gives no info on what I can or should specify.

On the other hand, the DatetimeIndex: line suggests to me that pandas is quite aware that it is dealing with dates here, yet it still makes it a DataFrame. And with that, something like RainD['1971'] just results in a KeyError: u'no item named 1971'.

I have the feeling that I am just missing something really obvious, since time series analysis seems to be THE thing pandas was made for.

Another first idea of mine was that pandas might get confused by the fact that the dates are written in the correct (i.e. dd.mm.yyyy ;) ) way, but RainD.head() shows me that it could digest that just fine.

Regards JC

  • The reason your index selection fails is that you are trying to access an index label or column with the string '1971', which will not work. If you want to filter the df for index values where the year is 1971, the following works: df[df.index.year == 1971]. Commented Jan 19, 2015 at 13:29
  • You may be confusing the indexing semantics with time series indexing, which is entirely different. Commented Jan 19, 2015 at 13:31
  • Yes, I am still confusing a lot of things ;) … But for now, df[df.index.year == 1971] did reduce my confusion quite a lot! Thanks! But maybe one additional thing before I consider this answered: what, in this case, is the difference between a DataFrame and a TimeSeries? Or asked another way: is this the correct way to do it, or rather a crude hack that'll soon cause me to run into other issues? Commented Jan 19, 2015 at 13:45
  • So did my comment answer your question? Commented Jan 19, 2015 at 13:46
  • You get a DataFrame because read_csv always returns a DataFrame. If you want it as a Series, you can select the one column with RainD['Werte']. By the way, a TimeSeries is not something special; it is just an alias (no longer used) for a Series with a DatetimeIndex. Commented Jan 19, 2015 at 14:04
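
The last comment can be sketched roughly as follows. This is only an illustration: the small RainD frame is a hypothetical stand-in built by hand (the third value is made up), not the result of reading the actual file.

```python
import pandas as pd

# Hypothetical stand-in for RainD: daily values at 07:00 on a DatetimeIndex.
# The values are illustrative only.
idx = pd.date_range("1971-01-01 07:00:00", periods=3, freq="D")
RainD = pd.DataFrame({"Werte": [0.0, 1.2, 0.4]}, index=idx)

# Selecting the single column turns the DataFrame into a Series.
rain = RainD["Werte"]

print(type(RainD).__name__)       # container returned by read_csv in the question
print(type(rain).__name__)        # container after selecting one column
print(type(rain.index).__name__)  # the index type is carried over unchanged
```

The Series keeps the same DatetimeIndex, so a "time series" here is simply this one column together with its date index.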

1 Answer

EdChum's df[df.index.year == 1971] solved my issue.

I might have some other issues (e.g. an outdated version of pandas), but for now I can continue working.
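
For anyone landing here later, a minimal sketch of that filter on a hand-made frame (RainD below is a hypothetical stand-in with illustrative values, not my real data). The .loc variant is not from the comments above; it is an alternative that, as far as I know, works in recent pandas versions for frames with a DatetimeIndex.

```python
import pandas as pd

# Hypothetical stand-in for RainD: four daily values spanning the 1971/1972 boundary.
idx = pd.date_range("1971-12-30 07:00:00", periods=4, freq="D")
RainD = pd.DataFrame({"Werte": [0.0, 1.2, 0.4, 2.5]}, index=idx)

# Boolean mask on the year of each index entry (the approach from the comments).
by_mask = RainD[RainD.index.year == 1971]

# Alternative (an addition, not from the thread): partial string indexing on rows via .loc.
by_label = RainD.loc["1971"]

print(by_mask)
print(by_label)
```

Both selections should return the same two rows from December 1971.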

1 Comment

Will do, but I have to wait two days for that.
