I have a .csv file with daily data, as follows:
some 19 more header rows
Werte
01.01.1971 07:00:00 ; 0.0
02.01.1971 07:00:00 ; 1.2
...and so on
which I import with:
RainD=pd.read_csv('filename.csv',skiprows=20,sep=';',dayfirst=True,parse_dates=True)
As a result, I get
In [416]: RainD
Out[416]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 14976 entries, 1971-01-01 07:00:00 to 2012-01-01 07:00:00
Data columns:
Werte: 14976 non-null values
dtypes: object(1)
So it's a DataFrame, but maybe a TimeSeries would be the right way? But how do I import it as such? The pandas documentation lists a dtype option in read_csv, but gives no info on what I can/should specify.
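For reference, a minimal sketch of what the dtype option accepts and how an object column can be turned numeric. The sample rows, the "---" placeholder for a bad value, and the column name "date" are assumptions made up for illustration; the real file may be object-typed for a different reason.

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as the file (header rows omitted).
clean = "01.01.1971 07:00:00 ; 0.0\n02.01.1971 07:00:00 ; 1.2\n"
dirty = clean + "03.01.1971 07:00:00 ; ---\n"  # "---" is an assumed bad entry

# dtype takes a single type or a dict mapping column name -> type.
df_clean = pd.read_csv(
    io.StringIO(clean), sep=";", header=None, names=["date", "Werte"],
    skipinitialspace=True, dtype={"Werte": float},
)

# With a non-numeric entry the column is read as strings;
# pd.to_numeric(errors="coerce") turns unparseable entries into NaN.
df_dirty = pd.read_csv(
    io.StringIO(dirty), sep=";", header=None, names=["date", "Werte"],
    skipinitialspace=True,
)
df_dirty["Werte"] = pd.to_numeric(df_dirty["Werte"], errors="coerce")

print(df_clean.dtypes)
print(df_dirty)
```

If the column really is clean, passing dtype directly is enough; the to_numeric route is only needed when some rows hold text.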
On the other hand, the DatetimeIndex: line suggests that pandas is quite aware that it is dealing with dates here, but it still makes it a DataFrame. And with that, something like RainD['1971'] just results in a KeyError: u'no item named 1971'.
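A small sketch of why that lookup fails and what selects a year instead: on a DataFrame, square brackets with a string look for a column, while .loc with a partial date string selects rows from the DatetimeIndex. The sample data and the names rain and "date" are made up for illustration, and the dates are parsed explicitly here rather than via parse_dates so the example does not depend on format guessing.

```python
import io
import pandas as pd

# Hypothetical sample: two header rows, the "Werte" line, then the data.
raw = """header line 1
header line 2
Werte
01.01.1971 07:00:00 ; 0.0
02.01.1971 07:00:00 ; 1.2
01.01.1972 07:00:00 ; 3.4
"""

rain = pd.read_csv(
    io.StringIO(raw), skiprows=3, sep=";", header=None,
    names=["date", "Werte"], skipinitialspace=True,
)
# Parse dd.mm.yyyy explicitly and use the result as the index.
rain["date"] = pd.to_datetime(rain["date"].str.strip(), format="%d.%m.%Y %H:%M:%S")
rain = rain.set_index("date")

# Row selection by year: partial string indexing via .loc ...
only_1971 = rain.loc["1971"]
# ... or a boolean mask on the index's year attribute.
same_rows = rain[rain.index.year == 1971]

print(only_1971)
```

Both selections return the same rows; the mask form also works when the condition is more complex than a single year.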
I have the feeling that I am just missing something really obvious, since time series analysis seems to be THE thing pandas was made for.
Another first idea of mine was that pandas might get confused by the fact that the dates are written in the correct (i.e. dd.mm.yyyy ;) ) way, but RainD.head() shows me that it could digest that just fine.
Regards JC
'1971' which will not work; if you wanted to filter the df to find index values where the year is 1971 then the following would work: df[df.index.year == 1971]
df[df.index.year == 1971] did reduce my confusion quite a lot! Thanks! But maybe one additional thing, before I consider this answered: What then, in this case, is the difference between a DataFrame and a TimeSeries? Or asked another way: is this the correct way to do it, or rather a crude hack that'll soon cause me to run into other issues?
read_csv always returns a DataFrame. If you want it as a Series, you can select the one column with RainD['Werte'] (and by the way, a TimeSeries is not something special, it is just a (not used anymore) alias for a Series with a DatetimeIndex).
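To make the last comment concrete, here is a minimal sketch of pulling the single column out as a Series. The sample values and the names rain, werte and "date" are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical sample rows in the same layout as the file.
raw = """01.01.1971 07:00:00 ; 0.0
02.01.1971 07:00:00 ; 1.2
01.01.1972 07:00:00 ; 3.4
"""

rain = pd.read_csv(
    io.StringIO(raw), sep=";", header=None,
    names=["date", "Werte"], skipinitialspace=True,
)
rain["date"] = pd.to_datetime(rain["date"].str.strip(), format="%d.%m.%Y %H:%M:%S")
rain = rain.set_index("date")

# Selecting the one column gives a Series that keeps the DatetimeIndex.
werte = rain["Werte"]
print(type(werte).__name__, type(werte.index).__name__)

# The usual time series operations work on it, e.g. a year slice
# and totals per calendar year.
sum_1971 = werte.loc["1971"].sum()
yearly = werte.groupby(werte.index.year).sum()
print(yearly)
```

So the DataFrame is not a hack: it is simply a container of one or more such Series sharing the same index.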