Pandas dataframe slicing

Question

I have the following dataframe:

    2012   2013   2014   2015   2016   2017   2018                Kategorie
0   5.31   5.27   5.61   4.34   4.54   5.02   7.07  Gewinn pro Aktie in EUR
1  13.39  14.70  12.45  16.29  15.67  14.17  10.08                      KGV
2 -21.21  -0.75   6.45 -22.63  -7.75   9.76  47.52           Gewinnwachstum
3 -17.78   2.27  -0.55   3.39   1.48   0.34    NaN                      PEG

Now, I am selecting only the KGV row with:

df[df["Kategorie"] == "KGV"]

Which outputs:

    2012  2013   2014   2015   2016   2017   2018  Kategorie
1  13.39  14.7  12.45  16.29  15.67  14.17  10.08       KGV

How do I calculate the mean() of the last five years (2016, 2015, 2014, 2013 and 2012 in this example)?
I tried

df[df["Kategorie"] == "KGV"]["2016":"2012"].mean()

but this throws a TypeError. Why can I not slice the columns here?

When you slice with square brackets (__getitem__), pandas slices the rows, not the columns. A label slice also only runs forwards, in column order. Your indexing in this case can be done using df.loc[df["Kategorie"] == "KGV", "2012":"2016"] instead.
– Alex Riley, Commented Sep 17, 2016 at 14:52
@AmiTavory: Last as in counting backwards from now, not last as in the final elements.
– Jan, Commented Sep 17, 2016 at 14:57
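
The behaviour described in the comment above can be checked with a small sketch. It rebuilds the frame from the values in the question; the variable names (second_row, kgv_years, backwards) are only illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the values shown in the question.
df = pd.DataFrame({
    "2012": [5.31, 13.39, -21.21, -17.78],
    "2013": [5.27, 14.70, -0.75, 2.27],
    "2014": [5.61, 12.45, 6.45, -0.55],
    "2015": [4.34, 16.29, -22.63, 3.39],
    "2016": [4.54, 15.67, -7.75, 1.48],
    "2017": [5.02, 14.17, 9.76, 0.34],
    "2018": [7.07, 10.08, 47.52, np.nan],
    "Kategorie": ["Gewinn pro Aktie in EUR", "KGV", "Gewinnwachstum", "PEG"],
})

# A slice inside plain square brackets selects ROWS (here by position).
second_row = df[1:2]

# .loc takes a row selector and a column selector, so column labels can be sliced.
mask = df["Kategorie"] == "KGV"
kgv_years = df.loc[mask, "2012":"2016"]
kgv_mean = kgv_years.mean(axis=1)

# A label slice written right-to-left selects no columns.
backwards = df.loc[mask, "2016":"2012"]

print(second_row["Kategorie"].tolist())
print(kgv_mean)
print(backwards.shape)
```

The order of the selected columns does not change the mean, so writing the slice left to right is enough for this question.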

user2285236 · Accepted Answer · 2016-09-17 14:50:39Z

4

loc supports that type of slicing (from left to right):

df.loc[df["Kategorie"] == "KGV", "2012":"2016"].mean(axis=1)
Out: 
1    14.5
dtype: float64

Note that this does not necessarily select 2012, 2013, 2014, 2015 and 2016. The column labels are strings, so the slice takes every column positioned between df['2012'] and df['2016'], both endpoints included. If a column named foo sat in between, it would be selected as well.
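
To illustrate the caveat, here is a minimal sketch with a made-up frame in which a column named foo (hypothetical, not in the asker's data) sits between the year columns. Passing an explicit list of labels is one possible way to avoid picking it up.

```python
import pandas as pd

# Hypothetical frame: "foo" is placed between "2013" and "2014" on purpose.
df = pd.DataFrame({
    "2012": [13.39],
    "2013": [14.70],
    "foo": [99.0],
    "2014": [12.45],
    "2015": [16.29],
    "2016": [15.67],
    "Kategorie": ["KGV"],
})
mask = df["Kategorie"] == "KGV"

# The label slice covers everything located between the two endpoints,
# so "foo" is included.
sliced = df.loc[mask, "2012":"2016"]

# An explicit list of labels selects exactly these five columns.
years = [str(y) for y in range(2012, 2017)]
explicit = df.loc[mask, years]
mean_explicit = explicit.mean(axis=1)

print(list(sliced.columns))
print(list(explicit.columns))
print(mean_explicit)
```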

answered Sep 17, 2016 at 14:50

user2285236

1 Comment

Jan Over a year ago

Thanks a lot! There's no foo column in between and the columns are sorted per year.

piRSquared · Answer · 2016-09-17 14:43:53Z

2

I used filter and iloc

row = df[df.Kategorie == 'KGV']

row.filter(regex=r'\d{4}').sort_index(axis=1).iloc[:, -5:].mean(axis=1)

1    13.732
dtype: float64
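
As a sketch of what this chain does on the question's data: iloc[:, -5:] takes the last five year columns present in the frame (2014 to 2018 here), which is why the result differs from the 2012 to 2016 mean. If the window should end at 2016 instead, one option (an assumption about the asker's intent, not part of the original answer) is to cut the columns at "2016" first.

```python
import pandas as pd

# KGV row and one more row, with values copied from the question.
df = pd.DataFrame({
    "2012": [5.31, 13.39],
    "2013": [5.27, 14.70],
    "2014": [5.61, 12.45],
    "2015": [4.34, 16.29],
    "2016": [4.54, 15.67],
    "2017": [5.02, 14.17],
    "2018": [7.07, 10.08],
    "Kategorie": ["Gewinn pro Aktie in EUR", "KGV"],
})

row = df[df["Kategorie"] == "KGV"]

# Keep only columns whose label is exactly four digits, sorted by label.
years = row.filter(regex=r"^\d{4}$").sort_index(axis=1)

# Last five columns of the frame: 2014 .. 2018.
last_five_columns = years.iloc[:, -5:].mean(axis=1)

# Five years ending at 2016: cut at "2016", then take the last five.
five_up_to_2016 = years.loc[:, :"2016"].iloc[:, -5:].mean(axis=1)

print(last_five_columns)
print(five_up_to_2016)
```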

answered Sep 17, 2016 at 14:43

piRSquared

Comments

Ami Tavory · Answer · 2016-09-17 14:47:49Z

2

Not sure why the last five years are 2012-2016 (they seem to be the first five years). Notwithstanding, to find the mean for 2012-2016 for 'KGV', you can use

df[df['Kategorie'] == 'KGV'][[c for c in df.columns if c != 'Kategorie' and 2012 <= int(c) <= 2016]].mean(axis=1)
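
The same idea can be wrapped in a small function. This is only a sketch: mean_for_years is a hypothetical helper name, and it uses .loc with a list of columns rather than chained brackets. Because the labels are compared as integers, the column order and any non-year columns in between do not matter.

```python
import pandas as pd


def mean_for_years(df, category, first_year, last_year):
    """Hypothetical helper: row-wise mean over year columns in [first_year, last_year]."""
    year_cols = [
        c for c in df.columns
        if str(c).isdigit() and first_year <= int(c) <= last_year
    ]
    return df.loc[df["Kategorie"] == category, year_cols].mean(axis=1)


# Two rows with values copied from the question.
df = pd.DataFrame({
    "2012": [5.31, 13.39],
    "2013": [5.27, 14.70],
    "2014": [5.61, 12.45],
    "2015": [4.34, 16.29],
    "2016": [4.54, 15.67],
    "2017": [5.02, 14.17],
    "2018": [7.07, 10.08],
    "Kategorie": ["Gewinn pro Aktie in EUR", "KGV"],
})

result = mean_for_years(df, "KGV", 2012, 2016)
print(result)
```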

edited Sep 17, 2016 at 14:47

answered Sep 17, 2016 at 14:42

Ami Tavory

2 Comments

Jan Over a year ago

Last as in from now backwards. Why should this be used in contrast to @ayhan's approach?

Ami Tavory Over a year ago

@Jan No reason in particular - I answered it before his, but I like his more.
