I have the following dataframe:

    2012   2013   2014   2015   2016   2017   2018                Kategorie
0   5.31   5.27   5.61   4.34   4.54   5.02   7.07  Gewinn pro Aktie in EUR
1  13.39  14.70  12.45  16.29  15.67  14.17  10.08                      KGV
2 -21.21  -0.75   6.45 -22.63  -7.75   9.76  47.52           Gewinnwachstum
3 -17.78   2.27  -0.55   3.39   1.48   0.34    NaN                      PEG

Now, I am selecting only the KGV row with:

df[df["Kategorie"] == "KGV"]

Which outputs:

    2012  2013   2014   2015   2016   2017   2018  Kategorie
1  13.39  14.7  12.45  16.29  15.67  14.17  10.08        KGV

How do I calculate the mean() of the last five years (2016,15,14,13,12 in this example)?
I tried

df[df["Kategorie"] == "KGV"]["2016":"2012"].mean()

but this throws a TypeError. Why can I not slice the columns here?

  • Why are the last five years 2012-2016? Commented Sep 17, 2016 at 14:39
  • As soon as you slice with __getitem__ (square-bracket indexing), pandas looks at the rows, not the columns. Also, the slice only works forwards. Your indexing in this case can be done with df.loc[df["Kategorie"] == "KGV", "2012":"2016"] instead. Commented Sep 17, 2016 at 14:52
  • @AmiTavory: Last as in counting backwards from now, not last as in the last elements. Commented Sep 17, 2016 at 14:57
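
The point made in the comment above can be checked with a small sketch. It assumes a reduced copy of the question's data (two rows, five year columns) and an integer row index, as in the question; the variable names are only illustrative.

```python
import pandas as pd

# Reduced copy of the question's data (two rows, five year columns).
df = pd.DataFrame({
    "2012": [5.31, 13.39],
    "2013": [5.27, 14.70],
    "2014": [5.61, 12.45],
    "2015": [4.34, 16.29],
    "2016": [4.54, 15.67],
    "Kategorie": ["Gewinn pro Aktie in EUR", "KGV"],
})
is_kgv = df["Kategorie"] == "KGV"

# A slice inside plain [] is applied to the row index, so string
# bounds on an integer index fail (the question reports a TypeError).
try:
    df[is_kgv]["2016":"2012"]
    bracket_slice_failed = False
except TypeError:
    bracket_slice_failed = True

# .loc takes a row selector and a column slice; labels run left to right.
forward = df.loc[is_kgv, "2012":"2016"]

# Reversed bounds do not raise here, they simply select no columns.
backward = df.loc[is_kgv, "2016":"2012"]

print(bracket_slice_failed)
print(list(forward.columns))
print(backward.shape)
```

So the error comes from the row index, not from the columns, and reversing the bounds in .loc would silently return an empty selection rather than the years in reverse order.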

3 Answers

loc supports that type of slicing (from left to right):

df.loc[df["Kategorie"] == "KGV", "2012":"2016"].mean(axis=1)
Out: 
1    14.5
dtype: float64

Note that this does not necessarily mean 2012, 2013, 2014, 2015 and 2016. The column labels are strings, so the slice selects every column positioned between '2012' and '2016', with both endpoints included. If a column named foo sat between them, it would be selected too.
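
To make that caveat concrete, here is a small sketch. The foo column is hypothetical (the asker's data has none); it is placed between 2014 and 2015 only to show what a label slice picks up, and how an explicit list of labels avoids it.

```python
import pandas as pd

# The question's KGV data plus a hypothetical "foo" column placed
# between the year columns (not present in the original data).
df = pd.DataFrame({
    "2012": [5.31, 13.39],
    "2013": [5.27, 14.70],
    "2014": [5.61, 12.45],
    "foo": [0.0, 0.0],
    "2015": [4.34, 16.29],
    "2016": [4.54, 15.67],
    "Kategorie": ["Gewinn pro Aktie in EUR", "KGV"],
})
is_kgv = df["Kategorie"] == "KGV"

# The label slice follows column order, so "foo" is included.
sliced = df.loc[is_kgv, "2012":"2016"]
print(list(sliced.columns))

# An explicit list of labels selects exactly the years that are named.
years = ["2012", "2013", "2014", "2015", "2016"]
explicit_mean = df.loc[is_kgv, years].mean(axis=1)
print(explicit_mean)
```

With sorted year columns and nothing in between, both forms select the same data; the explicit list is just the more defensive choice.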

Thanks a lot! There's no foo column in between and the columns are sorted per year.

I used filter and iloc. Note that this takes the last five year columns, which are 2014-2018 in this data:

row = df[df.Kategorie == 'KGV']

row.filter(regex=r'\d{4}').sort_index(axis=1).iloc[:, -5:].mean(axis=1)

1    13.732
dtype: float64
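
The result above averages the last five columns, 2014-2018. If the five years should instead end at a chosen year such as 2016, one possible sketch is shown below. The helper mean_last_n_years and its parameters are hypothetical names introduced for illustration, and it assumes that every year column is a string of digits.

```python
import pandas as pd

df = pd.DataFrame({
    "2012": [5.31, 13.39, -21.21, -17.78],
    "2013": [5.27, 14.70, -0.75, 2.27],
    "2014": [5.61, 12.45, 6.45, -0.55],
    "2015": [4.34, 16.29, -22.63, 3.39],
    "2016": [4.54, 15.67, -7.75, 1.48],
    "2017": [5.02, 14.17, 9.76, 0.34],
    "2018": [7.07, 10.08, 47.52, float("nan")],
    "Kategorie": ["Gewinn pro Aktie in EUR", "KGV", "Gewinnwachstum", "PEG"],
})


def mean_last_n_years(frame, category, end_year, n=5):
    """Hypothetical helper: mean over the n year columns ending at end_year."""
    # Keep only the columns whose label is a year no later than end_year.
    year_cols = [c for c in frame.columns if c.isdigit() and int(c) <= end_year]
    # Sort numerically, then take the n most recent of those years.
    chosen = sorted(year_cols, key=int)[-n:]
    return frame.loc[frame["Kategorie"] == category, chosen].mean(axis=1)


kgv_to_2016 = mean_last_n_years(df, "KGV", end_year=2016)
kgv_to_2018 = mean_last_n_years(df, "KGV", end_year=2018)
print(kgv_to_2016)
print(kgv_to_2018)
```

With end_year=2016 this agrees with the loc-based answer (14.5), and with end_year=2018 it matches the filter and iloc result (13.732).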

I'm not sure why the last five years are 2012-2016 (they look like the first five years). Regardless, to find the mean for 2012-2016 for 'KGV', you can use:

df[df['Kategorie'] == 'KGV'][[c for c in df.columns if c != 'Kategorie' and 2012 <= int(c) <= 2016]].mean(axis=1)

Last as in counting backwards from now. Why should this be used instead of @ayhan's approach?
@Jan No reason in particular - I answered it before his, but I like his more.
