I am trying to learn more about the apply method in Python and am wondering how to write the following code using apply:

I have a dataframe df like the following:

   A  B  C  D   E  points
0  0  0  0  1  43      94
1  0  0  1  1  55      62
2  1  1  0  1  21      84
3  1  0  1  0  13      20

Furthermore I have a function like the following, which does its job:

def f1(df):
  df_means = pd.DataFrame(columns = ['Mean_Points'])
  for columnname in df.columns:
    if len(df[df[columnname] == 1]) > 1:
      df_means.loc[columnname] = [df[df[columnname] == 1]['points'].mean()]
  return df_means

So the output of f1 is

  'Mean_Points'
A      52
C      41
D      80

which is exactly what I want. But I am wondering whether the same result can be obtained with the apply method (I am sure it can). I tried:

df_means = pd.DataFrame(columns = ['Mean_Points'])
cols = [col for col in df.columns if len(df[df[col] == 1]) > 1]
df_means.loc[cols] = df[cols].apply(lambda x: df[df[x] == 1]['points'].mean(), axis = 1)

or similar:

df_means = pd.DataFrame(columns = ['Mean_Points'])
df.columns.apply(lambda x: df_means.loc[x] = [df[df[x] == 1]['points'].mean()] if len(df[df[x] == 1]) > 1 else None)

and two or three other things, but nothing worked. I hope somebody can help me here.

  • Explain the logic behind your function. Commented Feb 13, 2019 at 15:27
  • What is going on with column E? Commented Feb 13, 2019 at 15:27
  • The function checks, for every column in df, whether a 1 appears more than once in that column. If so, it adds a row to df_means with the column name as index and, as its value, the mean of df['points'] over the rows where that column is 1. Commented Feb 13, 2019 at 15:36

3 Answers

pd.DataFrame.dot

#                      filters s to be just those
#                      things greater than 1
#                      v
s = df.eq(1).sum().loc[lambda x: x > 1]
df.loc[:, s.index].T.dot(df.points).div(s)

A    52.0
C    41.0
D    80.0
dtype: float64
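
To spell out why the dot product yields a mean here: for a 0/1 column, the dot product with points adds up the points of exactly those rows where the column is 1, and dividing by the number of ones turns that sum into an average. A minimal self-contained sketch, rebuilding the frame from the question (the names ones_count, point_sums and means are illustrative, not part of the answer above):

```python
import pandas as pd

# Frame from the question, rebuilt so the sketch runs on its own.
df = pd.DataFrame({
    "A": [0, 0, 1, 1],
    "B": [0, 0, 1, 0],
    "C": [0, 1, 0, 1],
    "D": [1, 1, 1, 0],
    "E": [43, 55, 21, 13],
    "points": [94, 62, 84, 20],
})

# Count the 1s per column and keep only columns with more than one.
ones_count = df.eq(1).sum()
ones_count = ones_count[ones_count > 1]

# For a 0/1 column, (column . points) is the sum of points where the column is 1.
point_sums = df[ones_count.index].T.dot(df["points"])

# Sum divided by count is the mean.
means = point_sums.div(ones_count)
print(means)
```

This only works as a mean because the surviving columns contain nothing but 0 and 1; a column with other values would contribute weighted sums instead.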

One liner approach

This removes the chaff but probably does more calculations than necessary.

df.T.dot(df.points).div(df.sum())[df.eq(1).sum().gt(1)]

A    52.0
C    41.0
D    80.0
dtype: float64

3 Comments

Howdy pi! Always nice to see your analytical mathematical approaches. +1
o/ @ScottBoston (-:
Yes, this is cleaner than mine with all the garbage multiplications you then drop :D

In general, you should try to see if you can avoid using .apply(axis=1).

In this case, you can get by with DataFrame.multiply(), replacing 0 with np.nan so it doesn't count toward the average.

import numpy as np

s = df.replace(0, np.nan).multiply(df.points, axis=0).mean()
#A           52.0
#B           84.0
#C           41.0
#D           80.0
#E         2369.0
#points    5034.0
#dtype: float64

Now we'll add your condition to only consider columns with multiple instances of 1, and subset to those with .reindex:

m = df.eq(1).sum().gt(1)
s = s.reindex(m[m].index)

Output s:

A      52.0
C      41.0
D      80.0
dtype: float64
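
The replace-based step treats every non-zero value as a match, which is why E and points show large numbers before the filter is applied. A variant that masks on "equal to 1" explicitly might look like the following. This is only a sketch under the assumption that the frame is the one from the question; flags and masked are illustrative names.

```python
import pandas as pd

# Frame from the question, rebuilt so the sketch runs on its own.
df = pd.DataFrame({
    "A": [0, 0, 1, 1],
    "B": [0, 0, 1, 0],
    "C": [0, 1, 0, 1],
    "D": [1, 1, 1, 0],
    "E": [43, 55, 21, 13],
    "points": [94, 62, 84, 20],
})

# True where a cell is exactly 1; the points column itself is left out.
flags = df.drop(columns="points").eq(1)

# Put the row's points value wherever the flag is True, NaN elsewhere.
masked = flags.astype(int).mul(df["points"], axis=0).where(flags)

# mean() skips NaN; keep only columns with more than one 1.
means = masked.mean()[flags.sum() > 1]
print(means)
```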

Here is another way to do it, not purely pandas as others have shown.

import numpy as np
import pandas as pd

cols = ['A', 'B', 'C', 'D']

def consolidate(series):
    cond = series > 0
    points = df.loc[cond, 'points']
    if len(points) > 1:
        return series.name, points.mean()
    else:
        return series.name, np.nan

df1 = pd.DataFrame([consolidate(df[col]) for col in cols], columns=['name', 'mean_points'])


print(df1)


  name  mean_points
0    A         52.0
1    B          NaN
2    C         41.0
3    D         80.0

If the NaN rows are not needed, drop them:

df1.dropna()

  name  mean_points
0    A         52.0
2    C         41.0
3    D         80.0

And using apply (the chain is wrapped in parentheses so it can span several lines):

(df[cols].apply(consolidate, result_type='expand')
         .T.dropna()
         .reset_index()
         .drop('index', axis=1))

0  A  52
1  C  41
2  D  80
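
Since the question asks specifically about apply, a more direct column-wise version is also possible: apply calls the function once per column, and the function returns a single number. The sketch below is one way to write it, not the only one; the helper mean_points_where_one and its min_count parameter are hypothetical names introduced for illustration.

```python
import pandas as pd

# Frame from the question, rebuilt so the sketch runs on its own.
df = pd.DataFrame({
    "A": [0, 0, 1, 1],
    "B": [0, 0, 1, 0],
    "C": [0, 1, 0, 1],
    "D": [1, 1, 1, 0],
    "E": [43, 55, 21, 13],
    "points": [94, 62, 84, 20],
})

def mean_points_where_one(col, points, min_count=2):
    """Mean of points over rows where col equals 1, or NaN if too few such rows."""
    selected = points[col.eq(1)]
    return selected.mean() if len(selected) >= min_count else float("nan")

# apply (default axis=0) passes each column to the helper as a Series;
# extra keyword arguments are forwarded to the helper.
flag_cols = df.drop(columns="points")
means = flag_cols.apply(mean_points_where_one, points=df["points"]).dropna()

# Same shape as the output of f1 in the question.
df_means = means.to_frame("Mean_Points")
print(df_means)
```

Returning a scalar per column lets apply build the Series directly, so no transpose or index reset is needed afterwards.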
