pandas, apply function and lambda

Question

In the following pandas code, why is df not needed in the arguments?

df.groupby('Category').apply(lambda df,a,b: sum(df[a] * df[b]), 'Weight (oz.)', 'Quantity')

The function passed to .apply gets passed either a column or a row, as a Series, depending on whether you used axis=0 or axis=1, respectively. – juanpa.arrivillaga, Commented Mar 8, 2017 at 22:24
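Strictly speaking, the comment above describes DataFrame.apply. The question calls GroupBy.apply, whose function receives one sub-DataFrame per group instead. A minimal sketch on a small made-up frame (column names borrowed from the question) records the type of object each function receives:

```python
import pandas as pd

# Small made-up frame; column names borrowed from the question.
df = pd.DataFrame({
    "Category": ["x", "x", "y"],
    "Weight (oz.)": [2.0, 3.0, 4.0],
    "Quantity": [1, 2, 3],
})

# DataFrame.apply, axis=0: func is called once per column, with a Series.
by_col = df.apply(lambda s: type(s).__name__, axis=0)

# DataFrame.apply, axis=1: func is called once per row, with a Series.
by_row = df.apply(lambda s: type(s).__name__, axis=1)

# GroupBy.apply (what the question uses): func is called once per group,
# with that group's sub-DataFrame. Selecting the value columns keeps the
# grouping column out of what func receives.
by_group = df.groupby("Category")[["Weight (oz.)", "Quantity"]].apply(
    lambda g: type(g).__name__
)

print(by_col.to_dict())    # keyed by column name
print(by_row.to_dict())    # keyed by row label
print(by_group.to_dict())  # keyed by group label
```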

pansen · Accepted Answer · 2017-03-08 22:31:54Z

1

The first parameter (for groupby().apply, each group's sub-DataFrame) is passed implicitly to the function in the apply call. Therefore, it does not appear again among the extra args. You could actually rewrite the anonymous function in the apply as

 df.groupby('Category').apply(lambda x: sum(x["Weight (oz.)"] * x["Quantity"]))

without using args here at all. This makes it clear that x is the first parameter, which apply passes in without you passing it explicitly.
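Both forms compute the same thing: GroupBy.apply passes each group as the first positional argument and forwards any extra positional arguments after it. A small check on made-up data (the two value columns are selected first so only they reach the lambda; recent pandas versions warn when apply also receives the grouping column):

```python
import pandas as pd

# Small made-up frame using the question's column names.
df = pd.DataFrame({
    "Category": ["a", "a", "b", "b"],
    "Weight (oz.)": [1.5, 2.0, 3.0, 4.0],
    "Quantity": [2, 3, 1, 5],
})
value_cols = ["Weight (oz.)", "Quantity"]

# Question's form: the group arrives as g; the extra positional args
# 'Weight (oz.)' and 'Quantity' are forwarded as a and b.
with_args = df.groupby("Category")[value_cols].apply(
    lambda g, a, b: sum(g[a] * g[b]), "Weight (oz.)", "Quantity"
)

# Answer's form: same computation, column names written inside the lambda.
without_args = df.groupby("Category")[value_cols].apply(
    lambda x: sum(x["Weight (oz.)"] * x["Quantity"])
)

print(with_args.to_dict())
print(with_args.equals(without_args))
```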

edited Mar 8, 2017 at 22:31

answered Mar 8, 2017 at 22:30

pansen

parsethis · Accepted Answer · 2017-03-08 22:32:48Z

0

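More generally, apply is a method of the object it is called on. In the question, that object is the GroupBy returned by df.groupby('Category'), which wraps the DataFrame df.

This means apply is passed a self parameter implicitly. Imagine the call as apply(self, func, *args).

Here self carries the data from df, and apply hands that data (one group at a time) to your function as its first argument; so it should now be clear that passing df again would be redundant (if it were allowed).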
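The implicit-self mechanism can be sketched with a toy class. MiniGroups and its apply method are hypothetical, written only to mirror the calling convention; this is not how pandas implements GroupBy.apply.

```python
class MiniGroups:
    """Hypothetical toy container of pre-split groups (NOT pandas)."""

    def __init__(self, groups):
        self.groups = groups  # mapping: group label -> list of row dicts

    def apply(self, func, *args):
        # `self` arrives implicitly because the call is written obj.apply(...).
        # Each group is handed to func as its first argument, then *args.
        return {label: func(rows, *args) for label, rows in self.groups.items()}


def weighted_total(rows, a, b):
    # Sum of row[a] * row[b] over one group's rows.
    return sum(r[a] * r[b] for r in rows)


grouped = MiniGroups({
    "x": [{"w": 2, "q": 1}, {"w": 3, "q": 2}],
    "y": [{"w": 4, "q": 3}],
})

# The caller passes only the function and the extra args, never the data.
totals = grouped.apply(weighted_total, "w", "q")

# Calling through the class spells out the implicit self; same result.
explicit = MiniGroups.apply(grouped, weighted_total, "w", "q")

print(totals)
print(explicit == totals)
```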

answered Mar 8, 2017 at 22:32

parsethis

miradulo · Accepted Answer · 2017-03-08 22:57:43Z

0

It is somewhat related and worth mentioning that you don't need apply at all here. You can speed up the operation considerably by first multiplying your two columns of interest and then grouping only that product by your 'Category' column, e.g.

(df['Weight (oz.)'] * df['Quantity']).groupby(df.Category).sum()

Example

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(category=[1, 1, 1, 2, 2, 2, 3, 3, 3] * (10**6),
                       a=np.random.randint(1, 10, 9 * (10**6)),
                       b=np.random.randint(1, 10, 9 * (10**6))))

%timeit (df.a*df.b).groupby(df.category).sum()
1 loop, best of 3: 560 ms per loop

%timeit df.groupby('category').apply(lambda x: sum(x.a*x.b))
1 loop, best of 3: 3.34 s per loop
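Before switching, it is easy to confirm that the vectorized form and the apply form return the same totals. A sketch on a small made-up frame with the question's column names; it checks agreement only and does not reproduce the timings above.

```python
import numpy as np
import pandas as pd

# Small random frame (made-up data) with the question's column names.
rng = np.random.default_rng(0)
n = 12
df = pd.DataFrame({
    "Category": rng.choice(["a", "b", "c"], size=n),
    "Weight (oz.)": rng.integers(1, 10, size=n),
    "Quantity": rng.integers(1, 10, size=n),
})

# Vectorized: multiply once over the whole frame, then group the product.
vectorized = (df["Weight (oz.)"] * df["Quantity"]).groupby(df["Category"]).sum()

# apply-based: call a Python function once per group.
via_apply = df.groupby("Category")[["Weight (oz.)", "Quantity"]].apply(
    lambda g: (g["Weight (oz.)"] * g["Quantity"]).sum()
)

print(vectorized.tolist() == via_apply.tolist())
```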

answered Mar 8, 2017 at 22:57

miradulo