
I have an example dataframe with columns 'one' and 'two' consisting of some random ints. I was trying to understand some code with a lambda function in more depth and was puzzled that the code seems to magically work without providing an argument to be passed to the lambda function.

First I create a new column 'newcol' with the pandas assign() method, passing df explicitly to a lambda function via func(df). The function returns the natural log of df's 'one' column:

df=df.assign(newcol=func(df))

So far so good.

However, what puzzles me is that the code also works without passing df:

df=df.assign(newcol2=func)

Even if I don't pass (df) into the lambda function, it correctly performs the operation. How does the interpreter know that df is being passed into the lambda function?

Example code below and output:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(1,10,size=16).reshape(8,2),columns=["one","two"])
func=lambda x: np.log(x.one)
df=df.assign(newcol=func(df))
print(df)

#This one works too, but why?
df=df.assign(newcol2=func)
print(df)
Output:
   one  two    newcol   newcol2
0    1    8  0.000000  0.000000
1    6    7  1.791759  1.791759
2    2    6  0.693147  0.693147
3    2    8  0.693147  0.693147
4    4    2  1.386294  1.386294
5    9    3  2.197225  2.197225
6    2    2  0.693147  0.693147
7    4    7  1.386294  1.386294

(Note: I could have written the lambda inline in the assign() call, but I define it separately here for the sake of clarity.)

  • I don't know much about df, but this code: df.assign(newcol=func(df)) means you have already called func with param df. However this: df.assign(newcol2=func) means you are passing func in without calling it, so maybe df can call it when it wants to. Commented Oct 15, 2019 at 9:57
  • As @quamrana says. In the documentation it says "If the values are callable, they are computed on the DataFrame and assigned to the new columns.... If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned." so in the second example, it's applying the function. Commented Oct 15, 2019 at 10:01
  • cheers, great answers. Learned something new today :) Commented Oct 15, 2019 at 10:10
  • As a side note, Python code is interpreted, not compiled (typically). Commented Oct 15, 2019 at 10:14
  • thanks, edited into interpreter. Commented Oct 15, 2019 at 10:18

2 Answers

1

If you pass a callable to pd.DataFrame.assign(), assign() calls it with the DataFrame itself as its only argument.

For example, if you change your code to the following:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(1,10,size=16).reshape(8,2),columns=["one","two"])
func=lambda c, x: np.log(x.one + c)
df=df.assign(newcol=func(1, df))
print(df)

#This one will no longer work!
df=df.assign(newcol2=func)
print(df)

The last call to assign() raises a TypeError: assign() calls func with only the DataFrame, so the second parameter x is never supplied.
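One way to keep a two-argument function usable in the callable form is to bind c in advance, so that assign() only has to supply the DataFrame. A minimal sketch, assuming c should be 1 as in the explicit call; the names bound_partial and bound_lambda and the fixed sample values are illustrative only:

```python
from functools import partial

import numpy as np
import pandas as pd

# Fixed values instead of random ints so the result is reproducible.
df = pd.DataFrame({"one": [1, 6, 2, 4], "two": [8, 7, 6, 2]})

func = lambda c, x: np.log(x.one + c)

# Bind c=1 up front; the resulting callables take only the DataFrame.
bound_partial = partial(func, 1)
bound_lambda = lambda x: func(1, x)

result = df.assign(
    explicit=func(1, df),       # non-callable: a Series computed right here
    via_partial=bound_partial,  # callable: assign() calls it with the DataFrame
    via_lambda=bound_lambda,
)

# Passing func itself fails, because assign() supplies a single argument.
try:
    df.assign(broken=func)
    raised = False
except TypeError:
    raised = True

print(result)
print("TypeError raised:", raised)
```

All three new columns should hold the same values, since each one ends up evaluating func(1, df).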

This is explained in the official documentation. The line df.assign(newcol=func(1, df)) uses the non-callable pathway, because func(1, df) is evaluated first and yields a Series, while the line df.assign(newcol2=func) uses the callable pathway.
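The two pathways can be pictured with a simplified stand-in for assign(). This is only a sketch of the documented behavior, not the pandas source; assign_sketch is a hypothetical helper written for illustration, and it uses the question's one-argument func:

```python
import numpy as np
import pandas as pd


def assign_sketch(frame, **kwargs):
    """Hypothetical, simplified model of DataFrame.assign()."""
    out = frame.copy()
    for name, value in kwargs.items():
        if callable(value):
            # Callable pathway: call the value with the DataFrame built so far.
            out[name] = value(out)
        else:
            # Non-callable pathway: assign the Series, scalar, or array as is.
            out[name] = value
    return out


df = pd.DataFrame({"one": [1, 6, 2, 4], "two": [8, 7, 6, 2]})
func = lambda x: np.log(x.one)

sketch = assign_sketch(df, newcol=func(df), newcol2=func)
real = df.assign(newcol=func(df), newcol2=func)

print(sketch)
print(real)
```

In this model, writing func(df) means the caller does the work before assign() ever runs, while writing func hands over the function and leaves the call to assign().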


0

It's not compilation; it's simply how the assign() source code is written, as mentioned in the pandas assign() documentation.

Where the value is a callable, evaluated on df:
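A minimal sketch of that case, in the spirit of the pandas documentation example; the column names temp_c and temp_f and the values are illustrative assumptions, not taken from the question:

```python
import pandas as pd

df = pd.DataFrame({"temp_c": [17.0, 25.0]}, index=["Portland", "Berkeley"])

# The lambda is not called here; assign() calls it with df as its argument.
converted = df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
print(converted)
```

The original df is left unchanged; assign() returns a new DataFrame with the extra column.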
