I have an example DataFrame with columns 'one' and 'two' containing random integers. While studying some code that uses a lambda function, I was puzzled that it seems to work without ever passing an argument to the lambda.
First I create a new column 'newcol' with the pandas assign() method, calling the lambda function func with df as its argument, i.e. func(df). The function returns the natural logarithm of df's 'one' column:
```python
df = df.assign(newcol=func(df))
```
So far so good.
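To make the eager call visible, here is a minimal sketch that assumes a small fixed DataFrame instead of the random one, so the values are reproducible. It shows that func(df) runs before assign does and hands assign a finished Series.

```python
import numpy as np
import pandas as pd

# Assumption: a small fixed frame instead of the random one, for reproducibility.
df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})
func = lambda x: np.log(x.one)

result = func(df)              # the lambda is called here, before assign runs
print(type(result))            # a pandas Series, not a function
df = df.assign(newcol=result)  # assign only receives the already computed Series
```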
However, what puzzles me is that the code also works without passing df:
```python
df = df.assign(newcol2=func)
```
Even though I never pass df to the lambda function, it performs the operation correctly. How does the interpreter know that df should be passed to it?
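One way to see the difference between the two lines is to check what each expression evaluates to before assign ever sees it. This sketch again assumes a small fixed frame; it only uses the built-in callable().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})
func = lambda x: np.log(x.one)

# Without parentheses, `func` is just a reference to the function object; nothing has run.
print(callable(func))      # True
# With parentheses, the function has already run and produced a Series, which is not callable.
print(callable(func(df)))  # False
```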
Example code and its output:
```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(1, 10, size=16).reshape(8, 2), columns=["one", "two"])
func = lambda x: np.log(x.one)

df = df.assign(newcol=func(df))
print(df)

# This one works too, but why?
df = df.assign(newcol2=func)
print(df)
```
Output:

```
   one  two    newcol   newcol2
0    1    8  0.000000  0.000000
1    6    7  1.791759  1.791759
2    2    6  0.693147  0.693147
3    2    8  0.693147  0.693147
4    4    2  1.386294  1.386294
5    9    3  2.197225  2.197225
6    2    2  0.693147  0.693147
7    4    7  1.386294  1.386294
```
(Note: I could have written the lambda inline in the assign call, but I defined it separately here for clarity.)
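For reference, the inline form mentioned in the note would look roughly like this (a sketch, again assuming a small fixed frame). The lambda is defined and handed to assign uncalled, exactly like `func` in `newcol2=func`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})

# Inline lambda: assign receives the function itself and calls it with the DataFrame.
df = df.assign(newcol2=lambda d: np.log(d.one))
print(df)
```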
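`df.assign(newcol=func(df))` means you have already called `func` with the argument `df`, so assign receives the resulting Series. In contrast, `df.assign(newcol2=func)` passes `func` itself without calling it. The interpreter does not pass df for you; assign does. Its documentation states that if a value is callable, it is computed on the DataFrame and assigned to the new column, so assign calls `func` with the DataFrame and stores whatever it returns.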
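The documented behaviour can be approximated with a short sketch. `assign_like` below is a hypothetical helper, not pandas' actual source: it simply copies the frame and, for each keyword, calls the value with the frame if it is callable, otherwise uses it as-is. The last lines use the real `DataFrame.assign` to show a practical consequence: because callables receive the frame at evaluation time, a later keyword can refer to a column created by an earlier one.

```python
import numpy as np
import pandas as pd


def assign_like(df, **kwargs):
    """Hypothetical helper mimicking the documented callable handling of DataFrame.assign.

    A simplified sketch for illustration only, not pandas' implementation.
    """
    data = df.copy()  # assign returns a new object and leaves the original untouched
    for name, value in kwargs.items():  # keyword arguments keep their order (Python 3.6+)
        # Callable values are called with the frame built so far; others are used as-is.
        data[name] = value(data) if callable(value) else value
    return data


df = pd.DataFrame({"one": [1, 6, 2], "two": [8, 7, 6]})
func = lambda x: np.log(x.one)

a = assign_like(df, newcol=func(df), newcol2=func)
b = df.assign(newcol=func(df), newcol2=func)
print(a.equals(b))

# Later callables see columns created by earlier ones in the same assign call.
c = df.assign(double=lambda d: d.one * 2, plus_one=lambda d: d.double + 1)
print(c)
```

This is also why the callable form is handy in method chains, where the intermediate DataFrame has no variable name you could pass explicitly.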