I would like to define a function that applies an operation to specific columns of a dataframe whenever it is called. I don't want to hard-code the column names when defining the function. Below is my sample code. The real lambda function may be more complex, but I am trying a simple one first.

def add(X, **args):
  for arg in args:
    X[arg].apply(lambda x: x + 10)
  return X

But if I call this function on my dataframe as shown below, I get an error even though these columns exist in my dataframe.

y = add(df_final['ABC', 'XYZ'])

KeyError: ('ABC', 'XYZ')

I also tried calling it like this:

y = add(df_final, ['ABC', 'XYZ'])

TypeError: add() takes 1 positional argument but 2 were given

It seems that I am missing something basic here. How can I modify the above code to make it work?

It would be helpful if you shared a sample input dataframe with the expected output of your function. Commented Aug 10, 2020 at 11:36

2 Answers

You can combine named parameters with the **kwargs pattern for optional ones. For demonstration purposes, if no source parameter is given, the dest column is used both as the column to read from and the column to write to.

import pandas as pd

df = pd.DataFrame({"ABC": list(range(10)), "XYZ": list(range(10))})

def add(X, dest="", **kwargs):
    # Read from "source" if it was passed; otherwise read from dest itself.
    c = kwargs.get("source", dest)
    # Write the transformed values to dest (the column is created if missing).
    X[dest] = X[c].apply(lambda x: x + 10)
    return X

df = add(df, dest="ABC")                # ABC = ABC + 10
df = add(df, dest="XYZ", source="ABC")  # XYZ = ABC + 10
df = add(df, dest="new", source="XYZ")  # new = XYZ + 10
df = add(df, dest="new", source="new")  # new = new + 10
print(df.to_string(index=False))

Output:

 ABC  XYZ  new
  10   20   40
  11   21   41
  12   22   42
  13   23   43
  14   24   44
  15   25   45
  16   26   46
  17   27   47
  18   28   48
  19   29   49

Declaring **args makes add collect extra keyword arguments (name=value pairs) into a dict, so it cannot accept any extra positional arguments. Use *args instead if you want to pass an arbitrary number of positional values after your mandatory X argument.
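To make the difference concrete, here is a minimal sketch in plain Python. The function name `show_args` is hypothetical and exists only for illustration; it simply returns whatever it receives in `*args` and `**kwargs`.

```python
def show_args(X, *args, **kwargs):
    """Return the extra positional and keyword arguments as received."""
    # args is a tuple of the extra positional arguments;
    # kwargs is a dict of the extra keyword (name=value) arguments.
    return args, kwargs


# Extra positional values are collected into args.
print(show_args("df", "ABC", "XYZ"))

# A list passed without unpacking counts as ONE positional argument.
print(show_args("df", ["ABC", "XYZ"]))

# Unpacking the list with * spreads it into separate positional arguments.
print(show_args("df", *["ABC", "XYZ"]))

# name=value pairs are collected into kwargs.
print(show_args("df", source="ABC"))
```

With only `**args` in the signature, as in the question, there is no place for a second positional argument to go, which is why the second call raises the TypeError.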

In your function you also need to assign the result back to the dataframe column, because apply returns a new Series and does not modify X in place. So, given

def add(X, *args):
    for arg in args:
        X[arg] = X[arg].apply(lambda x: x + 10)
    return X

Calling it on a sample dataframe gives the following:

>>> df
    a   b  ABC  XYZ
0   1   1    6    1
1  34  34    5    2
2  34  34    4    4
3  34  34    3    5
4   d  23    2    6
5   2   2    1    7

df = add(df, *['ABC','XYZ'])

>>> df
    a   b  ABC  XYZ
0   1   1   16   11
1  34  34   15   12
2  34  34   14   14
3  34  34   13   15
4   d  23   12   16
5   2   2   11   17
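If you would rather pass the column names as a single list, as in the second call attempted in the question, one possible variant is sketched below. The name `apply_to_columns`, the `func` parameter, and the sample values are assumptions made for illustration and are not part of the original code. This version also works on a copy, so the caller's dataframe is left untouched.

```python
import pandas as pd


def apply_to_columns(X, columns, func):
    """Return a copy of X with func applied element-wise to each listed column."""
    out = X.copy()
    for col in columns:
        # Assign the result back; apply() returns a new Series.
        out[col] = out[col].apply(func)
    return out


# Hypothetical sample data, loosely based on the frame shown above.
df = pd.DataFrame({"ABC": [6, 5, 4], "XYZ": [1, 2, 4], "b": [1, 34, 34]})

result = apply_to_columns(df, ["ABC", "XYZ"], lambda x: x + 10)
print(result)
```

As a side note on the KeyError in the question: `df_final['ABC', 'XYZ']` looks for a single column whose label is the tuple `('ABC', 'XYZ')`. Selecting several columns requires a list inside the brackets, as in `df_final[['ABC', 'XYZ']]`.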
