Apply function with two arguments to columns

Question

Can you make a pandas function with values in two different columns as arguments?

I have a function that returns 1 if the values in two columns fall in the same range; otherwise it returns 0:

def segmentMatch(RealTime, ResponseTime):
    if RealTime <= 566 and ResponseTime <= 566:
        matchVar = 1
    elif 566 < RealTime <= 1132 and 566 < ResponseTime <= 1132:
        matchVar = 1
    elif 1132 < RealTime <= 1698 and 1132 < ResponseTime <= 1698:
        matchVar = 1
    else:
        matchVar = 0
    return matchVar

I want the first argument, RealTime, to take its value from a column in my data frame, row by row. For example, RealTime would come from df['TimeCol'] and ResponseTime from df['ResponseCol'], and I'd like the result to be a new column in the dataframe. I came across several threads that answer a similar question, but in those the arguments were plain variables, not values in rows of the dataframe.

I tried the following but it didn't work:

df['NewCol'] = df.apply(segmentMatch, args=(df['TimeCol'], df['ResponseCol']), axis=1)

Nelewout · Accepted Answer · 2022-07-19 08:35:46Z

121

Why not just do this?

df['NewCol'] = df.apply(lambda x: segmentMatch(x['TimeCol'], x['ResponseCol']), 
                        axis=1)

Rather than trying to pass the columns as arguments as in your example, we now simply pass the appropriate entries of each row as arguments, and store the result in 'NewCol'.
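To make the row-wise mechanics concrete, here is a minimal sketch on a made-up three-row frame (the sample values are assumptions, not the asker's data). It also shows why the attempt in the question fails: args= appends both whole columns after the row, so the two-argument function ends up being called with three arguments.

```python
import pandas as pd

def segmentMatch(RealTime, ResponseTime):
    if RealTime <= 566 and ResponseTime <= 566:
        matchVar = 1
    elif 566 < RealTime <= 1132 and 566 < ResponseTime <= 1132:
        matchVar = 1
    elif 1132 < RealTime <= 1698 and 1132 < ResponseTime <= 1698:
        matchVar = 1
    else:
        matchVar = 0
    return matchVar

# Made-up sample data; the values are assumptions for illustration.
df = pd.DataFrame({'TimeCol': [100, 600, 1200],
                   'ResponseCol': [500, 1500, 1300]})

# The attempt from the question: apply calls segmentMatch(row, col1, col2),
# which is one argument too many for a two-argument function.
try:
    df.apply(segmentMatch, args=(df['TimeCol'], df['ResponseCol']), axis=1)
    failed = False
except TypeError:
    failed = True

# Row-wise version: x is one row, so x['TimeCol'] is a single value.
df['NewCol'] = df.apply(
    lambda x: segmentMatch(x['TimeCol'], x['ResponseCol']), axis=1)
print(df)
```

With these sample values, only the middle row has its two times in different ranges, so it is the only row that gets a 0.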

edited Jul 19, 2022 at 8:35

answered Dec 15, 2015 at 1:25


6 Comments

Zach Over a year ago

Thank you! I can even use this with arguments! Tried doing this without a lambda function and couldn't figure out a way around that :)

mmTmmR Over a year ago

@N.Wouda Could you please explain what is going on in your answer above? What is the value of the lambda expression argument x? It looks like it would be my dataframe name df, however I never had to define it as such so I'm a little confused. Thanks

Nelewout Over a year ago

@mmTmmR yes, df would be your DataFrame. The value of x is a pandas row, as per the documentation. The use of df is more of a convention, as any other name would also do. The same holds for x.

k0rnik Over a year ago

4 hours of searching the internet, and I almost created a new post. This is an excellent solution; it helps avoid errors when passing multiple arguments and when using boolean operators with if statements.

anon Over a year ago

... axis=1 ... i slammed my head on my desk for 45 minutes until i saw that! thanks!


rahul · Accepted Answer · 2018-12-04 04:40:01Z

24

You don't really need a lambda function if you are defining the function outside:

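def segmentMatch(vec):
    # vec is one row of df[['TimeCol', 'ResponseCol']]; .iloc selects by
    # position (plain vec[0] is deprecated for label-indexed Series)
    RealTime = vec.iloc[0]
    ResponseTime = vec.iloc[1]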
    if RealTime <= 566 and ResponseTime <= 566:
        matchVar = 1
    elif 566 < RealTime <= 1132 and 566 < ResponseTime <= 1132:
        matchVar = 1
    elif 1132 < RealTime <= 1698 and 1132 < ResponseTime <= 1698:
        matchVar = 1
    else:
        matchVar = 0
    return matchVar

df['NewCol'] = df[['TimeCol', 'ResponseCol']].apply(segmentMatch, axis=1)

If "segmentMatch" were to return a vector of 2 values instead, you could do the following:

def segmentMatch(vec):
    ......
    return pd.Series((matchVar1, matchVar2)) 

df[['NewCol', 'NewCol2']] = df[['TimeCol','ResponseCol']].apply(segmentMatch, axis=1)
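A complete version of the two-output case might look like the sketch below. The body elided above is not reproduced here; instead, a hypothetical function sum_and_gap (not from the answer) returns two values per row, and the sample data and output column names are made up. When the applied function returns a Series, apply with axis=1 expands the results into a DataFrame with one column per element, which is what allows the two-column assignment.

```python
import pandas as pd

def sum_and_gap(vec):
    # Hypothetical two-output function, for illustration only.
    real_time, response_time = vec.iloc[0], vec.iloc[1]
    total = real_time + response_time
    gap = abs(real_time - response_time)
    return pd.Series((total, gap))

# Made-up sample data; the values are assumptions.
df = pd.DataFrame({'TimeCol': [100, 600], 'ResponseCol': [500, 1500]})

# Each returned Series becomes one row of a two-column result,
# which is assigned to the two new columns in order.
df[['Sum', 'Gap']] = df[['TimeCol', 'ResponseCol']].apply(sum_and_gap, axis=1)
print(df)
```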

edited Dec 4, 2018 at 4:40

answered Oct 11, 2018 at 19:54


Artem Sokolov · Accepted Answer · 2020-10-23 14:49:30Z

5

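A chain-friendly way to perform this operation is via assign(). The callable receives the whole DataFrame, so the row-wise apply goes inside it:

df.assign(NewCol=lambda d: d.apply(lambda x: segmentMatch(x['TimeCol'], x['ResponseCol']), axis=1))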
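A minimal sketch of assign() inside a method chain, using made-up sample values (an assumption for illustration). Because the callable given to assign() is handed the full DataFrame rather than a single row, a scalar function built on if/and has to be applied row by row inside it. The follow-up query() call is only there to show the chaining.

```python
import pandas as pd

def segmentMatch(RealTime, ResponseTime):
    if RealTime <= 566 and ResponseTime <= 566:
        matchVar = 1
    elif 566 < RealTime <= 1132 and 566 < ResponseTime <= 1132:
        matchVar = 1
    elif 1132 < RealTime <= 1698 and 1132 < ResponseTime <= 1698:
        matchVar = 1
    else:
        matchVar = 0
    return matchVar

# Made-up sample data; the values are assumptions.
df = pd.DataFrame({'TimeCol': [100, 600, 1200],
                   'ResponseCol': [500, 1500, 1300]})

result = (
    df
    # d is the whole DataFrame; the inner apply walks it row by row.
    .assign(NewCol=lambda d: d.apply(
        lambda x: segmentMatch(x['TimeCol'], x['ResponseCol']), axis=1))
    # Keep only the rows where both times fall in the same range.
    .query('NewCol == 1')
)
print(result)
```

assign() returns a new DataFrame and leaves df itself unchanged, so the result has to be captured or chained further.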

answered Oct 23, 2020 at 14:49


rdmtinez · Accepted Answer · 2022-09-15 20:52:05Z

At my current workplace the use of lambda functions is frowned upon, and perhaps you've run into the same restriction at yours. So I came up with this decorator, which should work for any number of input or output columns, so long as your own function's logic is sound.

import functools  # not required, but helps in production
import pandas as pd  # needed for pd.Series below

def unpack_df_columns(func):
    """
    A general-use decorator to unpack a df[subset] of columns
    into a function which expects the values at those columns
    as arguments
    """

    @functools.wraps(func)
    def _unpack_df_columns(*args, **kwargs):

        # args[0] is a pandas Series holding one row of the
        # df[subset] to which the apply function is applied
        series = args[0]

        # series.values holds the arguments expected by func
        # and is of length len(df[subset].columns)
        return func(*series.values)

    return _unpack_df_columns

@unpack_df_columns
def two_arg_func(a, b):
    return pd.Series((a+b, a*b))

@unpack_df_columns
def three_arg_func(x, y, z):
    return x+y+z

df["x_y_z_sum"] = df[['x', 'y', 'z']].apply(three_arg_func, axis=1)

df[["a_b_sum", "a_b_prod"]] = df[['a', 'b']].apply(two_arg_func, axis=1)
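To try the decorator end to end, the sketch below repeats it in condensed form and applies it to a small made-up frame (the column values are assumptions for illustration).

```python
import functools
import pandas as pd

def unpack_df_columns(func):
    @functools.wraps(func)
    def wrapper(row):
        # row is one row of df[subset]; spread its values as positional args
        return func(*row.values)
    return wrapper

@unpack_df_columns
def three_arg_func(x, y, z):
    return x + y + z

@unpack_df_columns
def two_arg_func(a, b):
    return pd.Series((a + b, a * b))

# Made-up sample data; the values are assumptions.
df = pd.DataFrame({'x': [1, 2], 'y': [10, 20], 'z': [100, 200],
                   'a': [2, 3], 'b': [5, 7]})

df['x_y_z_sum'] = df[['x', 'y', 'z']].apply(three_arg_func, axis=1)
df[['a_b_sum', 'a_b_prod']] = df[['a', 'b']].apply(two_arg_func, axis=1)
print(df)
```

The order of the columns in df[subset] determines the order of the positional arguments, so it has to match the wrapped function's signature.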
