I've checked out map, apply, applymap, and combine, but can't seem to find a simple way of doing the following:

I have a dataframe with 10 columns. I need to pass three of them into a function that takes scalars and returns a scalar ...

some_func(int a, int b, int c) returns int d

I want to apply this and create a new column in the dataframe with the result.

df['d'] = some_func(a = df['a'], b = df['b'], c = df['c'])

All the solutions that I've found seem to suggest to rewrite some_func to work with Series instead of scalars, but this is not possible as it is part of another package. How do I elegantly do the above?

  • It depends on what your functions are doing, but typically you would do something like `def func(row): return row['a'] * row['b'] * row['c']` and then `df.apply(lambda row: func(row), axis=1)`. Ideally you want to write your function so that it can operate on an entire Series and is vectorised. Can you show what you are really trying to do? Commented Feb 11, 2015 at 14:48
  • If for instance your function took Series as params then you could rewrite it as `def some_func(a, b, c): return a*b*c` and call it with `df['d'] = some_func(df['a'], df['b'], df['c'])` Commented Feb 11, 2015 at 14:50
  • "some_func" is a complex function that makes a SQL call to fill the data, so I have simplified it here. I'm using df.apply as suggested. Commented Feb 11, 2015 at 16:50
  • Hello @ashishsingal, if you agree that my answer is correct, please could you select it as the answer for this question? Cheers, Tomas Commented Nov 13, 2017 at 11:01

7 Answers

Use pd.DataFrame.apply(), as below:

df['d'] = df.apply(lambda x: some_func(a = x['a'], b = x['b'], c = x['c']), axis=1)

NOTE: Because the function needs values from several columns of each row, the axis argument must be set to 1; the default is 0, which would pass each column instead (see the documentation, copied below).

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

  • 0 or ‘index’: apply function to each column
  • 1 or ‘columns’: apply function to each row
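To make the axis behaviour concrete, here is a minimal self-contained sketch (the data and some_func are illustrative): with axis=1, each row is passed to the lambda as a Series, so x['a'], x['b'], x['c'] resolve to that row's values.

```python
import pandas as pd

def some_func(a, b, c):
    return a + b + c

df = pd.DataFrame({"a": [1, 2], "b": [10, 20], "c": [100, 200]})

# axis=1: each row arrives as a Series, so label-based access works
df["d"] = df.apply(lambda x: some_func(a=x["a"], b=x["b"], c=x["c"]), axis=1)
print(df["d"].tolist())  # [111, 222]
```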

For what it's worth on such an old question, I find that zipping the function arguments into tuples and then applying the function in a list comprehension is much faster than using df.apply. For example:

import numpy as np
import pandas as pd

# Setup:
df = pd.DataFrame(np.random.rand(10000, 3), columns=list("abc"))
def some_func(a, b, c):
    return a*b*c

# Using apply:
%timeit df['d'] = df.apply(lambda x: some_func(a = x['a'], b = x['b'], c = x['c']), axis=1)

222 ms ± 63.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Using tuples + list comprehension:
%timeit df["d"] = [some_func(*a) for a in tuple(zip(df["a"], df["b"], df["c"]))]

8.07 ms ± 640 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
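A closely related fast pattern, if you prefer not to build the zipped tuples explicitly, is df.itertuples (a sketch under the same setup as above; timings will vary by machine):

```python
import numpy as np
import pandas as pd

def some_func(a, b, c):
    return a * b * c

df = pd.DataFrame(np.random.rand(10000, 3), columns=list("abc"))

# itertuples yields lightweight namedtuples; index=False skips the index field
df["d"] = [some_func(row.a, row.b, row.c) for row in df.itertuples(index=False)]
```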

3 Comments

  • Hi @toby-petty, can this method be used when the function returns 2 values, which can be assigned to new columns of the dataframe? `df[['c','d']] = [some_func(*a) for a in tuple(zip(df["a"], df["b"], df["c"]))]`
  • Hi @ML_Passion, yes it will work exactly as you put it, as long as you change some_func to return 2 values instead of 1. Actually the apply method wouldn't work for the use case of adding multiple columns at once; it would need to be applied multiple times, making it even slower, so that's another win for this method.
  • Thanks a ton @toby-petty, I have a use case for this right now in my work. I want to wrap it in a parallelize function using multiprocessing. Having some challenges but should be able to solve it.
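To make the multi-column case from the comments concrete, here is a minimal sketch with a hypothetical two-value function (some_func2, the data, and the column names are all illustrative, not from the original answer):

```python
import pandas as pd

# Hypothetical function returning two values per row
def some_func2(a, b, c):
    return a + b, b + c

df = pd.DataFrame({"a": [1, 2], "b": [10, 20], "c": [100, 200]})

# The list of 2-tuples maps row-wise onto two new columns
df[["d", "e"]] = [some_func2(*args) for args in zip(df["a"], df["b"], df["c"])]
print(df[["d", "e"]].values.tolist())  # [[11, 110], [22, 220]]
```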
I use map, which is as fast as a list comprehension (and much faster than apply):

df['d'] = list(map(some_func, df['a'], df['b'], df['c']))

Example on my machine:

import numpy as np
import pandas as pd

# Setup:
df = pd.DataFrame(np.random.rand(10000, 3), columns=list("abc"))
def some_func(a, b, c):
    return a*b*c

# Using apply:
%timeit df['d'] = df.apply(lambda x: some_func(a = x['a'], b = x['b'], c = x['c']), axis=1)

130 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['d'] = list(map(some_func, df['a'], df['b'], df['c']))

3.91 ms ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

1 Comment

Surely this is the difference between element and vector processing? Perfectly accurate if 'some_func' supports vector processing. However, the OP said 'some_func' was complex (including SQL calls) and did NOT support vector processing. Am I missing something here?
I'm using the following:

df['d'] = df.apply(lambda x: some_func(a = x['a'], b = x['b'], c = x['c']), axis=1)

Seems to be working well, but if anyone else has a better solution, please let me know.


Very nice tip to use a list comprehension, as Toby Petty recommended:

df["d"] = [some_func(*a) for a in tuple(zip(df["a"], df["b"], df["c"]))]

This can be further optimized by removing the tuple instantiation:

df["d"] = [some_func(*a) for a in zip(df["a"], df["b"], df["c"])]

An even faster way to map multiple columns is to use frompyfunc from NumPy to create a vectorized version of the Python function:

import numpy as np
    
some_func_vec = np.frompyfunc(some_func, 3, 1)
df["d"] = some_func_vec(df["a"], df["b"], df["c"])
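One caveat worth noting (an addition, not from the original answer): np.frompyfunc always produces object-dtype results, so a cast back to a numeric dtype is usually wanted downstream:

```python
import numpy as np
import pandas as pd

def some_func(a, b, c):
    return a * b * c

df = pd.DataFrame(np.random.rand(5, 3), columns=list("abc"))

some_func_vec = np.frompyfunc(some_func, 3, 1)
# frompyfunc yields object dtype; cast back to float for numeric use
df["d"] = some_func_vec(df["a"], df["b"], df["c"]).astype(float)
print(df["d"].dtype)  # float64
```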


If it is a really simple function, such as one based on simple arithmetic, chances are it can be vectorized. For instance, a linear combination can be made directly from the columns:

df["d"] = w1*df["a"] + w2*df["b"] + w3*df["c"]

where w1,w2,w3 are scalar weights.
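A runnable sketch of that vectorized form, with illustrative weights and data (both are assumptions, not from the answer):

```python
import pandas as pd

# Illustrative scalar weights
w1, w2, w3 = 2, 3, 4

df = pd.DataFrame({"a": [1.0, 2.0], "b": [10.0, 20.0], "c": [100.0, 200.0]})

# The whole computation happens column-wise, with no Python-level loop
df["d"] = w1*df["a"] + w2*df["b"] + w3*df["c"]
print(df["d"].tolist())  # [432.0, 864.0]
```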


You can also use DataFrame.agg:

df['d'] = df.agg(lambda row : some_function(row.a, row.b, row.c), axis=1)

I think it is faster than df.apply, though it is worth benchmarking on your own data.

