How to pass multiple arguments from a pandas dataframe to a function and return the result to the dataframe at specific locations in the dataframe

Question

Let's say I have the following pandas dataframe, named df, with this column structure:

index column1 column2 column3
0     2       5       apple
1     4       3       apple
2     6       1       orange 
3     8       6       apple 
4    10       5       orange

I would like to find every row where df['column3'] == 'orange', pass that row's df['column1'] and df['column2'] values to the function below, and then replace the existing value of df['column2'] in that row with the function's output.

def func(x, y):
    return x * 2.0

Thus far I have implemented the following, which works, but I suspect it is not the most pythonic way of doing this, and probably does not have the most efficient execution speed. Any advice would be appreciated.

for i in range(len(df.index)):
    if df.loc[i, 'column3'] == 'orange':
        # pass this row's scalar values, not the whole columns
        df.loc[i, 'column2'] = func(df.loc[i, 'column1'], df.loc[i, 'column2'])

rafaelc · Accepted Answer · 2018-06-18 00:04:36Z

2

There is no need to use apply.

You can simply use loc and a mask.

mask = df['column3'] == "orange"
df.loc[mask, "column2"] = func(df.loc[mask].column1, df.loc[mask].column2)

This is simpler than apply and avoids apply's per-row Python function calls, so it is usually faster.
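
For reference, a self-contained sketch of the mask approach on the question's sample data. The DataFrame construction and the up-front cast of column2 to float are assumptions added so the snippet runs on its own; func is the question's function, unchanged.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "column1": [2, 4, 6, 8, 10],
    "column2": [5, 3, 1, 6, 5],
    "column3": ["apple", "apple", "orange", "apple", "orange"],
})

def func(x, y):
    # The question's function; note that it ignores y.
    return x * 2.0

# Assumption: cast the target column to float first, because func returns floats.
df["column2"] = df["column2"].astype(float)

# Boolean mask selecting the rows to update.
mask = df["column3"] == "orange"

# func receives two Series restricted to the masked rows and returns a Series,
# which loc aligns back onto those same rows by index label.
df.loc[mask, "column2"] = func(df.loc[mask, "column1"], df.loc[mask, "column2"])

print(df)
```

This relies on x * 2.0 being defined for a Series as well as for a scalar; a function that branches on its arguments would need a different approach.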


4 Comments

jpp Over a year ago

Do you have a reference for "faster"? Seems also your func will have to be changed to work with series. Makes it less simple.

rafaelc Over a year ago

@jpp Where did you get this requirement from? Function works perfectly with series as is. There is no need for a generic function here. Should OP need more complex logic, then I'd agree with you. But this is not the case.

jpp Over a year ago

The underlying assumption is that all the operations you would want to do in func can be operated on a series as well as a scalar. For x * 2.0, this is true. But it isn't generally the case. I'm assuming that OP's real function is not x * 2.0 as this is trivially vectorisable (in this case, a custom function is not necessary and should be discouraged).

rafaelc Over a year ago

@jpp I am working with the information given. In my opinion it is a nonsensical downvote to assume OP needs something different :)
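
To illustrate the point debated in the comments above, here is a small sketch of a function that works on scalars but not on Series. The function scalar_only_func is hypothetical and is not the OP's real function; it only shows the kind of logic (a plain if on the arguments) that breaks when whole columns are passed in.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "column1": [2, 4, 6, 8, 10],
    "column2": [5, 3, 1, 6, 5],
    "column3": ["apple", "apple", "orange", "apple", "orange"],
})

def scalar_only_func(x, y):
    # Hypothetical function: the `if` needs a single True/False,
    # which a comparison between two Series cannot provide.
    if x > y:
        return x * 2.0
    return float(y)

mask = df["column3"] == "orange"

# With scalars the function behaves as expected.
scalar_result = scalar_only_func(6, 1)

# With Series, `if x > y` raises ValueError (ambiguous truth value).
try:
    scalar_only_func(df.loc[mask, "column1"], df.loc[mask, "column2"])
    works_on_series = True
except ValueError:
    works_on_series = False

print(scalar_result, works_on_series)
```

For x * 2.0 no such branch exists, which is why the mask-based answer works for the function as posted.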

3kt · Accepted Answer · 2018-06-17 22:01:43Z

0

Nest your condition in an apply:

In [26]: df
Out[26]:
       column1  column2 column3
index
0            2        5   apple
1            4        3   apple
2            6        1  orange
3            8        6   apple
4           10        5  orange

In [27]: df['column2'] = df.apply(lambda x: func(x['column1'], x['column2']) \
if x['column3'] == 'orange' else x['column2'], axis=1)

In [28]: df
Out[28]:
       column1  column2 column3
index
0            2      5.0   apple
1            4      3.0   apple
2            6     12.0  orange
3            8      6.0   apple
4           10     20.0  orange


2 Comments

Jon Over a year ago

Thank you, that worked very well for my problem and avoided having to re-write the function.

jpp Over a year ago

I have to downvote this as ternary statements in pd.DataFrame.apply are really not Pythonic.

jpp · Accepted Answer · 2018-06-17 21:45:23Z

0

Using pd.DataFrame.apply, you can define a function which is applied to each row sequentially. Note that the row is passed to your function as a series object and may be unpacked into component fields via the syntax row['col_name'].

As this method is just a thinly veiled loop, you are advised to use a vectorised solution where possible.

def func(row):
    x = row['column1']
    y = row['column2']
    if row['column3'] == 'orange':
        return x * 2.0
    else:
        return y

df['column2'] = df.apply(func, axis=1)

print(df)

   index  column1  column2 column3
0      0        2      5.0   apple
1      1        4      3.0   apple
2      2        6     12.0  orange
3      3        8      6.0   apple
4      4       10     20.0  orange
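
For comparison, the vectorised alternative recommended above could look like the following sketch. It assumes the real logic is as simple as the x * 2.0 in the question, and it uses Series.mask, which is one of several ways to do a conditional replacement; the variable name is_orange is introduced here for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "column1": [2, 4, 6, 8, 10],
    "column2": [5, 3, 1, 6, 5],
    "column3": ["apple", "apple", "orange", "apple", "orange"],
})

# True for the rows whose column2 value should be replaced.
is_orange = df["column3"] == "orange"

# Series.mask replaces values where the condition is True and keeps the rest.
# The replacement values are computed for the whole column in one operation.
df["column2"] = df["column2"].mask(is_orange, df["column1"] * 2.0)

print(df)
```

No per-row Python function call is made here, which is the main difference from the apply version.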


1 Comment

Jon Over a year ago

Unfortunately this solution requires that I change the nature of the function, which I can't do because it is used elsewhere in the program. It is beginning to seem like a for loop is the only way that I can do this.
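
If func has to stay exactly as it is, one possible middle ground is to call it once per selected row with scalar arguments and write the results back through a boolean mask. This is a sketch under assumptions: the DataFrame construction and the float cast are added so the snippet runs on its own, and func is the two-argument function from the question.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "column1": [2, 4, 6, 8, 10],
    "column2": [5, 3, 1, 6, 5],
    "column3": ["apple", "apple", "orange", "apple", "orange"],
})

def func(x, y):
    # Unchanged two-argument function from the question.
    return x * 2.0

# Assumption: cast the target column to float first, because func returns floats.
df["column2"] = df["column2"].astype(float)

mask = df["column3"] == "orange"

# Call func with scalar values, one call per matching row only.
new_values = [
    func(x, y)
    for x, y in zip(df.loc[mask, "column1"], df.loc[mask, "column2"])
]

# The list has one entry per True in the mask, in row order.
df.loc[mask, "column2"] = new_values

print(df)
```

This is still a Python-level loop, but it only visits the matching rows and does not require func to accept a row or a Series.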
