Apply different function to pandas dataframe based on another column

Question

I have a dataframe with two columns, col_a and col_b. I am trying to create a third column, col_c, based on the numbers in the dataframe. If the number in col_b is positive, I want col_c to equal col_a * col_b.

If the number in col_b is negative, I want col_c to equal col_a * col_b * 0.95.

An example of what I am trying to get:

col_a  col_b  col_c
100    1      100
100    -1     -95
100    10     1000
100    -10    -950


I have currently tried this:


def profit_calculation(value):

    if value<0:
        return(a * b * 0.95)
    else:
        return(a * b) 

df['col_c']=df['col_b'].apply(profit_calculation)

But this seems to be incorrect.

mcsoini · Accepted Answer · 2021-07-08 11:47:43Z

1

import pandas as pd

df = pd.DataFrame({"a": [100, 100, 100, 100],
                   "b": [1, -1, 10, -10]})

# Scale the product by 0.95 only where b is negative
df.a * df.b * (1 - 0.05 * (df.b < 0))

# out:
0     100.0
1     -95.0
2    1000.0
3    -950.0

Explanation: when multiplied by the float 0.05, the boolean Series (df.b < 0) is treated as numeric (True = 1, False = 0). We therefore subtract 0.05 from 1 wherever b is negative, which gives the factor 0.95 exactly where we need it and 1 everywhere else.
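To make the casting step visible, here is a minimal sketch that splits the one-liner into its intermediate pieces. The names is_negative, discount and factor, and the target column "c", are only illustrative and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({"a": [100, 100, 100, 100],
                   "b": [1, -1, 10, -10]})

is_negative = df.b < 0          # boolean Series: False, True, False, True
discount = 0.05 * is_negative   # booleans act as 0/1: 0.0, 0.05, 0.0, 0.05
factor = 1 - discount           # 1.0, 0.95, 1.0, 0.95

# Same computation as the one-liner, stored in a new column
df["c"] = df.a * df.b * factor
print(df)
```

Splitting it up like this is only for inspection; the single expression computes the same values.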

edited Jul 8, 2021 at 11:47

answered Jul 8, 2021 at 10:55

mcsoini


1 Comment

Jonas Palačionis Over a year ago

Smart way to get the answer.

sophocles · Accepted Answer · 2021-07-08 11:50:10Z

1

You can use np.where and check whether column b is greater than 0 using gt:

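import numpy as np
import pandas as pd

df = pd.DataFrame({"col_a": [100, 100, 100, 100],
                   "col_b": [1, -1, 10, -10]})

# Plain product where col_b > 0, product scaled by 0.95 otherwise
a_b = df.col_a.mul(df.col_b)
df['col_c'] = np.where(df['col_b'].gt(0), a_b, a_b.mul(0.95))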

which prints:

>>> df

   col_a  col_b   col_c
0    100      1   100.0
1    100     -1   -95.0
2    100     10  1000.0
3    100    -10  -950.0
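If you would rather stay within pandas, a similar result can probably be reached with Series.mask, which replaces values where a condition is True. This is a sketch of a variant, not part of the answer above; the product is cast to float up front (an assumption made here) so that the replacement values do not change the dtype.

```python
import pandas as pd

df = pd.DataFrame({"col_a": [100, 100, 100, 100],
                   "col_b": [1, -1, 10, -10]})

# Product of the two columns, as floats so scaled values fit the dtype
a_b = (df["col_a"] * df["col_b"]).astype(float)

# Keep the product as is, except where col_b is negative
df["col_c"] = a_b.mask(df["col_b"] < 0, a_b * 0.95)
print(df)
```

Note that the condition is inverted relative to np.where: mask names the rows to replace rather than the rows to keep.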

edited Jul 8, 2021 at 11:50

answered Jul 8, 2021 at 11:02

sophocles


Duc Cheikh · Accepted Answer · 2021-07-09 12:51:09Z

0

You can use a lambda function with DataFrame.apply to create new data based on data in the dataframe (df). See the documentation of DataFrame.apply here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html With axis=1, the function takes one row of the dataframe as its parameter and returns the value computed for that row. So for each row we call profit_calculation and pass it the data of that row. You have to replace your code with:

def profit_calculation(row):
    # row is a single row of df; scale the product by 0.95 unless col_b is positive
    product = row["col_b"] * row["col_a"]
    return product if row["col_b"] > 0 else product * .95

df['col_c'] = df.apply(lambda row: profit_calculation(row), axis=1)
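For reference, a self-contained sketch of the row-wise approach using the sample data from the question. It assumes the columns are named col_a and col_b, and it passes profit_calculation to apply directly, which should be equivalent to wrapping it in a lambda.

```python
import pandas as pd

df = pd.DataFrame({"col_a": [100, 100, 100, 100],
                   "col_b": [1, -1, 10, -10]})

def profit_calculation(row):
    # row is one row of df, indexed by column name
    product = row["col_a"] * row["col_b"]
    return product if row["col_b"] > 0 else product * 0.95

# axis=1 calls the function once per row
df["col_c"] = df.apply(profit_calculation, axis=1)
print(df)
```

Row-wise apply calls a Python function for every row, so on large dataframes the vectorized answers in this thread are likely to be considerably faster.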

edited Jul 9, 2021 at 12:51

answered Jul 8, 2021 at 13:44

Duc Cheikh


2 Comments

Duc Cheikh Over a year ago

You can use a lambda function to create new data based on data in the dataframe (df). See the explanation here: pandas.pydata.org/pandas-docs/stable/reference/api/… It takes a row of the dataframe as its parameter and returns the computed value. So for each row we call profit_calculation and pass it the data of that row.

David Lee Over a year ago

You should edit your response instead of just putting a comment so that others don't have to look at the comments to understand your response.
