0

I am trying to create a function with multiple conditions. This would add new column with a calculated amount. Part of it is below.

I have looked at other answers but can't work out what is wrong.

# import modules
import pandas as pd
import numpy as np

#data
data2 = pd.DataFrame({'Unit of Measure' : ['EA', 'EA', 'AA'],
                     'round' : ['no', 'yes', 'no'],
                     'add_gst' : ['no', 'yes', 'no'],
                     'Unit Charge' : [1.8, 2.5, 3.0],
                     'Time Units' : [1.0, 1.5, 2.0]})

# function
def include_gst(df):
    rounded = 0.0
    if(data2['Unit of Measure'] == 'EA' and data2['round'] == 'no' and data2['add_gst'] == 'yes'):
        rounded = data['Unit Charge'] * data['Time Units']
        rounded = rounded.round(2) # to two decimal places
        rounded = rounded * 1.10 # add gst
        rounded = rounded / 0.05 # round to nearest five cents - step 1 - divide by 0.05
        rounded = rounded.round() # round to nearest five cents - step 2 - to nearest integer
        rounded = rounded * 0.05 # round to nearest five cents - step 3 - multiply by 0.05
        return  rounded
    else:
        return 1.0

data2['incl_gst'] = data2.apply(include_gst, axis = 1)

Error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

It would be good to get it to work. Thanks.

3
  • Does this answer your question? Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() Commented Dec 11, 2019 at 6:33
  • How much effort did you put into solving this? It’s an extremely common problem, there should be tons of resources available. On top of reading the Pandas docs, I would recommend this article: ericlippert.com/2014/03/05/how-to-debug-small-programs. Commented Dec 11, 2019 at 6:35
  • Thanks. I did look up the error here (the article you suggested) before I posted. I couldn't work it out. If you look at the response to the answer below, I worked out that I had data instead of df and there is no way I would have worked that out from just reading the article. I worked on it for about 20 mins before posting, cutting out parts of the function and trying the other if else methods. I don't usually program in Python and I was stuck. Good enough for you? Commented Dec 11, 2019 at 11:17

1 Answer 1

1

Here under the hood loops by apply are not necessary, you canvectorized solution - added parantheses because priority operator with & for bitwise AND to mask and new column is created by numpy.where:

#changed data for match first row
data2 = pd.DataFrame({'Unit of Measure' : ['EA', 'EA', 'AA'],
                     'round' : ['no', 'yes', 'no'],
                     'add_gst' : ['yes', 'yes', 'no'],
                     'Unit Charge' : [1.8, 2.5, 3.0],
                     'Time Units' : [1.0, 1.5, 2.0]})
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units
0              EA    no     yes          1.8         1.0
1              EA   yes     yes          2.5         1.5
2              AA    no      no          3.0         2.0
  Unit of Measure round add_gst  Unit Charge  Time Units  

m = (data2['Unit of Measure'] == 'EA') & (data2['round'] == 'no') & (data2['add_gst'] == 'yes')
rounded = data2['Unit Charge'] * data2['Time Units']
rounded = rounded.round(2) # to two decimal places
rounded = rounded * 1.10 # add gst
rounded = rounded / 0.05 # round to nearest five cents - step 1 - divide by 0.05
rounded = rounded.round() # round to nearest five cents - step 2 - to nearest integer
rounded = rounded * 0.05 # round to nearest five cents - step 3 - multiply by 0.05


data2['incl_gst'] = np.where(m, rounded, 1.0)
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units  incl_gst
0              EA    no     yes          1.8         1.0       2.0
1              EA   yes     yes          2.5         1.5       1.0
2              AA    no      no          3.0         2.0       1.0

EDIT:

Because 12 times elif statement solution should be cahnged:

def include_gst(df):
    rounded = 0.0
    if (df['Unit of Measure'] == 'EA') and (df['round'] == 'no') and (df['add_gst'] == 'yes'):
        rounded = df['Unit Charge'] * df['Time Units']
        rounded = round(rounded, 2) # to two decimal places
        rounded = rounded * 1.10 # add gst
        rounded = rounded / 0.05 # round to nearest five cents - step 1 - divide by 0.05
        rounded = round(rounded) # round to nearest five cents - step 2 - to nearest integer
        rounded = rounded * 0.05 # round to nearest five cents - step 3 - multiply by 0.05
        return  rounded
    else:
        return 1.0

data2['incl_gst'] = data2.apply(include_gst, axis = 1)
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units  incl_gst
0              EA    no     yes          1.8         1.0       2.0
1              EA   yes     yes          2.5         1.5       1.0
2              AA    no      no          3.0         2.0       1.0

Or:

data2 = pd.DataFrame({'Unit of Measure' : ['EA', 'EA', 'AA'],
                     'round' : ['no', 'yes', 'no'],
                     'add_gst' : ['yes', 'yes', 'no'],
                     'Unit Charge' : [1.8, 2.5, 3.0],
                     'Time Units' : [1.0, 1.5, 2.0]})
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units
0              EA    no     yes          1.8         1.0
1              EA   yes     yes          2.5         1.5
2              AA    no      no          3.0         2.0

m1 = (data2['Unit of Measure'] == 'EA') & (data2['round'] == 'no') & (data2['add_gst'] == 'yes')
m2 = (data2['Unit of Measure'] == 'EA') & (data2['round'] == 'yes') & (data2['add_gst'] == 'yes')

rounded = data2['Unit Charge'] * data2['Time Units']
rounded = rounded.round(2) # to two decimal places
rounded = rounded * 1.10 # add gst
rounded = rounded / 0.05 # round to nearest five cents - step 1 - divide by 0.05
rounded = rounded.round() # round to nearest five cents - step 2 - to nearest integer
rounded1 = rounded * 0.05 # round to nearest five cents - step 3 - multiply by 0.05
rounded2 = rounded * 0.08

data2['incl_gst'] = np.select([m1, m2], [rounded1, rounded2], default=1.0)
print (data2)
  Unit of Measure round add_gst  Unit Charge  Time Units  incl_gst
0              EA    no     yes          1.8         1.0      2.00
1              EA   yes     yes          2.5         1.5      6.56
2              AA    no      no          3.0         2.0      1.00
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks. So what if I want to have multiple conditions though. Would it be better in a function? For example an elif for data2['Unit of Measure'] == 'EA') & (data2['round'] == 'no') & (data2['add_gst'] == 'no
Probably 12 elif
@william3031 - added solution with apply and with numpy.select, please check edit.
So I think I just used data instead of df in the original function. Was that the only issue?
@william3031 - It is necessary because you working with each row of dataframe separately, df is used because def include_gst(df): - there is used variable df

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.