Making a new variable numerical column using if/else statements

Question

I have a dataset with a column 'y' that contains numeric values. I would like to create a new column 'z' from it: if y is 47472, z should be 1000; if y < 1000, z should be y*2; otherwise z should be 2000. Here's a mock example of the data. I don't have a 'z' column yet, but I want to create it:

          y      z
0      1751   2000
1       800   1600
2     10000   2000
3       350    700
4       750   1500
5      1750   2000
6     30000   2000
7     47472   1000


def test(y):
    if y == 47472:
        z=1000
    elif y < 1000:
        z=y*2
    else:
        z=2000
    return z

# I tried to call the above function below
z = test(y)
z

but I don't get the result; instead I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

That's a NumPy error; you should at least have that as a tag. And you need to show what y is here. From the error, it appears to be a Series, not an integer.
– Daniel Roseman, Commented Nov 23, 2015 at 9:20
y is just a variable; it is a subset of a main data frame. I took the y variable from another file and am trying to create z. It looks to be an array. Please let me know if you have a solution.
– Sanchit Aluna, Commented Nov 23, 2015 at 10:10

agold · Accepted Answer · 2015-11-23 12:53:55Z

The problem is that you are using a Series in the if statement:

if y == 47472:

Assuming that y is a column of your DataFrame, the comparison is evaluated element-wise and produces a boolean Series rather than a single True or False:

>>> df['y']==47472
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7     True
Name: y, dtype: bool

An if statement needs a single boolean, and a whole Series has no unambiguous truth value. That is why the error message suggests reducing it to one boolean with any(), all(), etc. Instead, you should use boolean indexing:

# df is the DataFrame with your data
# add column z with a placeholder value of 0
df['z'] = 0
# if y == 47472, set z to 1000
df.loc[df['y'] == 47472, 'z'] = 1000
# if y < 1000, set z to 2*y (the right-hand side is aligned on the index)
df.loc[df['y'] < 1000, 'z'] = 2 * df['y']
# set the rest to 2000 (i.e. rows that meet neither of the previous two conditions)
df.loc[(df['y'] >= 1000) & (df['y'] != 47472), 'z'] = 2000
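
To see both the failure and the fix end to end, here is a minimal runnable sketch. It assumes the mock y values from the question and a frame named df; the variable raised is only there for illustration.

```python
import pandas as pd

# Mock data copied from the question; the frame name `df` is an assumption.
df = pd.DataFrame({"y": [1751, 800, 10000, 350, 750, 1750, 30000, 47472]})

# Using the element-wise comparison directly in `if` reproduces the error.
try:
    if df["y"] == 47472:
        pass
    raised = False
except ValueError:
    raised = True

# Boolean indexing: each mask selects the rows that one rule applies to.
is_special = df["y"] == 47472
is_small = df["y"] < 1000

df["z"] = 0                                    # placeholder value
df.loc[is_special, "z"] = 1000                 # y == 47472
df.loc[is_small, "z"] = 2 * df.loc[is_small, "y"]  # y < 1000
df.loc[~is_special & ~is_small, "z"] = 2000    # everything else

print(raised)
print(df)
```

Storing the masks in variables is a design choice for readability; it also lets the last rule be written as the negation of the first two instead of repeating the conditions.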

Edit: As EdChum pointed out in the comments, I was originally performing chained indexing:

df['z'][df['y']<1000] = 2*df['y']

which should be avoided by using loc:

df.loc[df['y']<1000, 'z'] = 2*df['y']

edited Nov 23, 2015 at 12:53

answered Nov 23, 2015 at 11:02


3 Comments

EdChum Over a year ago

You're performing chained indexing, please change to df.loc[df['y']==47472, 'z'] = 1000, df.loc[df['y']<1000, 'z'] = 2*df['y'] and df.loc[(df['y']>=1000) & (df['y']!=47472), 'z'] = 2000 respectively

agold Over a year ago

@EdChum thanks for this information, I changed the code.

Sanchit Aluna Over a year ago

Thanks, all, for your help. I have created the new column. Thanks again.
