
I have a dataset with a column 'y' that contains some particular values. I would like to use that column to make a new column 'z': if the y value is 47472, z should be 1000; if y < 1000, z should be y*2; all other values should give 2000. Here's a mock example of the data. I don't have a 'z' column yet, but I want to create it:

          y      z
0      1751   2000
1       800   1600
2     10000   2000
3       350    700
4       750   1500
5      1750   2000
6     30000   2000
7     47472   1000


def test(y):
    # Intended to work on a single value of y
    if y == 47472:
        z = 1000
    elif y < 1000:
        z = y * 2
    else:
        z = 2000
    return z

# I tried to call the function like this (y is the column from my data)
z = test(y)
z

but instead of the result I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
  • We cannot help you if we don't know what y is. Commented Nov 23, 2015 at 9:19
  • That's a NumPy error; you should at least have that as a tag. And you need to show what y is here. From the error, it appears to be a Series, not an integer. Commented Nov 23, 2015 at 9:20
  • y is just a random variable. It is a subset of a main data frame: I just took the y variable from another file and am trying to create z. It looks to be an array. Please let me know if you have any solution. Commented Nov 23, 2015 at 10:10

1 Answer


The problem is that you are using a Series in the if statement, such as:

if y == 47472:

Assuming that y is a column of your DataFrame, this comparison gives a Series of booleans (one per row), not a single boolean:

>>> df['y']==47472
0    False
1    False
2    False
3    False
4    False
5    False
6    False
7     True
Name: y, dtype: bool

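That is not allowed: an if statement needs a single True/False value, and pandas will not guess how to reduce a whole Series to one, so the error suggests methods that return a single boolean, such as any() or all(). What you want here, though, is a separate result for every row, so use boolean indexing instead: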

import numpy as np
import pandas as pd

# df is the DataFrame with your data
# add column z, initialised to 0.0 (a plain array, so no index alignment is involved)
df['z'] = np.zeros(len(df))
# if y == 47472 then put 1000
df.loc[df['y']==47472, 'z'] = 1000
# if y < 1000 then put 2*y
df.loc[df['y']<1000, 'z'] = 2*df['y']
# now set the rest to 2000 (i.e. rows that match neither of the previous 2 conditions)
df.loc[(df['y']>=1000) & (df['y']!=47472), 'z'] = 2000
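Put together as a self-contained script (the DataFrame built here from the mock y values is only an assumed stand-in for the real data), the three `loc` assignments can be run and checked end to end:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the real data: the mock y values from the question
df = pd.DataFrame({"y": [1751, 800, 10000, 350, 750, 1750, 30000, 47472]})

# Start z at 0.0; a plain NumPy array is used so no index alignment happens
df["z"] = np.zeros(len(df))
# Rule 1: y == 47472 -> 1000
df.loc[df["y"] == 47472, "z"] = 1000
# Rule 2: y < 1000 -> 2 * y (the right-hand Series is aligned on the index)
df.loc[df["y"] < 1000, "z"] = 2 * df["y"]
# Rule 3: everything else -> 2000
df.loc[(df["y"] >= 1000) & (df["y"] != 47472), "z"] = 2000

print(df)
```

Because z starts as a float column, the results print as floats (e.g. 2000.0); cast with `df["z"].astype(int)` if integers are preferred.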

Edit: As EdChum pointed out in the comments, I was originally using chained indexing:

df['z'][df['y']<1000] = 2*df['y']

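which should be avoided, because the assignment may modify a temporary copy rather than df itself (pandas typically warns about this with SettingWithCopyWarning). Use a single loc call instead: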

df.loc[df['y']<1000, 'z'] = 2*df['y']
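As an alternative not used in the answer above, `numpy.select` expresses all three rules in one assignment: it takes a list of conditions and a matching list of values, picks for each row the value of the first condition that is true, and falls back to `default` otherwise. A sketch on the same assumed mock data:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the real data: the mock y values from the question
df = pd.DataFrame({"y": [1751, 800, 10000, 350, 750, 1750, 30000, 47472]})

# Conditions are checked in order; the first True one wins for each row
conditions = [
    df["y"] == 47472,
    df["y"] < 1000,
]
choices = [
    1000,          # value when y == 47472
    2 * df["y"],   # value when y < 1000
]
# Rows matching no condition get the default value
df["z"] = np.select(conditions, choices, default=2000)

print(df)
```

Because the conditions are evaluated in order, the "else" case needs no explicit `(y >= 1000) & (y != 47472)` mask, and z stays an integer column.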

3 Comments

You're performing chained indexing, please change to df.loc[df['y']==47472, 'z'] = 1000, df.loc[df['y']<1000, 'z'] = 2*df['y'] and df.loc[(df['y']>=1000) & (df['y']!=47472), 'z'] = 2000 respectively
@EdChum thanks for this information, I changed the code.
Thanks all for your help, I have created the new column. Thanks again!
