I am trying to count the number of entries in a dataframe column that are above the column's standard deviation, but I get a "ValueError" when I compare against the value returned by .std()

cnt=0
value=df['fill'].std()
for row in range (len(df)-1,0,-1):
  if (df.iloc[row,4]>=value):
    cnt=cnt+1
print(cnt)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The loop processes the rows backwards, and column number 4 contains the data to be compared. Don't hesitate to share any better functions or methods to make this easier. Thanks in advance.

  • Please edit your question to show the full error message. Commented Oct 10, 2022 at 15:43
  • Also, I suggest you read about broadcasting. You can do operations like this in pandas much faster without an explicit for loop. Commented Oct 10, 2022 at 15:43
  • Please provide enough code so others can better understand or reproduce the problem. Commented Oct 10, 2022 at 16:00

1 Answer

Your code works. Or at least, from what we can see, nothing in it explains the error you get. It does have some problems, but none of them is related to that error:

  • Your loop ignores index 0. It goes from len(df)-1 down to 1.
  • If you are iterating over a dataframe's rows with a loop, you are almost certainly doing something wrong. There is nearly always a better solution. A for loop over a dataframe is the ultimate way to lose time. And we are talking about something like 1000x here, not a few percent of optimization.
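To make the first point concrete, here is a minimal sketch of what the question's range visits. The length 5 is an arbitrary stand-in for len(df), not a value from the question.

```python
n = 5  # stand-in for len(df); purely illustrative

# The question's range stops before 0, so index 0 is never visited.
visited = list(range(n - 1, 0, -1))

# Using -1 as the stop value includes index 0.
visited_all = list(range(n - 1, -1, -1))

print(visited)      # [4, 3, 2, 1]
print(visited_all)  # [4, 3, 2, 1, 0]
```

If you really want a backward loop, range(len(df)-1, -1, -1) covers every row.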

In your case,

(df.fill>=value).sum()

gives you what you want without a for loop. Or, to be more accurate, without a for loop you wrote yourself. Obviously, there is still a loop somewhere, inside the pandas library. But that loop is written in C, not in Python. Python is slow (it is sometimes impressively fast for an interpreter; still, it is just an interpreter, not at all in the same league as C). The only reason Python is fast in practice is that we manage to have most of the actual work done in the C code of libraries.

And anyway, (df.fill>=value).sum() is also easier to write, so even if you don't care about speed...
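As a small sketch of that one-liner on made-up data (only the column name "fill" comes from the question; the values are hypothetical), with a row-by-row count alongside to show both give the same result:

```python
import pandas as pd

# Hypothetical data; only the column name "fill" comes from the question.
df = pd.DataFrame({"fill": [1.0, 2.0, 3.0, 4.0, 10.0]})

value = df["fill"].std()     # sample standard deviation, a single number
mask = df["fill"] >= value   # boolean Series, one entry per row
cnt = int(mask.sum())        # each True counts as 1

# Same count with an explicit loop over every row, for comparison only.
loop_cnt = 0
for row in range(len(df)):
    if df.iloc[row, 0] >= value:
        loop_cnt += 1

print(cnt, loop_cnt)  # 2 2
```

Here the standard deviation is about 3.54, so only 4.0 and 10.0 are counted.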

That being said: your code is slow; it ignores the first row, which I am pretty sure is not intended; and it counts backward, which is not bad per se (why not, you might say) but doesn't help at all for simple counting (and is the reason you end up with the "row 0 is ignored" bug). But it works.

So the only reason for the error you get is in your data. Though I fail to see how it could raise a ValueError (it would be an IndexError if you were wrong about column 4, or a TypeError if some of your data were not numbers).
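For what it is worth, one data layout that does produce this exact message is a dataframe with two columns both labelled "fill". This is only an assumption about what the data might look like, since the question does not show it. In that case df['fill'] is a DataFrame, .std() returns a Series, and the comparison inside the if yields a Series instead of a single boolean:

```python
import pandas as pd

# Hypothetical frame with a duplicated column label "fill" (an assumption,
# not something shown in the question).
df = pd.DataFrame([[1.0, 2.0], [3.0, 5.0]], columns=["fill", "fill"])

value = df["fill"].std()     # std of a DataFrame: a Series, not a scalar
print(type(value).__name__)  # Series

error_message = None
try:
    # scalar >= Series gives a boolean Series, which `if` cannot evaluate
    if df.iloc[0, 1] >= value:
        pass
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Checking type(value) or df.columns in the real data would confirm or rule this out.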

1 Comment

Thank you for pointing out the bug and for opening my eyes to not using for loops. It really made a difference.
