Passing value using variable to if loop python

Question

I am trying to count the number of entries in a dataframe column that are above the column's standard deviation, but I get a "ValueError" when I pass in the value computed with .std().

cnt=0
value=df['fill'].std()
for row in range (len(df)-1,0,-1):
  if (df.iloc[row,4]>=value):
    cnt=cnt+1
print(cnt)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The loop processes the rows backwards, and column number 4 contains the data to be compared. Don't hesitate to share any better functions or methods to make this easier. Thanks in advance.

Also, I suggest you read about broadcasting. You can do operations like this in pandas much faster without an explicit for loop.
– Code-Apprentice, Commented Oct 10, 2022 at 15:43
Please provide enough code so others can better understand or reproduce the problem.
– Community Bot, Commented Oct 10, 2022 at 16:00

chrslg · Accepted Answer · 2022-10-10 16:00:01Z

Your code works. Or at least, from what we can see, nothing in it explains the error you get. It does have some problems, but none of them is related to that error:

Your loop ignores index 0: it goes from len(df)-1 down to 1.
If you are iterating over a dataframe's rows with a loop, you are almost certainly doing something wrong. There is nearly always a better solution. A for loop over a dataframe is the ultimate way to lose time, and we are talking about something like 1000x here, not a few percent of optimization.
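The off-by-one in the first point is easy to check without any dataframe. A minimal sketch, where n = 5 is a hypothetical stand-in for len(df):

```python
n = 5  # hypothetical stand-in for len(df)

# The range from the question stops before 0, so row 0 is never visited.
backward = list(range(n - 1, 0, -1))

# Using -1 as the stop value includes row 0.
backward_fixed = list(range(n - 1, -1, -1))

# For plain counting the direction does not matter, so this is enough.
forward = list(range(n))

print(backward)        # [4, 3, 2, 1]
print(backward_fixed)  # [4, 3, 2, 1, 0]
print(forward)         # [0, 1, 2, 3, 4]
```

The stop value of range is exclusive, which is why a stop of 0 leaves row 0 out.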

In your case,

(df.fill>=value).sum()

gives you what you want without a for loop. Or, to be more accurate, without a for loop you wrote yourself. Obviously there is still a loop somewhere, inside the pandas library, but that loop runs in compiled C code, not in Python. Python is slow (sometimes impressively fast for an interpreter, but still an interpreter, not in the same league as C). The only reason Python programs are fast in practice is that we manage to have most of the actual work done in the C code of libraries.
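If you want a rough feel for the difference on your own machine, something like the following should do. It is only a sketch: the random data, the 10,000-row size and the function names are my assumptions, and the timings will vary from one machine to another.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 random values in a column named 'fill'.
rng = np.random.default_rng(0)
df = pd.DataFrame({"fill": rng.normal(size=10_000)})
value = df["fill"].std()


def count_with_loop():
    """Count row by row in Python, visiting every row including row 0."""
    cnt = 0
    for row in range(len(df)):
        if df.iloc[row, 0] >= value:
            cnt += 1
    return cnt


def count_vectorized():
    """Let pandas do the comparison and the sum in compiled code."""
    return int((df["fill"] >= value).sum())


loop_time = timeit.timeit(count_with_loop, number=3)
vec_time = timeit.timeit(count_vectorized, number=3)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
```

Both functions return the same count; only the time taken differs.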

And anyway, (df.fill>=value).sum() is also easier to write, so even if you don't care about speed...
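Here is a small self-contained sketch of that one-liner. Only the column name 'fill' comes from the question; the five values are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the column name 'fill' comes from the question.
df = pd.DataFrame({"fill": [1.0, 2.0, 3.0, 4.0, 10.0]})

# Sample standard deviation (pandas uses ddof=1 by default), about 3.54 here.
value = df["fill"].std()

# Comparing a Series with a scalar gives a boolean Series.
mask = df["fill"] >= value

# True counts as 1 when summed, so the sum is the number of matching rows.
count = int(mask.sum())

# Same thing selecting the column by position instead of by name.
count_by_position = int((df.iloc[:, 0] >= value).sum())

print(count)  # 2
```

In the question the data sits in column number 4, so the positional form there would be (df.iloc[:, 4] >= value).sum().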

That being said: your code is slow, and it ignores the first row, which I am pretty sure is not wanted. It counts backward, which is not bad per se (why not, you might say) but doesn't help at all for simple counting, and is the reason you ended up with the "row 0 is ignored" bug. But it works.

So the only possible reason for the error you get is in your data, though I fail to see how it could raise a ValueError (it would be an IndexError if you were wrong about column 4, or a TypeError if some of your data were not numbers).
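One situation that does produce this exact message, offered purely as a guess since the question does not show the data: if two columns share the label 'fill', then df['fill'] is a DataFrame rather than a Series, .std() returns a Series, and the comparison inside the if yields a Series too. The frame below is hypothetical and only illustrates that mechanism.

```python
import pandas as pd

# Hypothetical frame with two columns both labelled 'fill'.
df = pd.DataFrame([[1.0, 2.0], [3.0, 5.0]], columns=["fill", "fill"])

# With duplicate labels, df["fill"] is a DataFrame, so std() is a Series.
value = df["fill"].std()
print(type(value).__name__)  # Series

message = None
try:
    # scalar >= Series gives a boolean Series, which `if` cannot reduce
    # to a single True/False.
    if df.iloc[0, 0] >= value:
        pass
except ValueError as exc:
    message = str(exc)

print(message)
```

Checking df.columns.duplicated().any() on the real dataframe would confirm or rule this out.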

Thank you for pointing out the bug and for opening my eyes to not using for loops. It really made a difference.
