How to speed up conditional statement in python

Question

I am trying to generate a new column in a pandas dataframe by looping over more than 100,000 rows and setting each row's value conditional on the value in an existing column.

The dataframe below is a dummy, but it works as an example. My current code is:

import pandas as pd

df = pd.DataFrame({'IT100': [5, 5, -0.001371, 0.0002095, -5, 0, -5, 5, 5],
                   'ET110': [0.008187884, 0.008285232, 0.00838258, 0.008479928, 1, 1, 1, 1, 1]})

# If charging, set CD to 1; if discharging, set CD to -1.
# If -1 <= IT100 <= 1, set CD to the previous cell's value.
# Charging is defined as IT100 > 1 and discharging as IT100 < -1.

def CD(dataFrame):
    for x in range(0, len(dataFrame.index)):
        current = dataFrame.loc[x, "IT100"]

        if x == 0:
            # The first row has no previous cell, so look a few rows ahead
            if dataFrame.loc[x+5, "IT100"] > -1:
                dataFrame.loc[x, "CD"] = 1
            else:
                dataFrame.loc[x, "CD"] = -1
        else:
            if current > 1:
                dataFrame.loc[x, "CD"] = 1
            elif current < -1:
                dataFrame.loc[x, "CD"] = -1
            else:
                dataFrame.loc[x, "CD"] = dataFrame.loc[x-1, "CD"]

Using if/else inside a loop is extremely slow. I have seen suggestions to use np.select() or DataFrame.apply(), but I do not know whether they will work for my example. I need to be able to index the column, because one of my conditions sets the value of the new column to the value of the previous cell in that same column.

Thanks for any help!

Please post an example dataframe, just the columns of interest and, say, a dozen rows.
– tdelaney, Commented Dec 28, 2020 at 23:31
Have a look at How to make good pandas examples and provide a minimal reproducible example so that we can better understand how to help.
– G. Anderson, Commented Dec 28, 2020 at 23:32
Any time you explicitly iterate through your data frame rows, you're slowing down the processing. Please work through more pandas tutorials to get familiar with vectorized operations. In this case, you can use the shift method to handle the x-1 indexing as a vectorized operation.
– Prune, Commented Dec 28, 2020 at 23:33
In general, pandas doesn't have a way to handle recursive definitions, but in this case, as you're not really modifying the previous values, ffill works here.
– Asish M., Commented Dec 28, 2020 at 23:46
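
The two suggestions above differ in how far back they can look. As a rough sketch (the series `state` and its values are made up for illustration, not taken from the question), `shift` only sees the immediately preceding row, while `ffill` carries the last known value across a whole run of gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical labels: 1 = charging, -1 = discharging, NaN = inside the band
state = pd.Series([1.0, np.nan, np.nan, -1.0, np.nan])

# shift(1) looks exactly one row back, so only the first NaN of a run is filled
one_step = state.fillna(state.shift(1))

# ffill() propagates the last non-NaN value through the whole run
filled = state.ffill()

print(one_step.tolist())
print(filled.tolist())
```

With runs of more than one in-band row, which the question's data has, the single `shift` leaves gaps, so `ffill` is probably the better fit here.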

EMiller · Accepted Answer · 2020-12-28 23:48:32Z

3

@Grajdeanu Alex is right, the loop is slowing you down more than whatever you're doing inside of it. With pandas, a loop is usually the slowest choice. Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'IT100': [0, -50, -20, -0.5, -0.25, -0.5, -10, 5, 0.5]})
df['CD'] = np.nan
# lower saturation (discharging)
df.loc[df['IT100'] < -1, 'CD'] = -1
# upper saturation (charging)
df.loc[df['IT100'] > 1, 'CD'] = 1
# fill forward
df['CD'] = df['CD'].ffill()
# set the first row equal to the row at index 5 (the sixth row)
df.loc[0, 'CD'] = df.loc[5, 'CD']

ffill fills each NaN (the rows where -1 <= IT100 <= 1) with the last valid value above it.
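
Since the question asks whether np.select() could work, here is a minimal sketch of the same idea using it, run on the question's IT100 values. It assumes that leaving the in-band rows as NaN and forward-filling them is acceptable, and it deliberately leaves out the special rule for the first row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'IT100': [5, 5, -0.001371, 0.0002095, -5, 0, -5, 5, 5]})

# Conditions are checked in order; rows matching neither get the default (NaN)
conditions = [df['IT100'] > 1, df['IT100'] < -1]
choices = [1.0, -1.0]
df['CD'] = np.select(conditions, choices, default=np.nan)

# Rows inside the band inherit the most recent charging/discharging label
df['CD'] = df['CD'].ffill()

print(df)
```

np.select() on its own cannot refer to the previous row; it is the NaN default combined with ffill that stands in for the "previous cell" lookup.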

3 Comments

Paul H Over a year ago

the OP's dataframe has over 100k rows, so I think you're misinterpreting the significance of the x+5

Asish M. Over a year ago

unless they miswrote - their code only looks at x+5 if x==0 (x being the index here which "should" be unique)

NKUP Over a year ago

For the X==0 cell, I wanted to check what the value was for the 5th or so cell because I knew my signal would be stable by then.
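
Given that clarification, one way to apply the first-row rule is to seed row 0 before forward-filling, so that any in-band rows right after it inherit the seed, as they do in the original loop. This is only a sketch: the function name `label_with_seed`, the `lookahead` parameter, the fallback for frames shorter than six rows, and the sample values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def label_with_seed(df, lookahead=5):
    """Label rows 1 / -1, seeding row 0 from IT100 `lookahead` rows later."""
    s = df['IT100']
    cd = pd.Series(np.select([s > 1, s < -1], [1.0, -1.0], default=np.nan),
                   index=df.index)
    if len(df) > 0:
        # Assumption: on short frames, fall back to the last available row
        pos = min(lookahead, len(df) - 1)
        cd.iloc[0] = 1.0 if s.iloc[pos] > -1 else -1.0
    # Fill after seeding so leading in-band rows pick up the seed value
    return df.assign(CD=cd.ffill())

# Hypothetical signal that starts inside the band
sample = pd.DataFrame({'IT100': [0, 0.5, -0.5, 0.2, 5, 0.3, -10, 0.1]})
result = label_with_seed(sample)
print(result['CD'].tolist())
```

If the fill ran first and the seed were written afterwards, rows 1 to 3 of this sample would stay NaN, because nothing valid precedes them at fill time.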

Jack Moody · Accepted Answer · 2020-12-29 00:30:09Z

0

Similar to EMiller's answer, you could also use clip.

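import pandas as pd
import numpy as np

df = pd.DataFrame({'IT100': [0, -50, -20, -0.5, -0.25, -0.5, -10, 5, 0.5]})

# clip saturates values outside [-1, 1] to -1 or 1
df['CD'] = df['IT100'].clip(-1, 1)
# anything not sitting on a bound is inside the band: mark it as missing
# (note: an IT100 of exactly -1 or 1 also counts as saturated here)
df.loc[~df['CD'].isin([-1, 1]), 'CD'] = np.nan
# carry the last charging/discharging state forward
df['CD'] = df['CD'].ffill()
# copy the state at index 5 into the first row
df.loc[0, 'CD'] = df.loc[5, 'CD']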


Asish M. · Accepted Answer · 2020-12-29 00:30:16Z

0

As an alternative to @EMiller's answer:

In [213]: df = pd.DataFrame({'IT100':[0,-50,-20,-0.5,-0.25,-0.5,-10,5,0.5]})

In [214]: df
Out[214]:
   IT100
0   0.00
1 -50.00
2 -20.00
3  -0.50
4  -0.25
5  -0.50
6 -10.00
7   5.00
8   0.50

In [215]: df['CD'] = pd.Series(np.where(df['IT100'].between(-1, 1), np.nan, df['IT100'].clip(-1, 1))).ffill()


In [217]: df.loc[0, 'CD'] = 1 if df.loc[5, 'IT100'] > -1 else -1

In [218]: df
Out[218]:
   IT100   CD
0   0.00  1.0
1 -50.00 -1.0
2 -20.00 -1.0
3  -0.50 -1.0
4  -0.25 -1.0
5  -0.50 -1.0
6 -10.00 -1.0
7   5.00  1.0
8   0.50  1.0
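
One caveat with wrapping np.where in a bare pd.Series, as in In [215]: the new Series gets a default 0..n-1 index, and pandas aligns on index labels when assigning. That is fine for the frame above, but a frame with a different index would get NaN. A small sketch, using a made-up three-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index does not start at 0
df = pd.DataFrame({'IT100': [5, 0.5, -5]}, index=[10, 11, 12])

raw = np.where(df['IT100'].between(-1, 1), np.nan, df['IT100'].clip(-1, 1))

# A bare pd.Series gets labels 0, 1, 2, which match none of 10, 11, 12
misaligned = df.assign(CD=pd.Series(raw).ffill())

# Reusing the frame's own index keeps every value on its row
aligned = df.assign(CD=pd.Series(raw, index=df.index).ffill())

print(misaligned['CD'].tolist())
print(aligned['CD'].tolist())
```

Passing `index=df.index` is a cheap safeguard if the real data has been filtered or re-indexed before this step.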
