2

I am trying to generate a new column in a pandas dataframe by loop over >100,000 rows and setting the value of the row conditional on an already existing row.

The current dataframe is a dummy but works as an example. My current code is:

df=pd.DataFrame({'IT100':[5,5,-0.001371,0.0002095,-5,0,-5,5,5],
'ET110':[0.008187884,0.008285232,0.00838258,0.008479928,1,1,1,1,1]})

# if charging set to 1, if discharging set to -1.
# if -1 < IT100 < 1 then set CD to previous cells value
# Charging is defined as IT100 > 1 and Discharge is defined as IT100 < -1


 def CD(dataFrame):


    for x in range(0,len(dataFrame.index)):
     
        current = dataFrame.loc[x,"IT100"]

        if x == 0:
            if dataFrame.loc[x+5,"IT100"] > -1:
                dataFrame.loc[x,"CD"] = 1
            else:
                dataFrame.loc[x,"CD"] = -1
        else:
            if current > 1:
                dataFrame.loc[x,"CD"] = 1
            elif current < -1:
                dataFrame.loc[x,"CD"] = -1
            else:
                dataFrame.loc[x,"CD"] = dataFrame.loc[x-1,"CD"]

Using if/Else loops is extremely slow. I see that people have suggested to use np.select() or pd.apply(), but I do not know if this will work for my example. I need to be able to index the column because one of my conditions is to set the value of the new column to the value of the previous cell in the column of interest.

Thanks for any help!

7
  • 2
    It's not the if/else part that's slow. Just FYI Commented Dec 28, 2020 at 23:23
  • Please post an example dataframe, just the columns of interest and, say, a dozen rows. Commented Dec 28, 2020 at 23:31
  • 2
    Have a look at How to make good pandas examples and provide a minimal reproducible example so that we cna better understand how to help Commented Dec 28, 2020 at 23:32
  • Any time you explicitly iterate through your data frame rows, you're slowing down the processing. Please work through more PANDAS tutorials to get familiar with vectored operation. In this case, you can use the shift method to handle the x-1 indexing as a vectored operation. Commented Dec 28, 2020 at 23:33
  • In general, pandas doesn't have a way to handle recursive definitions, but in this case as you're not really doing any modifications on the previous values, ffill works here Commented Dec 28, 2020 at 23:46

3 Answers 3

3

@Grajdeanu Alex is right, the loop is slowing you down more than whatever you're doing inside of it. With pandas, a loop is usually the slowest choice. Try this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'IT100':[0,-50,-20,-0.5,-0.25,-0.5,-10,5,0.5]})
df['CD'] = np.nan
#lower saturation
df.loc[df['IT100'] < -1,['CD']] = -1
#upper saturation
df.loc[df['IT100'] > 1,['CD']] = 1
#fill forward
df['CD'] = df['CD'].ffill()
# setting the first row equal to the fifth
df.loc[0,['CD']] = df.loc[5,['CD']]

using ffill will use the last valid value to fill in subsequent nan values (-1 < x < 1)

Sign up to request clarification or add additional context in comments.

3 Comments

the OP's dataframe has over 100k rows, so I think you're misinterpreting the significance of the x+5
unless they miswrote - their code only looks at x+5 if x==0 (x being the index here which "should" be unique)
For the X==0 cell, I wanted to check what the value was for the 5th or so cell because I knew my signal would be stable by then.
0

Similar to EMiller's answer, you could also use clip.

import pandas as pd
import numpy as np
df = pd.DataFrame({'IT100':[0,-50,-20,-0.5,-0.25,-0.5,-10,5,0.5]})

df['CD'] = df['IT100'].clip(-1, 1)
df.loc[~df['CD'].isin([-1, 1]), 'CD'] = np.nan
df['CD'] = df['CD'].ffill()
df.loc[0,['CD']] = df.loc[5,['CD']]

Comments

0

As an alternate to @EMiller's answer

In [213]: df = pd.DataFrame({'IT100':[0,-50,-20,-0.5,-0.25,-0.5,-10,5,0.5]})

In [214]: df
Out[214]:
   IT100
0   0.00
1 -50.00
2 -20.00
3  -0.50
4  -0.25
5  -0.50
6 -10.00
7   5.00
8   0.50

In [215]: df['CD'] = pd.Series(np.where(df['IT100'].between(-1, 1), np.nan, df['IT100'].clip(-1, 1))).ffill()


In [217]: df.loc[0, 'CD'] = 1 if df.loc[5, 'IT100'] > -1 else -1

In [218]: df
Out[218]:
   IT100   CD
0   0.00  1.0
1 -50.00 -1.0
2 -20.00 -1.0
3  -0.50 -1.0
4  -0.25 -1.0
5  -0.50 -1.0
6 -10.00 -1.0
7   5.00  1.0
8   0.50  1.0

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.