Problem loop and computation between different columns on a DataFrame

Question

Here is my problem,

I have this DataFrame:

df=pd.DataFrame({'Date':['2017-05-19','2017-05-22','2017-05-23','2017-05-24','2017-05-25','2017-05-26','2017-05-29'],
                 'A':[153,152,153,155,153,154,155],
                 'B':[139,137,141,141.5,141.8,142.1,142.06],})

df['Date']=pd.to_datetime(df['Date'])
df = df.set_index('Date')

My goal was to create another DataFrame named df_end where I will have a column named "total" corresponding to the total number of column of df (which is 2) and two column A and B that will take the value "1" if the 1 day rolling mean is > the 2 days rolling mean

Here is the code that I wrote :

def test(dataframe):
    df_init=(dataframe*0).assign(total=0)
    df_end=(dataframe*0).assign(total=0)
    
    df_init.iloc[0,df_init.columns.get_loc('total')]=2
    df_end.iloc[0,df_end.columns.get_loc('total')]=2
    
    rolling_2D=dataframe.rolling(2).mean().fillna(0)
    rolling_1D=dataframe.rolling(1).mean().fillna(0)
    
    calendar=pd.Series(df_init.index)[1:]
    
    for date in calendar:
        for cols in dataframe:
            prev_date=df_init.index[df_init.index<date][-1] 
            df_init.loc[date,:]=df_end.loc[prev_date,:] #df_init takes the value of df_end on the previous date
            
            
            M=2 #number of columns different from the column 'total'
            count = (df_init.loc[date,dataframe.columns]!=0).sum() #this will be used to compute the weight
            
            #Weight
            if count != M:
                weight = 1/(M-count)
            else:
                weight = 1
            
            #Repartition between different columns 
            repartition = df_init.loc[date,cols]+df_init.loc[date,'total']*weight
            
            #Conditions
            if rolling_1D.loc[date,cols] > rolling_2D.loc[date,cols] and df_init.loc[date,cols]==0:
                df_end.loc[date,cols]=repartition
                df_end.loc[date,'total']=df_init.loc[date,'total'] - df_end.loc[date,cols]
            
            elif rolling_1D.loc[date,cols] > rolling_2D.loc[date,cols] and df_init.loc[date,cols]!=0:
                df_end.loc[date,cols]=df_init.loc[date,cols]
                df_end.loc[date,'total']=df_init.loc[date,'total']
            
            else:
                df_end.loc[date,cols]=0
                df_end.loc[date,'total']=df_init.loc[date,cols]+df_init.loc[date,'total']
    
    return df_end

When I return df_end this is what I have:

            A     B    total
Date            
2017-05-19  0.0  0.0    2.0
2017-05-22  0.0  0.0    2.0
2017-05-23  1.0  1.0    1.0
2017-05-24  1.0  1.0    1.0
2017-05-25  0.0  1.0    1.0
2017-05-26  1.0  1.0    1.0
2017-05-29  1.0  0.0    2.0

As you can see, my program take into account the column B only and I don't know where is my mistake.

The result that I am looking for is:

            A     B    total
Date            
2017-05-19  0.0  0.0    2.0
2017-05-22  0.0  0.0    2.0
2017-05-23  1.0  1.0    0.0
2017-05-24  1.0  1.0    0.0
2017-05-25  0.0  1.0    1.0
2017-05-26  1.0  1.0    0.0
2017-05-29  1.0  0.0    1.0

I know that I am almost done with this code and I'm stuck with this problem.

Do you have any idea where my problem could be?

Thank you so much for the help

I'm not sure I undersand the end result you're trying to get. Each column should be 1 if the rolling mean for the last 2 rows is larger than the rolling mean for the last 1 row (which is the row itself)? — Roy2012
– Roy2012, Commented Jun 25, 2020 at 8:54
Hi @Roy2012, thank you for the answer. Let's say that we have 2 units (which are on the column 'total' at the beginning) and we put 1 unit on one column if the rolling mean of the last 1 row (which is the row itself) is larger than the rolling mean of the last 2 rows, that's why the column total increase or decrease if the condition is true or not. I thank you for looking at my post and don't hesitate to tell if something is not clear — Kben59
– Kben59, Commented Jun 25, 2020 at 9:03

Roy2012 · Accepted Answer · 2020-06-25 09:11:24Z

1

Here's a fairly simple way to do that:

new_df = (df.rolling(1).mean() > df.rolling(2).mean()).astype(int)
new_df["total"] = 2 - new_df.sum(axis=1)
print(new_df)

The output is:

            A  B  total
Date                   
2017-05-19  0  0      2
2017-05-22  0  0      2
2017-05-23  1  1      0
2017-05-24  1  1      0
2017-05-25  0  1      1
2017-05-26  1  1      0
2017-05-29  1  0      1

answered Jun 25, 2020 at 9:11

Roy2012

12.7k3 gold badges28 silver badges38 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Kben59 Over a year ago

I'm always doing things more complicated than it should be lol. Thank you so much for the help it's perfect this way!

Collectives™ on Stack Overflow

Problem loop and computation between different columns on a DataFrame

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related