I would like to create a new column df['indexed'] based on the value of the previous row of the column df['col2']. The exception: if the value in a row of df['col2'] is not "x" (in this example it is a date string), I would like to set df['indexed'] to 100. So I expect a column "indexed" that restarts at 100 every time df['col2'] is not "x".

import pandas as pd
d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01,0.02,0.12,-0.1,0-0.07,0.01],
     'col2': ['x','x','x','2021-60-30','x','x','x','x','x','x']}
df = pd.DataFrame(data=d)
df['col1'] = df['col1']+1
df['indexed'] = 0
df.loc[0, 'indexed'] = 100  # set a start value (avoids chained assignment)

# what I tried:
for index, row in df.iterrows():
    if row['col2'] == 'x':
        df['indexed']= df['col1'] * df['indexed'].shift(1)
    else:
        df['indexed']= 100

I expect: (the expected output was posted as an image, which is not reproduced here)

  • Please show expected output Commented May 9, 2021 at 9:31

2 Answers

You can use where:

df['indexed'] = (df['col1'] * df['col1'].shift(1)).where(df['col2']=='x', 100)
df

Output:

   col1        col2   indexed
0  1.02           x       NaN
1  1.12           x    1.1424
2  0.90           x    1.0080
3  0.93  2021-60-30  100.0000
4  1.01           x    0.9393
5  1.02           x    1.0302
6  1.12           x    1.1424
7  0.90           x    1.0080
8  0.93           x    0.8370
9  1.01           x    0.9393
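
As a minimal sketch of what where does here: it keeps the values where the condition is True and substitutes the second argument everywhere else. The toy Series s and flags below are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Toy data, assumed for illustration only
s = pd.Series([1.0, 2.0, 3.0, 4.0])
flags = pd.Series(['x', 'x', '2021-60-30', 'x'])

# Keep s where flags == 'x'; use 100 in every other row
result = s.where(flags == 'x', 100)
print(result.tolist())
```

The same pattern is applied above to the product col1 * col1.shift(1), which is why row 0 stays NaN: its col2 is 'x', so the NaN produced by shift is kept.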

Update: If you want to calculate the cumulative product, restarting at each non-x value in col2:

g = df.groupby(df['col2'].ne('x').cumsum())['col1']
df['indexed'] = g.cumprod() / g.transform('first') * 100

Output:

   col1        col2     indexed
0  1.02           x  100.000000
1  1.12           x  112.000000
2  0.90           x  100.800000
3  0.93  2021-60-30  100.000000
4  1.01           x  101.000000
5  1.02           x  103.020000
6  1.12           x  115.382400
7  0.90           x  103.844160
8  0.93           x   96.575069
9  1.01           x   97.540819
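
To make the grouping step easier to follow, here is a small sketch that prints the group key and checks the vectorized result against a plain loop. It uses a shortened copy of the data from the question; the names group_id and expected are introduced here for illustration only.

```python
import pandas as pd

# Shortened copy of the question's data (col1 already has 1 added)
df = pd.DataFrame({
    'col1': [1.02, 1.12, 0.90, 0.93, 1.01, 1.02],
    'col2': ['x', 'x', 'x', '2021-60-30', 'x', 'x'],
})

# ne('x') is True on non-x rows; cumsum turns that into a group counter,
# so a new group starts at every row whose col2 is not 'x'
group_id = df['col2'].ne('x').cumsum()
print(group_id.tolist())

g = df.groupby(group_id)['col1']
# Cumulative product per group, rescaled so each group starts at 100
df['indexed'] = g.cumprod() / g.transform('first') * 100

# Reference implementation with an explicit loop
expected = []
for factor, flag in zip(df['col1'], df['col2']):
    if flag != 'x' or not expected:
        expected.append(100.0)  # first row or a non-x row: restart at 100
    else:
        expected.append(expected[-1] * factor)

print(df)
```

The loop is only there as a readable cross-check; on larger frames the groupby version avoids iterating row by row in Python.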

4 Comments

Thank you. Based on your solution I found out that I made a logical mistake here: df['indexed'] = df['col1'] * df['indexed'].shift(1). Sorry.
@Alex Please see the update, I hope I understood correctly what you wanted to achieve.
I do not understand the logic behind your solution. Can you tell me how I have to change the code if I want each group to start at the row's value of "col1" instead of 100?
@Alex The logic is that we make groups, with each group starting at a non-x value in col2, then calculate the cumulative product of col1 and rescale it so that the first value in each group is 100. If you want the first value to be that of col1, just use df['indexed'] = g.cumprod() instead of df['indexed'] = g.cumprod() / g.transform('first') * 100
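
A short sketch of the variant described in the last comment, where each group starts at its own col1 value instead of 100. The four-row frame is a made-up subset used only for illustration.

```python
import pandas as pd

# Small illustrative frame (assumed subset, not the full question data)
df = pd.DataFrame({
    'col1': [1.02, 1.12, 0.93, 1.01],
    'col2': ['x', 'x', '2021-60-30', 'x'],
})

g = df.groupby(df['col2'].ne('x').cumsum())['col1']
# No rescaling: the first value of each group is simply that row's col1
df['indexed'] = g.cumprod()
print(df)
```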

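Have you tried the apply method with your own function? Note that a row-wise function only sees one row at a time, so the previous row's value has to be precomputed with shift first (here in a helper column prev_col1, a name chosen for illustration):

def my_funct(row):
    # row is a single row (a Series), so shift is not available here
    if row['col2'] == 'x':
        row['indexed'] = row['col1'] * row['prev_col1']
    else:
        row['indexed'] = 100
    return row  # apply needs the modified row back

And then:

df['prev_col1'] = df['col1'].shift(1)  # previous row's col1
df = df.apply(my_funct, axis=1)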
