Pandas - create new column based on conditional value of another column

Question

I would like to create a new column df['indexed'] whose value is computed from the previous row. The exception: if the row's value in df['col2'] is not "x" (in this example it is a date string), I would like to set df['indexed'] to 100. So I expect a column "indexed" that restarts at 100 every time df['col2'] is not "x".

import pandas as pd
d = {'col1': [0.02,0.12,-0.1,0-0.07,0.01,0.02,0.12,-0.1,0-0.07,0.01],
     'col2': ['x','x','x','2021-60-30','x','x','x','x','x','x']}
df = pd.DataFrame(data=d)
df['col1'] = df['col1']+1
df['indexed'] = 0
df.loc[0, 'indexed'] = 100  # set the starting value (avoids chained assignment)

# what I tried:
for index, row in df.iterrows():
    if row['col2'] == 'x':
        df['indexed']= df['col1'] * df['indexed'].shift(1)
    else:
        df['indexed']= 100

I expect:

Please show expected output

– Gulzar, Commented May 9, 2021 at 9:31

perl · Accepted Answer · 2021-05-09 09:56:19Z

1

You can use where:

df['indexed'] = (df['col1'] * df['col1'].shift(1)).where(df['col2']=='x', 100)
df

Output:


   col1        col2   indexed
0  1.02           x       NaN
1  1.12           x    1.1424
2  0.90           x    1.0080
3  0.93  2021-60-30  100.0000
4  1.01           x    0.9393
5  1.02           x    1.0302
6  1.12           x    1.1424
7  0.90           x    1.0080
8  0.93           x    0.8370
9  1.01           x    0.9393
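For reference, Series.where keeps the values where the condition is True and substitutes the second argument everywhere else. A minimal sketch on a small made-up Series (the names s and mask are illustrative and do not come from the question):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])
mask = pd.Series([True, False, True])

# Keep s where mask is True; use 100 where mask is False
result = s.where(mask, 100)
print(result.tolist())
```

This is why the row with the date string ends up as 100 while the 'x' rows keep the computed product.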

Update: If you want to calculate the cumulative product starting from each non-x value in col2:

g = df.groupby(df['col2'].ne('x').cumsum())['col1']
df['indexed'] = g.cumprod() / g.transform('first') * 100

Output:

   col1        col2     indexed
0  1.02           x  100.000000
1  1.12           x  112.000000
2  0.90           x  100.800000
3  0.93  2021-60-30  100.000000
4  1.01           x  101.000000
5  1.02           x  103.020000
6  1.12           x  115.382400
7  0.90           x  103.844160
8  0.93           x   96.575069
9  1.01           x   97.540819
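To see what the groupby key looks like, here is a small sketch on the first five rows of the data (the intermediate names group_id and indexed are illustrative only). Each non-x row bumps the running count, so it starts a new group:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1.02, 1.12, 0.90, 0.93, 1.01],
    'col2': ['x', 'x', 'x', '2021-60-30', 'x'],
})

# ne('x') is True on rows that start a new block;
# cumsum turns those flags into a running group label
group_id = df['col2'].ne('x').cumsum()
print(group_id.tolist())

# Cumulative product within each group, rescaled so each group starts at 100
g = df.groupby(group_id)['col1']
indexed = g.cumprod() / g.transform('first') * 100
print(indexed.round(4).tolist())
```

Dividing by the first value of each group cancels that row's own factor, which is what makes every group begin at exactly 100.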

edited May 9, 2021 at 9:56

answered May 9, 2021 at 9:34

perl


4 Comments

Alex Over a year ago

Thank you. Based on your solution, I found out that I made a logical mistake here: df['indexed'] = df['col1'] * df['indexed'].shift(1). Sorry.

perl Over a year ago

@Alex Please see the update, hope I got right what you wanted to achieve

Alex Over a year ago

I do not understand the logic behind your solution. Can you tell me how I have to change the code if I want to set the row value of "col1" instead of 100?

perl Over a year ago

@Alex The logic is that we make groups, with a new group starting at each non-x value in col2, then calculate the cumulative product of col1 within each group and rescale so that the first value in each group is 100. If you want the first value to be that of col1, just use df['indexed'] = g.cumprod() instead of df['indexed'] = g.cumprod() / g.transform('first') * 100
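One way to convince yourself of this logic is to compare the grouped expression with a plain loop that restarts at 100 on the first row and on every non-x row. This is only a cross-check sketch; the names vectorized, values and looped are illustrative, and col1 already has the +1 applied:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1.02, 1.12, 0.90, 0.93, 1.01, 1.02, 1.12, 0.90, 0.93, 1.01],
    'col2': ['x', 'x', 'x', '2021-60-30', 'x', 'x', 'x', 'x', 'x', 'x'],
})

# Grouped version from the answer
g = df.groupby(df['col2'].ne('x').cumsum())['col1']
vectorized = g.cumprod() / g.transform('first') * 100

# Explicit loop: restart at 100 on the first row and on every non-x row,
# otherwise multiply the previous result by the current col1
values = []
for i, (c1, c2) in enumerate(zip(df['col1'], df['col2'])):
    if i == 0 or c2 != 'x':
        values.append(100.0)
    else:
        values.append(values[-1] * c1)
looped = pd.Series(values, index=df.index)

# True when both approaches agree up to floating-point error
print(((vectorized - looped).abs() < 1e-9).all())
```

The loop is easier to read, while the grouped version avoids Python-level iteration on larger frames.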

notarealgreal · Accepted Answer · 2021-05-09 09:40:02Z

0

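Have you tried the apply method with your own function? Note that a row passed to apply has no access to the previous row, so the shifted value has to be stored in a helper column first (prev_col1 is just an illustrative name):

def my_funct(row):
    # Multiply by the previous row's col1 when col2 is 'x'; otherwise restart at 100
    if row['col2'] == 'x':
        row['indexed'] = row['col1'] * row['prev_col1']
    else:
        row['indexed'] = 100
    return row  # apply needs the modified row back

And then:

df['prev_col1'] = df['col1'].shift(1)
df = df.apply(my_funct, axis=1).drop(columns='prev_col1')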

answered May 9, 2021 at 9:40

notarealgreal
