Adding column to Pandas DataFrame based on dynamic indexing condition

Question

I have a DataFrame with a column that restarts a "count" at 1 at irregular points. My goal is to produce a new_col that divides each value in the current column by the last value of the count it belongs to. See below for an example.

This is my current DataFrame:
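Assuming the frame holds only the col column shown in the expected output below, a minimal reconstruction is:

```python
import pandas as pd

# Assumed input: only the col values from the expected output; the count restarts at 1.0
df = pd.DataFrame({'col': [1.0, 2.0, 3.0, 1.0, 2.0, 1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0]})
print(df)
```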

Trying to get an output like so:

    col  new_col
0   1.0  0.333
1   2.0  0.667
2   3.0  1.000
3   1.0  0.500
4   2.0  1.000
5   1.0  0.200
6   2.0  0.400
7   3.0  0.600
8   4.0  0.800
9   5.0  1.000
10  1.0  0.333
11  2.0  0.667
12  3.0  1.000

This is what I have tried so far:

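import pandas as pd

# Flag the rows where the count restarts at 1.0
df['col_bool'] = pd.DataFrame(df['col'] == 1.0)
# Index labels of the restart rows, each shifted back by 2; the first restart is dropped
idx_lst = [x - 2 for x in df.index[df['col_bool']].tolist()]
idx_lst = idx_lst[1:]

# Keep only the rows that are not restarts
mask = (df['col'] != 1.0)
df_valid = df[mask]
for i in idx_lst:
    df['new_col'] = 1.0 / df_valid.iloc[i]['col']
    df.loc[mask, 'new_col'] = df_valid['col'] / df_valid.iloc[i]['col']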

This understandably raises an IndexError. Maybe I need to make a copy of the DataFrame for each run and concat the pieces. I believe that would work, but am I missing a shortcut here?
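The split-and-concat idea does work. A minimal sketch, assuming every run begins at exactly 1.0 (run_id and pieces are illustrative names, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'col': [1.0, 2.0, 3.0, 1.0, 2.0, 1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0]})

# Label each run: the label increases by one at every row where the count restarts at 1.0
run_id = (df['col'] == 1.0).cumsum().rename('run_id')

# Split into one sub-frame per run, divide by that run's last value, then stitch back together
pieces = []
for _, run in df.groupby(run_id):
    run = run.copy()  # work on a copy to avoid writing into a view of df
    run['new_col'] = run['col'] / run['col'].iloc[-1]
    pieces.append(run)
df = pd.concat(pieces)
print(df)
```

This loops in Python over every run; the answers below get the same result from a single groupby(...).transform(...) call.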

Scott Boston · Accepted Answer · 2021-06-01 05:15:02Z

7

Try:

# Each restart at 1 begins a new run; divide every value by the last value of its run
df['new_col'] = df['col'].div(df.groupby((df['col'] == 1).cumsum())['col'].transform('last'))

Output:

    col   new_col
0   1.0  0.333333
1   2.0  0.666667
2   3.0  1.000000
3   1.0  0.500000
4   2.0  1.000000
5   1.0  0.200000
6   2.0  0.400000
7   3.0  0.600000
8   4.0  0.800000
9   5.0  1.000000
10  1.0  0.333333
11  2.0  0.666667
12  3.0  1.000000
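To see why the one-liner works, it helps to print the intermediate pieces. A sketch on the same sample data (run_id and last_in_run are illustrative names, not part of the answer):

```python
import pandas as pd

df = pd.DataFrame({'col': [1.0, 2.0, 3.0, 1.0, 2.0, 1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0]})

# True at every restart; the running sum turns these flags into run labels 1, 1, 1, 2, 2, 3, ...
run_id = (df['col'] == 1).cumsum().rename('run_id')

# Selecting ['col'] makes transform return a Series: the last value of each run, repeated on every row
last_in_run = df.groupby(run_id)['col'].transform('last')

df['new_col'] = df['col'].div(last_in_run)
print(df.assign(run_id=run_id, last_in_run=last_in_run))
```

Without the ['col'] selection, transform('last') is applied to every column of df and returns a DataFrame rather than a Series.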

edited Jun 1, 2021 at 5:15

jezrael

answered Jun 1, 2021 at 5:12

Scott Boston

2 Comments

tryingtolearn Over a year ago

Thanks for sharing! Very elegant. I used your suggestion and just needed to select the column name again at the end. df['new_col'] = df['col'].div(df.groupby((df['col'] == 1).cumsum()).transform('last')['col'])

Scott Boston Over a year ago

@tryingtolearn Happy coding! Be safe and stay healthy.

Nk03 · Accepted Answer · 2021-06-01 05:11:44Z

4

You can try:

# A new run starts wherever a value is not the previous value + 1
df['new_col'] = df.groupby(df.col.ne(df.col.shift().add(1)).cumsum())['col'].transform(
    lambda x: x.div(len(x)))

Or:

# Same run labels, but divide by each run's length from transform('count')
df['new_col'] = df.col.div(
    df.groupby(df.col.ne(df.col.shift().add(1)).cumsum())['col'].transform('count'))

OUTPUT:

    col   new_col
0   1.0  0.333333
1   2.0  0.666667
2   3.0  1.000000
3   1.0  0.500000
4   2.0  1.000000
5   1.0  0.200000
6   2.0  0.400000
7   3.0  0.600000
8   4.0  0.800000
9   5.0  1.000000
10  1.0  0.333333
11  2.0  0.666667
12  3.0  1.000000
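This answer builds its grouping key differently. A sketch that exposes the intermediate values on the same sample data (breaks, run_id and run_len are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'col': [1.0, 2.0, 3.0, 1.0, 2.0, 1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0]})

# A run breaks wherever a value is not the previous value plus one.
# shift() leaves NaN in the first row, and NaN compares unequal to everything, so row 0 starts a run.
breaks = df['col'].ne(df['col'].shift().add(1))
run_id = breaks.cumsum().rename('run_id')

# Dividing by the run length equals dividing by the last value only if every run counts 1, 2, ..., n
run_len = df.groupby(run_id)['col'].transform('count')
df['new_col'] = df['col'].div(run_len)
print(df.assign(run_id=run_id, run_len=run_len))
```

The two answers draw run boundaries differently: col == 1 starts a run at each 1, while the shift test starts one at any break in the +1 sequence. On the sample data they produce the same groups; on a column such as [1, 2, 4] they would not, because the shift test starts a new run at 4.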

edited Jun 1, 2021 at 5:11

answered Jun 1, 2021 at 5:06

Nk03