4

I have a dataframe with a column that randomly starts a "count" back at 1. My goal is to produce a new_col that divides my current column by the the last value in a count. See below for an example.

This is my current DataFrame:

    col
0   1.0
1   2.0
2   3.0
3   1.0
4   2.0
5   1.0
6   2.0
7   3.0
8   4.0
9   5.0
10  1.0
11  2.0
12  3.0

Trying to get an output like so:

    col  new_col
0   1.0  0.333
1   2.0  0.667
2   3.0  1.000
3   1.0  0.500
4   2.0  1.000
5   1.0  0.200
6   2.0  0.400
7   3.0  0.600
8   4.0  0.800
9   5.0  1.000
10  1.0  0.333
11  2.0  0.667
12  3.0  1.000

This is what I have tried so far:

df['col_bool'] = pd.DataFrame(df['col']  == 1.0)
idx_lst = [x - 2 for x in df.index[df['col_bool']].tolist()]
idx_lst = idx_lst[1:]

mask = (df['col'] != 1.0)
df_valid = df[mask]
for i in idx_lst:
    df['new_col'] = 1.0 / df_valid.iloc[i]['col']
    df.loc[mask, 'new_col'] = df_valid['col'] / df_valid.iloc[i]['col']

This understandably results in an index error. Maybe I need to make a copy of a DataFrame each time and concat. I believe this would work but I want to ask if I am missing any shortcuts here?

0

2 Answers 2

7

Try:

df['new_col'] = df['col'].div(df.groupby((df['col'] == 1).cumsum()).transform('last'))

Output:

    col   new_col
0   1.0  0.333333
1   2.0  0.666667
2   3.0  1.000000
3   1.0  0.500000
4   2.0  1.000000
5   1.0  0.200000
6   2.0  0.400000
7   3.0  0.600000
8   4.0  0.800000
9   5.0  1.000000
10  1.0  0.333333
11  2.0  0.666667
12  3.0  1.000000
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks for sharing! Very elegant. I used your suggestion and just needed to select the column name again at the end. df['new_col'] = df['col'].div(df.groupby((df['col'] == 1).cumsum()).transform('last')['col'])
@tryingtolearn Happy coding! Be safe and stay healthy.
4

You can try:

df['new_col'] = df.groupby((df.col.ne(df.col.shift().add(1))).cumsum())[
    'col'].transform(lambda x: x.div(len(x)))

Or:

df['new_col'] = df.col.div(df.groupby((df.col.ne(df.col.shift().add(1))).cumsum())
           ['col'].transform('count'))

OUTPUT:

    col   new_col
0   1.0  0.333333
1   2.0  0.666667
2   3.0  1.000000
3   1.0  0.500000
4   2.0  1.000000
5   1.0  0.200000
6   2.0  0.400000
7   3.0  0.600000
8   4.0  0.800000
9   5.0  1.000000
10  1.0  0.333333
11  2.0  0.666667
12  3.0  1.000000

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.