Python adding row into dataframe using while loop

Question

I have a dataset like this:

    user_id lapsed_date start_date  end_date
0   A123    2020-01-02  2019-01-02  2019-02-02
1   A123    2020-01-02  2019-02-02  2019-03-02
2   B456    2019-10-01  2019-08-01  2019-09-01
3   B456    2019-10-01  2019-09-01  2019-10-01

generated by this code:

import pandas as pd

sample = {'user_id': ['A123','A123','B456','B456'],
        'lapsed_date': ['2020-01-02', '2020-01-02', '2019-10-01', '2019-10-01'],
        'start_date' : ['2019-01-02', '2019-02-02', '2019-08-01', '2019-09-01'],
        'end_date' : ['2019-02-02', '2019-03-02', '2019-09-01', '2019-10-01']
        }

df = pd.DataFrame(sample,columns= ['user_id', 'lapsed_date', 'start_date', 'end_date'])

df['lapsed_date'] = pd.to_datetime(df['lapsed_date'])
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date'])

I'm trying to write a function to achieve this:

    user_id lapsed_date start_date  end_date
0   A123    2020-01-02  2019-01-02  2019-02-02
1   A123    2020-01-02  2019-02-02  2019-03-02
2   A123    2020-01-02  2019-03-02  2019-04-02
3   A123    2020-01-02  2019-04-02  2019-05-02
4   A123    2020-01-02  2019-05-02  2019-06-02
5   A123    2020-01-02  2019-06-02  2019-07-02
6   A123    2020-01-02  2019-07-02  2019-08-02
7   A123    2020-01-02  2019-08-02  2019-09-02
8   A123    2020-01-02  2019-09-02  2019-10-02
9   A123    2020-01-02  2019-10-02  2019-11-02
10  A123    2020-01-02  2019-11-02  2019-12-02
11  A123    2020-01-02  2019-12-02  2020-01-02
12  B456    2019-10-01  2019-08-01  2019-09-01
13  B456    2019-10-01  2019-09-01  2019-10-01

Essentially, the function should keep adding rows for each user_id while max(end_date) is less than lapsed_date. Each newly added row takes the previous row's end_date as its start_date, and the previous row's end_date + 1 month as its end_date.

I have written the function below.

def add_row(x):
    while x['end_date'].max() < x['lapsed_date'].max():
        next_month = x['end_date'].max() + pd.DateOffset(months=1)
        last_row = x.iloc[-1]
        last_row['start_date'] = x['end_date'].max()
        last_row['end_date'] = next_month
        return x.append(last_row)
    return x

It implements all the logic above, except that the while loop doesn't work: each call adds only one row per user. So I have to run this apply command manually 10 times:

df = df.groupby('user_id').apply(add_row).reset_index(drop = True)

I'm not really sure what I did wrong with the while loop there. Any advice would be highly appreciated!

x is meant to be the dataframe, hence I tried df.groupby('user_id').apply(add_row). I'm still fairly new to Python :)
– catherine, Commented Dec 19, 2019 at 23:51

PacketLoss · Accepted Answer · 2019-12-20 00:36:06Z


There are a few reasons your loop did not work; I will explain them as we go.

def add_row(x):
    while x['end_date'].max() < x['lapsed_date'].max():
        next_month = x['end_date'].max() + pd.DateOffset(months=1)
        last_row = x.iloc[-1]
        last_row['start_date'] = x['end_date'].max()
        last_row['end_date'] = next_month
        return x.append(last_row)
    return x

In the code above, the return statement sits inside the while loop. return hands the result back to the caller immediately, so the loop stops after its first iteration and you only get the result of the first append.
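To see the effect of the early return in isolation, here is a minimal sketch that uses plain Python lists instead of a DataFrame. The function names grow_with_early_return and grow are made up for this illustration and are not part of the original code.

```python
def grow_with_early_return(values, limit):
    """Mirrors the question's structure: return sits inside the while loop."""
    while max(values) < limit:
        values = values + [max(values) + 1]
        return values  # leaves the function during the first iteration
    return values


def grow(values, limit):
    """Same loop, but return sits after the loop has finished."""
    while max(values) < limit:
        values = values + [max(values) + 1]
    return values


print(grow_with_early_return([1, 2], 5))  # [1, 2, 3]
print(grow([1, 2], 5))                    # [1, 2, 3, 4, 5]
```

The first function adds one element per call, which matches the symptom of having to call apply ten times.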

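Another caveat concerns return x.append(last_row): DataFrame.append() does not modify the dataframe in place. It returns a new dataframe, so inside a loop you need to reassign the result with x = x.append(last_row). Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat takes its place.

Reference: pandas documentation for DataFrame.append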
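The "returns a new object" behaviour can be checked with a small sketch. It uses pd.concat rather than DataFrame.append so that it also runs on pandas 2.0 and later; the column name value and the variable names are assumptions chosen for the demonstration.

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2]})
extra = pd.DataFrame({'value': [3]})

# The combined frame is returned but never stored, so df is unchanged.
pd.concat([df, extra], ignore_index=True)
rows_without_reassign = len(df)

# Reassigning keeps the combined frame.
df = pd.concat([df, extra], ignore_index=True)
rows_with_reassign = len(df)

print(rows_without_reassign, rows_with_reassign)  # 2 3
```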

Secondly, I noted that this may need to run over rows for multiple unique user_id values. For that reason, the code below splits the dataframe into one frame per unique user_id stored in the frame.

Here is how you can get this to work:

import pandas as pd

def add_row(df):
    # Keep adding monthly rows until the latest end_date reaches lapsed_date.
    # Note: DataFrame.append requires pandas < 2.0.
    while df['end_date'].max() < df['lapsed_date'].max():

        # .iloc[0] is a positional lookup; a group's index labels need not start at 0
        new_row = {'user_id': df['user_id'].iloc[0],
                   'lapsed_date': df['lapsed_date'].max(),
                   'start_date': df['end_date'].max(),
                   'end_date': df['end_date'].max() + pd.DateOffset(months=1),
                   }

        # append returns a new dataframe, so reassign the result
        df = df.append(new_row, ignore_index=True)

    return df  # return sits OUTSIDE the while loop, so only the final result is returned


sample = {'user_id': ['A123','A123','B456','B456'],
        'lapsed_date': ['2020-01-02', '2020-01-02', '2019-10-01', '2019-10-01'],
        'start_date' : ['2019-01-02', '2019-02-02', '2019-08-01', '2019-09-01'],
        'end_date' : ['2019-02-02', '2019-03-02', '2019-09-01', '2019-10-01']
        }

df = pd.DataFrame(sample,columns= ['user_id', 'lapsed_date', 'start_date', 'end_date'])

df['lapsed_date'] = pd.to_datetime(df['lapsed_date'])
df['start_date'] = pd.to_datetime(df['start_date'])
df['end_date'] = pd.to_datetime(df['end_date']) 


ids = df['user_id'].unique()

g = df.groupby(['user_id'])

result = pd.DataFrame(columns= ['user_id', 'lapsed_date', 'start_date', 'end_date'])

for i in ids:
    group = g.get_group(i)
    result = result.append(add_row(group), ignore_index=True)


print(result)

1. Split the frame by unique user_id.
2. Create an empty dataframe, result, to store the output.
3. Iterate over all user_id values.
4. Run the same while loop for each group, making sure df is reassigned to the appended rows.
5. Return the result and print it.
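Because DataFrame.append was removed in pandas 2.0, the steps above can also be sketched with pd.concat. This is one possible adaptation, not the answer's original code: it collects the new rows in a plain list and concatenates once per user, and the names group, new_rows, grouped and pieces are introduced here for illustration.

```python
import pandas as pd


def add_row(group):
    """Extend one user's rows month by month until end_date reaches lapsed_date."""
    lapsed = group['lapsed_date'].max()
    last_end = group['end_date'].max()
    user_id = group['user_id'].iloc[0]

    new_rows = []
    while last_end < lapsed:
        next_end = last_end + pd.DateOffset(months=1)
        new_rows.append({'user_id': user_id,
                         'lapsed_date': lapsed,
                         'start_date': last_end,
                         'end_date': next_end})
        last_end = next_end

    if not new_rows:
        return group  # nothing to add for this user
    return pd.concat([group, pd.DataFrame(new_rows)], ignore_index=True)


sample = {'user_id': ['A123', 'A123', 'B456', 'B456'],
          'lapsed_date': ['2020-01-02', '2020-01-02', '2019-10-01', '2019-10-01'],
          'start_date': ['2019-01-02', '2019-02-02', '2019-08-01', '2019-09-01'],
          'end_date': ['2019-02-02', '2019-03-02', '2019-09-01', '2019-10-01']}

df = pd.DataFrame(sample)
for col in ['lapsed_date', 'start_date', 'end_date']:
    df[col] = pd.to_datetime(df[col])

# Split by user, extend each piece, then stitch the pieces back together.
grouped = df.groupby('user_id')
pieces = [add_row(grouped.get_group(uid)) for uid in df['user_id'].unique()]
result = pd.concat(pieces, ignore_index=True)

print(result)
```

Collecting rows in a list and concatenating once avoids copying the whole frame on every loop iteration, which matters more as the number of added rows grows.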

Hope this helps!



3 Comments

catherine Over a year ago

Thank you so much for thorough explanation! I'm going to try this!

AMC Over a year ago

It may be better to concat in order to create the result DataFrame than repeatedly append.

AMC Over a year ago

Also that for loop is strange. I’m pretty sure you can just iterate over the result of groupby and get the ids and groups that way.
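A rough sketch of what these two comments suggest: iterating over a groupby object yields (key, sub-frame) pairs directly, and the processed pieces can be combined with a single pd.concat call. The value column and the assign step are placeholders made up for this example; in the answer's setting the per-group step would be add_row(group).

```python
import pandas as pd

df = pd.DataFrame({'user_id': ['A123', 'A123', 'B456'],
                   'value': [1, 2, 3]})

pieces = []
for user_id, group in df.groupby('user_id'):
    # Placeholder per-group step; swap in the real per-user function here.
    pieces.append(group.assign(rows_in_group=len(group)))

# One concat at the end instead of growing the result inside the loop.
result = pd.concat(pieces, ignore_index=True)

print(result)
```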
