1

I encountered a problem of how to duplicate rows with loop function in Python. I have a dataset like this(this is a pandas data frame):

    userId  period  Date    
0   41851   4       1/4/2015    
1   13575   1       1/4/2015    

And I want to duplicate the first row 3 times, every time for the original row, the period column need to minus 1. until the period for the original is 1. and also every time when I duplicate it, I want to add 1 month to the date. So the result would be like this:

    userId  period  Date    
0   41851   1       1/4/2015
1   41851   1       2/4/2015
2   41851   1       3/4/2015
3   41851   1       4/4/2015    
4   13575   1       1/4/2015    

Does someone know how to do that? Thanks!

1 Answer 1

1

Idea is repeat rows by Index.repeat and DataFrame.loc, then add days by GroupBy.cumcount with this solution and last if necessary change format of datetimes by Series.dt.strftime:

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

def year(dates):
    "Return an array of the years given an array of datetime64s"
    return dates.astype('M8[Y]').astype('i8') + 1970

def month(dates):
    "Return an array of the months given an array of datetime64s"
    return dates.astype('M8[M]').astype('i8') % 12 + 1

def day(dates):
    "Return an array of the days of the month given an array of datetime64s"
    return (dates - dates.astype('M8[M]')) / np.timedelta64(1, 'D') + 1

df['Date'] = pd.to_datetime(df['Date'])

df1 = df.loc[df.index.repeat(df['period'])]
g = df1.groupby(level=0).cumcount()

start = df1['Date'].values
df1['Date'] = combine64(year(start), months=month(start) + g, 
                           days=day(start))
df1['period'] = 1
df1 = df1.reset_index(drop=True)

df1['Date'] = df1['Date'].dt.strftime('%m/%d/%Y')
print (df1)
   userId  period        Date
0   41851       1  01/04/2015
1   41851       1  02/04/2015
2   41851       1  03/04/2015
3   41851       1  04/04/2015
4   13575       1  01/04/2015
Sign up to request clarification or add additional context in comments.

1 Comment

It worked!! Thank you so much! This is really impressive.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.