import pandas as pd

d = {'ID': ['H1', 'H1', 'H2', 'H2', 'H3', 'H3'], 'Year': ['2012', '2013', '2014', '2013', '2014', '2015'], 'Unit': [5, 10, 15, 7, 15, 20]}
df_input = pd.DataFrame(data=d)
df_input
- I want to group df_input above and add two columns, 'lag' and 'lag_u'. 'lag' is the row's sequence number (starting from 0) within each group at the 'ID' and 'Year' group-by level.
- 'lag_u' is the first 'Unit' value within each group at the same 'ID' and 'Year' group-by level.
Expected Output:
d = {'ID': ['H1', 'H1', 'H2', 'H2', 'H3', 'H3'], 'Year': ['2012', '2013', '2014', '2013', '2014', '2015'], 'Unit': [5, 10, 15, 7, 15, 20], 'lag': [0, 1, 2, 0, 1, 2], 'lag_u': [5, 5, 5, 7, 7, 7]}
df_output = pd.DataFrame(data=d)
df_output
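
For reference, the two columns described above could be sketched with pandas' `GroupBy.cumcount()` (a 0-based row counter within each group) and `GroupBy.transform('first')` (the group's first value broadcast to every row). The helper name `add_lag_columns` and the choice of 'ID' as the grouping key are assumptions made for illustration. With this sample, grouping on 'ID' gives two rows per group, so the result below does not reproduce the runs of three shown in the Expected Output; the key would need to be adjusted to match the intended grouping.

```python
import pandas as pd

d = {'ID': ['H1', 'H1', 'H2', 'H2', 'H3', 'H3'],
     'Year': ['2012', '2013', '2014', '2013', '2014', '2015'],
     'Unit': [5, 10, 15, 7, 15, 20]}
df_input = pd.DataFrame(data=d)


def add_lag_columns(df, keys):
    """Hypothetical helper: add 'lag' and 'lag_u' per group defined by `keys`."""
    # sort=False keeps groups in their order of first appearance.
    grouped = df.groupby(keys, sort=False)
    return df.assign(
        # 0-based position of each row inside its group.
        lag=grouped.cumcount(),
        # First 'Unit' value of the group, repeated on every row of that group.
        lag_u=grouped['Unit'].transform('first'),
    )


# Assumption: group on 'ID' only.
result = add_lag_columns(df_input, 'ID')
print(result)
# lag   -> [0, 1, 0, 1, 0, 1]
# lag_u -> [5, 5, 15, 15, 15, 15]
```

Passing `['ID', 'Year']` as `keys` would make every row its own group in this sample, because each 'ID'/'Year' pair occurs only once; 'lag' would then be 0 everywhere and 'lag_u' would equal 'Unit'.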