1

I have dataframe like this where index contains the start_period/end_period.

enter image description here

And i want to create range from dataframe index and output like this:

|    range_date         |       due_date      |
| ------------------    | --------------------|
| 2021-07-30 00:00:00   | 2021-07-30 00:00:00 |
| 2021-08-01 00:00:00   | 2021-07-30 00:00:00 |
| 2021-08-02 00:00:00   | 2021-07-30 00:00:00 |
| 2021-08-29 00:00:00   | 2021-08-30 00:00:00 |
| 2021-08-30 00:00:00   | 2021-08-30 00:00:00 |
| 2021-09-01 00:00:00   | 2021-08-30 00:00:00 |
| 2021-09-28 00:00:00   | 2021-09-30 00:00:00 |
| 2021-09-29 00:00:00   | 2021-09-30 00:00:00 |
| 2021-09-30 00:00:00   | 2021-09-30 00:00:00 |

Clarification: If you check the first row where the index contains the range start_period/ end_period, so i want to convert that range into per day row and its value would be the column value:

range_date      due_date
2021-07-30      2021-07-30
2021-08-01      2021-07-30
2021-08-02      2021-07-30

same goes for other rows

1
  • 1
    Would you clarify the calculations for those who don't get the point? ;) Commented Sep 15, 2021 at 10:53

1 Answer 1

1
df = pd.DataFrame(index=['2021-07-30 00:00:00/2021-08-02 00:00:00',
                                  '2021-08-29 00:00:00/2021-09-01 00:00:00',
                                  '2021-09-28 00:00:00/2021-10-01 00:00:00'],
                  data={'due_date/modified_due_date':['2021-07-30 00:00:00', '2021-08-30 00:00:00', '2021-09-30 00:00:00']})

df['index'] = df.index
df[['start', 'end']] = df['index'].str.split('/', expand=True)
df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])

dates = []
for index, row in df.iterrows():
    dates.extend(pd.date_range(row['start'], row['end']))

df_dates = pd.DataFrame(data={'range_date': dates})
df_dates = df_dates.merge(df[['start', 'due_date/modified_due_date']], how='left', left_on=['range_date'], right_on=['start'])[['range_date', 'due_date/modified_due_date']]
df_dates['due_date/modified_due_date'] = df_dates['due_date/modified_due_date'].ffill()
df_dates.rename(columns={'due_date/modified_due_date': 'due_date'}, inplace=True)

OUTPUT:

range_date             due_date
0  2021-07-30  2021-07-30 00:00:00
1  2021-07-31  2021-07-30 00:00:00
2  2021-08-01  2021-07-30 00:00:00
3  2021-08-02  2021-07-30 00:00:00
4  2021-08-29  2021-08-30 00:00:00
5  2021-08-30  2021-08-30 00:00:00
6  2021-08-31  2021-08-30 00:00:00
7  2021-09-01  2021-08-30 00:00:00
8  2021-09-28  2021-09-30 00:00:00
9  2021-09-29  2021-09-30 00:00:00
10 2021-09-30  2021-09-30 00:00:00
11 2021-10-01  2021-09-30 00:00:00
Sign up to request clarification or add additional context in comments.

1 Comment

@Kaushalpanchal glad that it worked for u.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.