I have the following DataFrame:

            date_start             date_end
0  2023-01-01 16:00:00  2023-01-01 17:00:00
1  2023-01-02 16:00:00  2023-01-02 17:00:00
2  2023-01-03 16:00:00  2023-01-03 17:00:00
3  2023-01-04 17:00:00  2023-01-04 19:00:00
4                  NaN                  NaN

and I want to create a new DataFrame containing timestamps at 15-minute intervals, starting at the date_start and ending at the date_end of each row. For the first row, using the code below:

new_df = pd.Series(pd.date_range(start=df['date_start'][0], end=df['date_end'][0], freq= '15min'))

I get the following:

0   2023-01-01 16:00:00
1   2023-01-01 16:15:00
2   2023-01-01 16:30:00
3   2023-01-01 16:45:00
4   2023-01-01 17:00:00

How can I get the same result for all the rows of the df combined in a new df?

2 Answers

You can use a list comprehension and concat:

out = pd.concat([pd.DataFrame({'date': pd.date_range(start=start, end=end,
                                                     freq='15min')})
                  for start, end in zip(df['date_start'], df['date_end'])],
                ignore_index=True)

Output:

                  date
0  2023-01-01 16:00:00
1  2023-01-01 16:15:00
2  2023-01-01 16:30:00
3  2023-01-01 16:45:00
4  2023-01-01 17:00:00
5  2023-01-02 16:00:00
6  2023-01-02 16:15:00
7  2023-01-02 16:30:00
8  2023-01-02 16:45:00
9  2023-01-02 17:00:00
10 2023-01-03 16:00:00
11 2023-01-03 16:15:00
12 2023-01-03 16:30:00
13 2023-01-03 16:45:00
14 2023-01-03 17:00:00
15 2023-01-04 17:00:00
16 2023-01-04 17:15:00
17 2023-01-04 17:30:00
18 2023-01-04 17:45:00
19 2023-01-04 18:00:00
20 2023-01-04 18:15:00
21 2023-01-04 18:30:00
22 2023-01-04 18:45:00
23 2023-01-04 19:00:00

Handling NAs (rows where date_start or date_end is missing are skipped):

out = pd.concat([pd.DataFrame({'date': pd.date_range(start=start, end=end,
                                                     freq='15min')})
                  for start, end in zip(df['date_start'], df['date_end'])
                  if pd.notna(start) and pd.notna(end)
                ],
                ignore_index=True)
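
The other option, dropping incomplete rows before building the ranges, could look roughly like the sketch below. It is self-contained: the sample data is reconstructed from the question, and the name valid is only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question; the last row has no dates.
df = pd.DataFrame({
    'date_start': pd.to_datetime(['2023-01-01 16:00:00', '2023-01-02 16:00:00',
                                  '2023-01-03 16:00:00', '2023-01-04 17:00:00', None]),
    'date_end': pd.to_datetime(['2023-01-01 17:00:00', '2023-01-02 17:00:00',
                                '2023-01-03 17:00:00', '2023-01-04 19:00:00', None]),
})

# Keep only rows where both bounds are present (hypothetical helper name).
valid = df.dropna(subset=['date_start', 'date_end'])

# One small DataFrame per row, stacked into a single frame.
out = pd.concat(
    [pd.DataFrame({'date': pd.date_range(start=start, end=end, freq='15min')})
     for start, end in zip(valid['date_start'], valid['date_end'])],
    ignore_index=True,
)
print(out)
```

Note that pd.concat raises a ValueError when given an empty list, so if every row could be incomplete you may want to check that valid is not empty first.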

2 Comments

Thanks @mozway! In case there are NaNs, I am getting the following error: ValueError: Neither `start` nor `end` can be NaT. How do I ignore the NaNs?
You can first run df2 = df.dropna(subset=['date_start', 'date_end']) and use df2 in the list comprehension. Or add if pd.notna(start) and pd.notna(end) as a filter at the end of the list comprehension.

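Adding to the previous answer: the DatetimeIndex returned by date_range has a to_series() method, so you could also proceed like this (the result is a Series rather than a DataFrame):

pd.concat(
  [
    pd.date_range(start=row['date_start'], end=row['date_end'], freq='15min').to_series()
    # drop rows with a missing bound first; date_range raises on NaT
    for _, row in df.dropna(subset=['date_start', 'date_end']).iterrows()
  ], ignore_index=True
)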
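
If a DataFrame with a named column is wanted, as in the question, the Series can be renamed and converted afterwards. The following is a minimal sketch under a few assumptions: the sample data is reconstructed from the question, itertuples is swapped in for iterrows, and the names complete and pieces are illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the question; the last row has no dates.
df = pd.DataFrame({
    'date_start': pd.to_datetime(['2023-01-01 16:00:00', '2023-01-02 16:00:00',
                                  '2023-01-03 16:00:00', '2023-01-04 17:00:00', None]),
    'date_end': pd.to_datetime(['2023-01-01 17:00:00', '2023-01-02 17:00:00',
                                '2023-01-03 17:00:00', '2023-01-04 19:00:00', None]),
})

# Skip rows with a missing bound, since date_range rejects NaT.
complete = df.dropna(subset=['date_start', 'date_end'])

# One Series of timestamps per remaining row.
pieces = [
    pd.date_range(start=row.date_start, end=row.date_end, freq='15min').to_series()
    for row in complete.itertuples(index=False)
]

# Stack the pieces, name the result, and turn it into a one-column DataFrame.
out = pd.concat(pieces, ignore_index=True).rename('date').to_frame()
print(out)
```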
