Create DataFrame rows from date range unnesting field

Question

I have the following Pandas DataFrame:

ID       start_date        end_date        codes         type
1        2019-01-01       2019-01-05      [x, y]          A
2        2019-01-01       2019-01-05      [x, y, z]       B

For each code, I want to generate one row per day in the range between start_date and end_date (both dates included). The expected output is:

ID          date              codes        type
1        2019-01-01            x            A
1        2019-01-02            x            A
1        2019-01-03            x            A
1        2019-01-04            x            A
1        2019-01-05            x            A
1        2019-01-01            y            A
1        2019-01-02            y            A
1        2019-01-03            y            A
1        2019-01-04            y            A
1        2019-01-05            y            A
2        2019-01-01            x            B
2        2019-01-02            x            B
.....

Thank you very much!

ansev · Accepted Answer · 2020-01-22 16:49:45Z

Pandas >= 0.25.0

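import pandas as pd

# Convert the date columns first if they are stored as strings:
# df['start_date'] = pd.to_datetime(df['start_date'])
# df['end_date'] = pd.to_datetime(df['end_date'])
new_df = (df.melt(['ID','type','codes'],value_name = 'date')  # start/end dates stacked into one 'date' column
            .set_index('date')
            .groupby(['ID','type'])
            .resample('D').ffill()             # one row per day between the two dates, per group
            .drop(columns = 'variable')
            .explode('codes')                  # one row per code
            .reset_index(level=[0,1],drop=True)  # drop the group levels from the index, keep 'date'
            .sort_values(['ID','type','codes'])
            .reset_index()
            .reindex(columns = ['ID','date','codes','type'])
         )
print(new_df)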

Pandas < 0.25.0

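import numpy as np
import pandas as pd

# Convert the date columns first if they are stored as strings:
# df['start_date'] = pd.to_datetime(df['start_date'])
# df['end_date'] = pd.to_datetime(df['end_date'])
new_df = (df.melt(['ID','type','codes'],value_name = 'date')  # start/end dates stacked into one 'date' column
            .set_index('date')
            .groupby(['ID','type'])
            .resample('D').ffill()             # one row per day between the two dates, per group
            .drop(columns = 'variable'))

# Without explode: repeat each row once per code, then overwrite 'codes' with the flattened lists
new_df = (new_df.reindex(new_df.index.repeat(new_df.codes.str.len()))
                .assign(codes=np.concatenate(new_df.codes.values))
                .reset_index(level=[0,1],drop=True)
                .sort_values(['ID','type','codes'])
                .reset_index()
                .reindex(columns = ['ID','date','codes','type']))

print(new_df)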
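
The repeat-and-concatenate step in the second statement can be easier to follow in isolation. The sketch below applies the same idea to a tiny frame; the names `small`, `lengths` and `unnested` are made up for illustration and are not part of the answer's code.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with a list-valued column
small = pd.DataFrame({"ID": [1, 2], "codes": [["x", "y"], ["x", "y", "z"]]})

# Number of codes per row: 2 and 3
lengths = small["codes"].str.len()

unnested = (
    small.loc[small.index.repeat(lengths)]                       # repeat each row once per code
         .assign(codes=np.concatenate(small["codes"].values))    # flatten the lists into one array
         .reset_index(drop=True)
)
print(unnested)
```

Each row is duplicated as many times as it has codes, and the flattened array has the same total length, so the two line up row by row.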

Output

    ID       date codes type
0    1 2019-01-01     x    A
1    1 2019-01-02     x    A
2    1 2019-01-03     x    A
3    1 2019-01-04     x    A
4    1 2019-01-05     x    A
5    1 2019-01-01     y    A
6    1 2019-01-02     y    A
7    1 2019-01-03     y    A
8    1 2019-01-04     y    A
9    1 2019-01-05     y    A
10   2 2019-01-01     x    B
11   2 2019-01-02     x    B
12   2 2019-01-03     x    B
13   2 2019-01-04     x    B
14   2 2019-01-05     x    B
15   2 2019-01-01     y    B
16   2 2019-01-02     y    B
17   2 2019-01-03     y    B
18   2 2019-01-04     y    B
19   2 2019-01-05     y    B
20   2 2019-01-01     z    B
21   2 2019-01-02     z    B
22   2 2019-01-03     z    B
23   2 2019-01-04     z    B
24   2 2019-01-05     z    B
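
As a possible alternative to the melt/resample chain, the same result can be sketched by building a list of daily dates per row with `pd.date_range` and exploding twice. This is not the approach used in the answer above; it assumes pandas >= 0.25.0 (for `explode`) and rebuilds the question's sample data so that it runs on its own.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [1, 2],
    "start_date": pd.to_datetime(["2019-01-01", "2019-01-01"]),
    "end_date": pd.to_datetime(["2019-01-05", "2019-01-05"]),
    "codes": [["x", "y"], ["x", "y", "z"]],
    "type": ["A", "B"],
})

# One list of daily timestamps per row (both endpoints included)
df["date"] = pd.Series(
    [list(pd.date_range(s, e, freq="D")) for s, e in zip(df["start_date"], df["end_date"])],
    index=df.index,
)

new_df = (
    df.drop(columns=["start_date", "end_date"])
      .explode("codes")    # one row per code
      .explode("date")     # one row per (code, day)
      .reset_index(drop=True)
      .reindex(columns=["ID", "date", "codes", "type"])
)
# explode leaves the column as object dtype; convert it back to datetime
new_df["date"] = pd.to_datetime(new_df["date"])
print(new_df)
```

Exploding `codes` before `date` keeps the rows grouped by ID, then code, then day, which is the ordering asked for in the question.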

edited Jan 22, 2020 at 16:49

answered Jan 22, 2020 at 16:28

ansev


3 Comments

FranG91 Over a year ago

Do you know if there is a way to do it without using the explode method? I have pandas < 0.25. Thank you!

ansev Over a year ago

You can see stackoverflow.com/questions/53218931/… . I am going to update the code.

ansev Over a year ago

please check now:)
