I have the following Pandas DataFrame:

ID       start_date        end_date        codes         type
1        2019-01-01       2019-01-05      [x, y]          A
2        2019-01-01       2019-01-05      [x, y, z]       B
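
For reproducibility, a frame like the one above could be built roughly as follows. This is a sketch that assumes `codes` holds Python lists and that the dates start out as strings which are then parsed:

```python
import pandas as pd

# Sample data mirroring the table above; codes is assumed to be a list column.
df = pd.DataFrame({
    "ID": [1, 2],
    "start_date": ["2019-01-01", "2019-01-01"],
    "end_date": ["2019-01-05", "2019-01-05"],
    "codes": [["x", "y"], ["x", "y", "z"]],
    "type": ["A", "B"],
})

# Parse the date strings into datetime64 values.
df["start_date"] = pd.to_datetime(df["start_date"])
df["end_date"] = pd.to_datetime(df["end_date"])
print(df)
```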

For each code, I want to generate one row per day in the range from start_date to end_date (both inclusive). The expected output is:

ID          date              codes        type
1        2019-01-01            x            A
1        2019-01-02            x            A
1        2019-01-03            x            A
1        2019-01-04            x            A
1        2019-01-05            x            A
1        2019-01-01            y            A
1        2019-01-02            y            A
1        2019-01-03            y            A
1        2019-01-04            y            A
1        2019-01-05            y            A
2        2019-01-01            x            B
2        2019-01-02            x            B
.....

Thank you very much!

1 Answer

Pandas >= 0.25.0

#if necessary
#df['start_date']= pd.to_datetime(df['start_date'])
#df['end_date']= pd.to_datetime(df['end_date'])
new_df = (df.melt(['ID','type','codes'],value_name = 'date')
            .set_index('date')
            .groupby(['ID','type'])
            .resample('D').ffill()
            .drop(columns = 'variable')
            .explode('codes')
            .reset_index(level=[0,1],drop=True)
            .sort_values(['ID','type','codes'])
            .reset_index()
            .reindex(columns = ['ID','date','codes','type'])
         )
print(new_df)
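
As a possible alternative that skips melt and resample, the dates can be generated per row with pd.date_range and both list-like columns exploded. This is only a sketch, not the approach used above; it assumes daily frequency, inclusive endpoints, and the sample data from the question. It still requires DataFrame.explode, so it also needs pandas >= 0.25.0.

```python
import pandas as pd

# Sample data from the question (codes assumed to be a list column).
df = pd.DataFrame({
    "ID": [1, 2],
    "start_date": pd.to_datetime(["2019-01-01", "2019-01-01"]),
    "end_date": pd.to_datetime(["2019-01-05", "2019-01-05"]),
    "codes": [["x", "y"], ["x", "y", "z"]],
    "type": ["A", "B"],
})

# One list of daily timestamps per row, including both endpoints.
dates = [list(pd.date_range(s, e, freq="D"))
         for s, e in zip(df["start_date"], df["end_date"])]

new_df = df.drop(columns=["start_date", "end_date"])
new_df["date"] = pd.Series(dates, index=new_df.index)

# Explode codes first, then dates, so rows are ordered by ID, code, date.
# The index is reset between the two steps to keep it unique.
new_df = new_df.explode("codes").reset_index(drop=True)
new_df = new_df.explode("date").reset_index(drop=True)

# explode leaves an object column; convert it back to datetime64.
new_df["date"] = pd.to_datetime(new_df["date"])
new_df = new_df[["ID", "date", "codes", "type"]]
print(new_df)
```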

Pandas < 0.25.0

import numpy as np  # needed for np.concatenate below

#if necessary
#df['start_date']= pd.to_datetime(df['start_date'])
#df['end_date']= pd.to_datetime(df['end_date'])
new_df = (df.melt(['ID','type','codes'],value_name = 'date')
            .set_index('date')
            .groupby(['ID','type'])
            .resample('D').ffill()
            .drop(columns = 'variable'))

new_df = (new_df.reindex(new_df.index.repeat(new_df.codes.str.len()))
                .assign(codes=np.concatenate(new_df.codes.values))
                .reset_index(level=[0,1],drop=True)
                .sort_values(['ID','type','codes'])
                .reset_index()
                .reindex(columns = ['ID','date','codes','type']))

print(new_df)

Output

    ID       date codes type
0    1 2019-01-01     x    A
1    1 2019-01-02     x    A
2    1 2019-01-03     x    A
3    1 2019-01-04     x    A
4    1 2019-01-05     x    A
5    1 2019-01-01     y    A
6    1 2019-01-02     y    A
7    1 2019-01-03     y    A
8    1 2019-01-04     y    A
9    1 2019-01-05     y    A
10   2 2019-01-01     x    B
11   2 2019-01-02     x    B
12   2 2019-01-03     x    B
13   2 2019-01-04     x    B
14   2 2019-01-05     x    B
15   2 2019-01-01     y    B
16   2 2019-01-02     y    B
17   2 2019-01-03     y    B
18   2 2019-01-04     y    B
19   2 2019-01-05     y    B
20   2 2019-01-01     z    B
21   2 2019-01-02     z    B
22   2 2019-01-03     z    B
23   2 2019-01-04     z    B
24   2 2019-01-05     z    B
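
The row-repetition step of the pre-0.25 variant can be seen in isolation on a small frame. This is a minimal sketch: `toy`, `lengths`, `repeated` and `flat` are names made up for illustration. It repeats each row once per list element and then overwrites the list column with the flattened values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a list column.
toy = pd.DataFrame({"ID": [1, 2], "codes": [["x", "y"], ["x", "y", "z"]]})

# Number of elements in each row's list.
lengths = toy["codes"].str.len()

# Repeat each row once per element of its list.
repeated = toy.loc[toy.index.repeat(lengths)]

# Replace the list column with the flattened values, which line up with
# the repeated rows because both follow the original row order.
flat = (repeated.assign(codes=np.concatenate(toy["codes"].values))
                .reset_index(drop=True))
print(flat)
```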

3 Comments

Do you know if there is a way to do it without using the explode method? I have pandas < 0.25. Thank you!
You can see stackoverflow.com/questions/53218931/… . I am going to update the code.
Please check now :)
