1

Assume we have a table looks like the following:

id week_num people date level a b
1 1 20 1990101 1 2 3
1 2 30 1990108 1 2 3
1 3 40 1990115 1 2 3
1 5 100 1990129 1 2 3
1 7 100 1990212 1 2 3

week_num skip the "4" and "6" because the corresponding "people" is 0. However, we want the all the rows included like the following table.

id week_num people date level a b
1 1 20 1990101 1 2 3
1 2 30 1990108 1 2 3
1 3 40 1990115 1 2 3
1 4 0 1990122 1 2 3
1 5 100 1990129 1 2 3
1 6 0 1990205 1 2 3
1 7 100 1990212 1 2 3

The date starts with 1990101, the next row must +7 days if it is a continuous week_num(Ex: 1,2 is continuous; 1,3 is not).

How can we use python(pandas) to achieve this goal?

Note: Each id has 10 week_num(1,2,3,...,10), the output must include all "week_num" with corresponding "people" and "date".

Update: Other columns like "level","a","b" should stay the same even we add the skipped week_num.

1 Answer 1

1

This assumes that the date restarts at 1990-01-01 for each id:

import itertools
# reindex to get all combinations of ids and week numbers
df_full = (df.set_index(["id", "week_num"])
             .reindex(list(itertools.product([1,2], range(1, 11))))
             .reset_index())
# fill people with zero
df_full = df_full.fillna({"people": 0})
# forward fill some other columns
cols_ffill = ["level", "a", "b"]
df_full[cols_ffill] = df_full[cols_ffill].ffill()
# reconstruct date from week starting from 1990-01-01 for each id
df_full["date"] = pd.to_datetime("1990-01-01") + (df_full.week_num - 1) * pd.Timedelta("1w")
df_full

# out:
    id  week_num  people       date  level    a    b
0    1         1    20.0 1990-01-01    1.0  2.0  3.0
1    1         2    30.0 1990-01-08    1.0  2.0  3.0
2    1         3    40.0 1990-01-15    1.0  2.0  3.0
3    1         4     0.0 1990-01-22    1.0  2.0  3.0
4    1         5   100.0 1990-01-29    1.0  2.0  3.0
5    1         6     0.0 1990-02-05    1.0  2.0  3.0
6    1         7   100.0 1990-02-12    1.0  2.0  3.0
7    1         8     0.0 1990-02-19    1.0  2.0  3.0
8    1         9     0.0 1990-02-26    1.0  2.0  3.0
9    1        10     0.0 1990-03-05    1.0  2.0  3.0
10   2         1     0.0 1990-01-01    1.0  2.0  3.0
11   2         2     0.0 1990-01-08    1.0  2.0  3.0
12   2         3     0.0 1990-01-15    1.0  2.0  3.0
13   2         4     0.0 1990-01-22    1.0  2.0  3.0
14   2         5     0.0 1990-01-29    1.0  2.0  3.0
15   2         6     0.0 1990-02-05    1.0  2.0  3.0
16   2         7     0.0 1990-02-12    1.0  2.0  3.0
17   2         8     0.0 1990-02-19    1.0  2.0  3.0
18   2         9     0.0 1990-02-26    1.0  2.0  3.0
19   2        10     0.0 1990-03-05    1.0  2.0  3.0
Sign up to request clarification or add additional context in comments.

6 Comments

What if other columns(such as: "a" ,"level") need stay the same for each id? It seems not repeat the values of other columns.
Unclear, please update question
I just update the question and add some new comments. Thanks
see update. you can use ffill on the other columns
Another question: It seems won't work out if "id" is something like "A123F"
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.