Adding new rows with new values at some specific columns in pandas

Question

Assume we have a table looks like the following:

id	week_num	people	date	level	a	b
1	1	20	1990101	1	2	3
1	2	30	1990108	1	2	3
1	3	40	1990115	1	2	3
1	5	100	1990129	1	2	3
1	7	100	1990212	1	2	3

week_num skip the "4" and "6" because the corresponding "people" is 0. However, we want the all the rows included like the following table.

id	week_num	people	date	level	a	b
1	1	20	1990101	1	2	3
1	2	30	1990108	1	2	3
1	3	40	1990115	1	2	3
1	4	0	1990122	1	2	3
1	5	100	1990129	1	2	3
1	6	0	1990205	1	2	3
1	7	100	1990212	1	2	3

The date starts with 1990101, the next row must +7 days if it is a continuous week_num(Ex: 1,2 is continuous; 1,3 is not).

How can we use python(pandas) to achieve this goal?

Note: Each id has 10 week_num(1,2,3,...,10), the output must include all "week_num" with corresponding "people" and "date".

Update: Other columns like "level","a","b" should stay the same even we add the skipped week_num.

mcsoini · Accepted Answer · 2022-04-18 10:30:01Z

1

This assumes that the date restarts at 1990-01-01 for each id:

import itertools
# reindex to get all combinations of ids and week numbers
df_full = (df.set_index(["id", "week_num"])
             .reindex(list(itertools.product([1,2], range(1, 11))))
             .reset_index())
# fill people with zero
df_full = df_full.fillna({"people": 0})
# forward fill some other columns
cols_ffill = ["level", "a", "b"]
df_full[cols_ffill] = df_full[cols_ffill].ffill()
# reconstruct date from week starting from 1990-01-01 for each id
df_full["date"] = pd.to_datetime("1990-01-01") + (df_full.week_num - 1) * pd.Timedelta("1w")
df_full

# out:
    id  week_num  people       date  level    a    b
0    1         1    20.0 1990-01-01    1.0  2.0  3.0
1    1         2    30.0 1990-01-08    1.0  2.0  3.0
2    1         3    40.0 1990-01-15    1.0  2.0  3.0
3    1         4     0.0 1990-01-22    1.0  2.0  3.0
4    1         5   100.0 1990-01-29    1.0  2.0  3.0
5    1         6     0.0 1990-02-05    1.0  2.0  3.0
6    1         7   100.0 1990-02-12    1.0  2.0  3.0
7    1         8     0.0 1990-02-19    1.0  2.0  3.0
8    1         9     0.0 1990-02-26    1.0  2.0  3.0
9    1        10     0.0 1990-03-05    1.0  2.0  3.0
10   2         1     0.0 1990-01-01    1.0  2.0  3.0
11   2         2     0.0 1990-01-08    1.0  2.0  3.0
12   2         3     0.0 1990-01-15    1.0  2.0  3.0
13   2         4     0.0 1990-01-22    1.0  2.0  3.0
14   2         5     0.0 1990-01-29    1.0  2.0  3.0
15   2         6     0.0 1990-02-05    1.0  2.0  3.0
16   2         7     0.0 1990-02-12    1.0  2.0  3.0
17   2         8     0.0 1990-02-19    1.0  2.0  3.0
18   2         9     0.0 1990-02-26    1.0  2.0  3.0
19   2        10     0.0 1990-03-05    1.0  2.0  3.0

edited Apr 18, 2022 at 10:30

answered Apr 18, 2022 at 8:35

mcsoini

6,7922 gold badges21 silver badges43 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

KT.Thompson Over a year ago

What if other columns(such as: "a" ,"level") need stay the same for each id? It seems not repeat the values of other columns.

mcsoini Over a year ago

Unclear, please update question

KT.Thompson Over a year ago

I just update the question and add some new comments. Thanks

mcsoini Over a year ago

see update. you can use ffill on the other columns

KT.Thompson Over a year ago

Another question: It seems won't work out if "id" is something like "A123F"

|

Collectives™ on Stack Overflow

Adding new rows with new values at some specific columns in pandas

1 Answer 1

6 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

6 Comments

Your Answer

Sign up or log in

Post as a guest

Related