I have a DataFrame generated by this code:
import numpy as np
import pandas as pd

lcust = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
lmonth = [3, 4, 5, 9, 3, 5, 99, 101, 102, 105]
lval1 = np.random.randint(2, 100, len(lmonth)).tolist()
lval2 = np.random.rand(len(lmonth)).tolist()
index_ = pd.MultiIndex.from_arrays([lcust, lmonth], names=('number', 'month'))
df_ = pd.DataFrame(data=np.array([lval1, lval2]).T, columns=['val1', 'val2'], index=index_)
It looks as follows:
              val1      val2
number month
1      3       8.0  0.306048
       4      45.0  0.151272
       5      91.0  0.695793
       9      50.0  0.927028
2      3      68.0  0.925622
       5      49.0  0.402069
3      99     58.0  0.704662
       101    93.0  0.759338
       102    10.0  0.555434
       105    39.0  0.030003
My question is whether there is a convenient way to get it to look like this:
              val1_y    val2_y
number month
1      3         8.0  0.306048
       4        45.0  0.151272
       5        91.0  0.695793
       6         0.0  0.000000
       7         0.0  0.000000
       8         0.0  0.000000
       9        50.0  0.927028
2      3        68.0  0.925622
       4         0.0  0.000000
       5        49.0  0.402069
3      99       58.0  0.704662
       100       0.0  0.000000
       101      93.0  0.759338
       102      10.0  0.555434
       103       0.0  0.000000
       104       0.0  0.000000
       105      39.0  0.030003
That is, I am looking for code that fills in the missing months. In my database these values are simply missing, but they should actually be zero, and I need them for further calculations. You can think of number as a customer ID and month as the number of months the customer has been a member. val1 and val2 are some values of interest.
Please let me know in case you need further information.
Many thanks.
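
A minimal sketch of one possible approach, not necessarily the most convenient one: build the complete (number, month) index and reindex onto it with a fill value of zero. It assumes that months should be filled per customer between that customer's first and last observed month, as in the desired output above, and that the (number, month) pairs are unique. The names `months`, `full_index` and `filled` are made up for illustration, and the columns keep their original names val1/val2 rather than the val1_y/val2_y suffixes shown above.

```python
import numpy as np
import pandas as pd

# Same setup as in the question.
lcust = [1, 1, 1, 1, 2, 2, 3, 3, 3, 3]
lmonth = [3, 4, 5, 9, 3, 5, 99, 101, 102, 105]
lval1 = np.random.randint(2, 100, len(lmonth)).tolist()
lval2 = np.random.rand(len(lmonth)).tolist()
index_ = pd.MultiIndex.from_arrays([lcust, lmonth], names=("number", "month"))
df_ = pd.DataFrame(np.array([lval1, lval2]).T, columns=["val1", "val2"], index=index_)

# First and last observed month per customer (assumption: only gaps
# between these two months should be filled).
months = (
    df_.index.to_frame(index=False)
    .groupby("number")["month"]
    .agg(["min", "max"])
)

# Every (number, month) pair from each customer's first to last month.
full_index = pd.MultiIndex.from_tuples(
    [
        (num, month)
        for num, first, last in months.itertuples()
        for month in range(int(first), int(last) + 1)
    ],
    names=df_.index.names,
)

# Existing rows keep their values; newly created rows are filled with 0.
filled = df_.reindex(full_index, fill_value=0)
print(filled)
```

If the filled range should instead be the same for every customer, `pd.MultiIndex.from_product` over the customer IDs and a common month range could replace the list comprehension.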
