Easier way to fill in missing rows in a CSV using Python pandas

Question

I used the pandas groupby method to group my data by id and month, which produced a CSV like this example:

| id | month | average tree growth (cm)|
|----|-------|-------------------------|
|  1 |   4   |        9                |
|  1 |   5   |        4                |
|  1 |   6   |        7                |
|  2 |   1   |        9                |
|  2 |   2   |        9                |
|  2 |   3   |        8                |
|  2 |   4   |        6                |

However, each id should have all 12 months, so I need to add the missing months and set the average tree growth for them to a null value, like this:

| id | month | average tree growth (cm)|
|----|-------|-------------------------|
|  1 |   1   |        nan              |
|  1 |   2   |        nan              |
|  1 |   3   |        nan              |
|  1 |   4   |        9                |
|  1 |   5   |        4                |
|  1 |   6   |        7                |
|  1 |   7   |        nan              |
|  1 |   8   |        nan              |
|  1 |   9   |        nan              |
|  1 |   10  |        nan              |
|  1 |   11  |        nan              |
|  1 |   12  |        nan              |
|  2 |   1   |        9                |

This is for plotting with Bokeh. How do I add the missing months to each id and fill the average growth with NaN using Python? Is there an easier way than brute-force looping over every id and checking for missing months? Any hint would be appreciated!

SeaBean · Accepted Answer · 2021-10-18 10:53:40Z

2

One way to do it is to build the full MultiIndex with pd.MultiIndex.from_product and then call .reindex() with it, as follows:

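import numpy as np
import pandas as pd

# Full index: every id crossed with months 1..12
mux = pd.MultiIndex.from_product([df['id'].unique(), np.arange(1, 13)],
                                 names=['id', 'month'])
# Rows missing from df come back as NaN after reindexing
df.set_index(['id', 'month']).reindex(mux).reset_index()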

Result:

    id  month  average tree growth (cm)
0    1      1                       NaN
1    1      2                       NaN
2    1      3                       NaN
3    1      4                       9.0
4    1      5                       4.0
5    1      6                       7.0
6    1      7                       NaN
7    1      8                       NaN
8    1      9                       NaN
9    1     10                       NaN
10   1     11                       NaN
11   1     12                       NaN
12   2      1                       9.0
13   2      2                       9.0
14   2      3                       8.0
15   2      4                       6.0
16   2      5                       NaN
17   2      6                       NaN
18   2      7                       NaN
19   2      8                       NaN
20   2      9                       NaN
21   2     10                       NaN
22   2     11                       NaN
23   2     12                       NaN
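
For anyone who wants to reproduce the result above end to end, here is a minimal sketch that rebuilds the sample data from the question and wraps the reindex step in a helper. The function name `fill_missing_months` and its parameters are made up for illustration; they are not part of pandas or of the original answer.

```python
import numpy as np
import pandas as pd


def fill_missing_months(df, id_col="id", month_col="month"):
    """Return df with one row per (id, month 1..12); missing rows hold NaN.

    Hypothetical helper: assumes each (id, month) pair occurs at most once.
    """
    # Cartesian product of the ids that are present and all 12 months
    full_index = pd.MultiIndex.from_product(
        [df[id_col].unique(), np.arange(1, 13)],
        names=[id_col, month_col],
    )
    # Reindexing against the full index inserts the absent rows as NaN
    return df.set_index([id_col, month_col]).reindex(full_index).reset_index()


# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 2],
    "month": [4, 5, 6, 1, 2, 3, 4],
    "average tree growth (cm)": [9, 4, 7, 9, 9, 8, 6],
})

filled = fill_missing_months(df)
print(filled)
```

Note that `reindex` needs the (id, month) pairs to be unique. If the same pair appears twice, aggregate first (for example with groupby), otherwise pandas raises an error about reindexing with duplicate labels.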

edited Oct 18, 2021 at 10:53

answered Oct 16, 2021 at 7:32


5 Comments

HappyDuppy Over a year ago

Thank you so much! But do you know why this error occurs? 'Series' object has no attribute 'stack'. I created a new csv after I group the data by id and month. When I read this new csv and apply the method you give, it raises the above error.

SeaBean Over a year ago

@YangZiqi Which of the two solutions above did you try? Did you use unstack before stack, as shown above?

SeaBean Over a year ago

@YangZiqi Though I don't understand why you got the error using my solution (I tested it with your sample data without problems), I have provided another solution in my edit above. It should run very fast, since it involves only one simple step of reindexing the rows, without multiple steps of grouping or reformatting the dataframe. Take a look.

HappyDuppy Over a year ago

Thank you, your solution for me looked very reasonable. I think I messed up the csv I read while using your method. It worked now. Thank you so much!

SeaBean Over a year ago

@YangZiqi Great that it works for you now. Make good use of my updated solution. As I mentioned, it is a straightforward way to achieve this specific task without unnecessary grouping or reformatting of the dataframe. Hence, it is more efficient.
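
Related to the thread above: the question starts from grouped data, and if the groupby result is still in memory it is usually a Series indexed by (id, month) rather than a DataFrame with id and month columns. The sketch below shows one way to reindex such a Series directly. The raw measurements and the column name `growth_cm` are invented for illustration, and this is not claimed to be the cause of the error mentioned in the comments.

```python
import numpy as np
import pandas as pd

# Hypothetical raw measurements, several per (id, month)
raw = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "month": [4, 4, 5, 1, 1],
    "growth_cm": [8.0, 10.0, 4.0, 9.0, 9.0],
})

# Aggregating one column yields a Series with an (id, month) MultiIndex
monthly = raw.groupby(["id", "month"])["growth_cm"].mean()

# Full index: every id that is present crossed with months 1..12
mux = pd.MultiIndex.from_product(
    [monthly.index.get_level_values("id").unique(), np.arange(1, 13)],
    names=["id", "month"],
)

# The index is already (id, month), so no set_index step is needed
filled = monthly.reindex(mux).reset_index()
print(filled.head(6))
```

If the grouped result was written to CSV and read back with `pd.read_csv`, id and month normally come back as ordinary columns, in which case the `set_index(['id', 'month'])` form from the answer applies instead.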

Serge de Gosson de Varennes · Accepted Answer · 2021-10-16 05:05:33Z

1

One possible solution is the following:

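import numpy as np

# For each id, generate months 1..12, expand to one row per month,
# then left-merge the original data so missing months become NaN
(df.groupby('id')['month']
   .apply(lambda x: np.arange(1, 13))
   .explode()
   .reset_index()
   .merge(df, how='left')
)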

which produces:

    id month  average tree growth (cm)
0    1     1                       NaN
1    1     2                       NaN
2    1     3                       NaN
3    1     4                       9.0
4    1     5                       4.0
5    1     6                       7.0
6    1     7                       NaN
7    1     8                       NaN
8    1     9                       NaN
9    1    10                       NaN
10   1    11                       NaN
11   1    12                       NaN
12   2     1                       9.0
13   2     2                       9.0
14   2     3                       8.0
15   2     4                       6.0
16   2     5                       NaN
17   2     6                       NaN
18   2     7                       NaN
19   2     8                       NaN
20   2     9                       NaN
21   2    10                       NaN
22   2    11                       NaN
23   2    12                       NaN
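
A variation on the same idea, sketched here as an alternative rather than as this answer's method: build the id/month grid with a cross join and then left-merge the data onto it. This assumes pandas 1.2 or later, where `merge` accepts `how="cross"`. One reason to consider it is that `explode()` generally leaves the month column with object dtype, while the cross join keeps it as an integer column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 2],
    "month": [4, 5, 6, 1, 2, 3, 4],
    "average tree growth (cm)": [9, 4, 7, 9, 9, 8, 6],
})

ids = df[["id"]].drop_duplicates()                   # one row per id
months = pd.DataFrame({"month": np.arange(1, 13)})   # months 1..12

# Cross join: every id paired with every month
grid = ids.merge(months, how="cross")

# Left merge keeps all grid rows; months absent from df get NaN
filled = grid.merge(df, on=["id", "month"], how="left")
print(filled)
```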

answered Oct 16, 2021 at 5:05


1 Comment

Serge de Gosson de Varennes Over a year ago

Don't forget to mark the answer you accept as accepted. This way it disappears from the list of unanswered questions.
