Suppose I have a DataFrame like this:
   Lat  Long Postal Code
0   41    32       01556
1   32    31       01023
2   31    33       01023
3  NaN   NaN       01023
4   33    42       01775
5   40    44       01999
As you can see, rows 1, 2, and 3 share the same postal code. To fill the NaNs in row 3, it would be nice to use the average of the other two rows (1 and 2). How can I generalize this to a large dataset?
- For each row with NaN data in Lat/Long,
- Find other rows with the same postal code
- then compute the mean
- and use it to replace the NaNs
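
The steps above can be sketched in pandas with `groupby` plus `transform("mean")`, which computes a per-postal-code mean (skipping NaNs) aligned to the original rows, followed by `fillna`. This is only a minimal sketch: it assumes the column names from the example (`Lat`, `Long`, `Postal Code`) and that postal codes are stored as strings so the leading zeros are kept. The variable names `df`, `cols`, and `group_means` are just illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; postal codes are strings (assumption)
# so that leading zeros such as "01023" are preserved.
df = pd.DataFrame({
    "Lat": [41, 32, 31, np.nan, 33, 40],
    "Long": [32, 31, 33, np.nan, 42, 44],
    "Postal Code": ["01556", "01023", "01023", "01023", "01775", "01999"],
})

cols = ["Lat", "Long"]

# Per-group mean of each column, broadcast back to every row of the group.
# NaNs are ignored when the mean is computed.
group_means = df.groupby("Postal Code")[cols].transform("mean")

# Only the missing cells are replaced; existing values are left untouched.
df[cols] = df[cols].fillna(group_means)

print(df)
```

With the sample data, row 3 ends up with Lat 31.5 and Long 32.0, the averages of rows 1 and 2. If every row for a given postal code is missing, the group mean is itself NaN, so those rows stay unfilled and would need a different fallback.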