I have a dataframe of real estate objects with parameters such as 'Rooms' and 'Square'. In rows where 'Rooms' equals 0, I want to replace the 0 with a room count taken from a 'Rooms' / mean 'Square' dataframe that I created from the same initial dataset, picking the room count whose mean 'Square' is closest to the row's 'Square'.
I would use the .replace method, but the problem is that the actual Square values of the 0-room rows don't exactly match the mean values.
I'm new to pandas, so every solution I can think of involves copying the column values into Python lists and looping over them, which is a nightmare. The similar questions I've found on Stack Overflow only cover exact matches.
This is the slice of the initial dataframe where I want the 'Rooms' values to be changed:
data.loc[data['Rooms'] == 0, ['Rooms', 'Square']]
      Rooms      Square
1397    0.0  138.427694
1981    0.0  212.932361
2269    0.0   41.790881
3911    0.0   49.483501
4366    0.0   81.491446
4853    0.0    2.377248
6149    0.0   38.697117
8834    0.0   87.762616
This is the code that creates the 'Rooms' - 'mean Square' dataframe:
mean_square = data.loc[(data['Rooms'] < 6) & (data['Rooms'] > 0)].groupby('Rooms', as_index=False)['Square'].mean()
This is the result:
   Rooms      Square
0    1.0   41.323277
1    2.0   56.788214
2    3.0   76.903234
3    4.0   98.377544
4    5.0  122.614941
For example, for item 1397 I would expect 0.0 to be changed to 5.0 (~138 sq m is closest to the ~122 sq m mean for 5 rooms).
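
To make the expected behaviour concrete, here is a minimal sketch of the nearest-mean lookup without Python lists or loops. It is only one possible approach, not a confirmed solution. The two small dataframes are rebuilt by hand from the output shown above (an assumption made so the snippet runs on its own; in the real code `mean_square` comes from the groupby), and the helper names `zero_rooms`, `distances` and `nearest` are made up for illustration. The idea is to compute the absolute difference between every 0-room 'Square' and every mean 'Square' with NumPy broadcasting, then take the room count at the smallest difference.

```python
import numpy as np
import pandas as pd

# Hand-built stand-ins for the real data (values copied from the output above).
data = pd.DataFrame(
    {
        "Rooms": [0.0] * 8,
        "Square": [138.427694, 212.932361, 41.790881, 49.483501,
                   81.491446, 2.377248, 38.697117, 87.762616],
    },
    index=[1397, 1981, 2269, 3911, 4366, 4853, 6149, 8834],
)
mean_square = pd.DataFrame(
    {
        "Rooms": [1.0, 2.0, 3.0, 4.0, 5.0],
        "Square": [41.323277, 56.788214, 76.903234, 98.377544, 122.614941],
    }
)

# Boolean mask selecting the rows whose room count has to be filled in.
zero_rooms = data["Rooms"] == 0

# Shape (n_zero_rows, n_room_counts): |row Square - mean Square| for every pair.
squares = data.loc[zero_rooms, "Square"].to_numpy()
means = mean_square["Square"].to_numpy()
distances = np.abs(squares[:, None] - means[None, :])

# Position of the closest mean for each 0-room row.
nearest = distances.argmin(axis=1)

# Write the matching room counts back into the original dataframe.
data.loc[zero_rooms, "Rooms"] = mean_square["Rooms"].to_numpy()[nearest]

print(data)
```

With these values, item 1397 ends up with 5.0 rooms, as in the example above. The distance matrix has one row per 0-room object and one column per room count, so it stays small here; for a lookup table with many rows, `pd.merge_asof` with `direction='nearest'` on sorted 'Square' values might be a less memory-hungry alternative.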