0

I have a DataFrame with the columns age and salary. Some of the values are NaN, and I want to fill them using the mean and the median.

Original DataFrame


    age    salary
0  20.0       NaN
1  45.0   22323.0
2   NaN  598454.0
3  32.0       NaN
4   NaN   48454.0

Fill missing age with the mean() and salary with median() of their respective columns using apply().

I used

df['age','salary'].apply({'age':lambda row:row.fillna(row.mean()), 'salary':lambda row:row.fillna(row.median()) })

This raises a KeyError for ('age', 'salary'), even when I pass axis=1.

Expected Output

         age    salary
0  20.000000   48454.0
1  45.000000   22323.0
2  32.333333  598454.0
3  32.000000   48454.0
4  32.333333   48454.0

Can someone show me how to do it properly and what is happening in the background?

Please tell me if there are other ways too. I am learning Pandas from scratch.

2
  • hey Deshwal, can you post an example of your data and expected output? Commented Sep 28, 2019 at 10:40
  • @Datanovice Sure. I have updated. Please take a look Commented Sep 28, 2019 at 11:09

2 Answers

1

How about computing the fill values before running apply? That is, compute the mean of age and the median of salary first, then use them like this (note the extra [] brackets needed to select multiple columns):

median_salary = df['salary'].median()
mean_age = df['age'].mean()

df[['age','salary']].apply({'age': lambda r: r.fillna(mean_age), 'salary': lambda r: r.fillna(median_salary)}) 
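
To see why the extra brackets matter, here is a minimal sketch on the sample data from the question (rebuilding the frame by hand with np.nan is my assumption about how it was made). With a single pair of brackets, pandas treats 'age', 'salary' as one tuple key and looks for a single column with that name, which is what raises the KeyError. A list inside the brackets selects both columns.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "age": [20.0, 45.0, np.nan, 32.0, np.nan],
    "salary": [np.nan, 22323.0, 598454.0, np.nan, 48454.0],
})

# df['age', 'salary'] is the same as df[('age', 'salary')]: one tuple key,
# and no column has that name.
try:
    df["age", "salary"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list of labels selects several columns and returns a DataFrame.
subset = df[["age", "salary"]]

print(tuple_lookup_failed)
print(subset.shape)
```

Passing axis=1 to apply cannot help here, because the error is raised by the column lookup before apply is ever called.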

Also note that this does not modify the DataFrame but returns a new one, so if you want to update the columns, use something like:

df[['age', 'salary']] = df[['age', 'salary']].apply(...)

Or, in your case where you just want to fill in missing values, the best solution is probably:

df.fillna({'age': mean_age, 'salary': median_salary}, inplace=True)
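
As a quick check, the sketch below applies the dict form of fillna to the sample data from the question and returns a new frame instead of using inplace=True. The hand-built frame and the name filled are my own assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "age": [20.0, 45.0, np.nan, 32.0, np.nan],
    "salary": [np.nan, 22323.0, 598454.0, np.nan, 48454.0],
})

# mean() and median() skip NaN values by default.
mean_age = df["age"].mean()
median_salary = df["salary"].median()

# Each key names a column; each value fills only that column's NaNs.
# Without inplace=True, df is left untouched and a new frame is returned.
filled = df.fillna({"age": mean_age, "salary": median_salary})

print(filled)
```

With inplace=True the call modifies df itself and returns None, so its result should not be assigned back to df.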

1 Comment

Thanks man! It was some useful knowledge that you just provided.
1

According to the documentation, the easiest way to do what you ask is to pass a dictionary as the value parameter:

value : scalar, dict, Series, or DataFrame

Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

In your case, the code would be:

df.fillna(value={'age': df.age.mean(), 'salary': df.salary.median()}, inplace=True)

and gives:

         age    salary
0  20.000000   48454.0
1  45.000000   22323.0
2  32.333333  598454.0
3  32.000000   48454.0
4  32.333333   48454.0
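
Since the question also asks for other ways: filling each column separately and assigning it back does the same thing and may be easier to follow when learning. This is only a sketch on the question's sample data (the hand-built frame is my assumption). It works because mean() and median() skip NaN by default, so they can be computed on a column that still has holes.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand (assumed construction).
df = pd.DataFrame({
    "age": [20.0, 45.0, np.nan, 32.0, np.nan],
    "salary": [np.nan, 22323.0, 598454.0, np.nan, 48454.0],
})

# Series.fillna returns a new Series, so assign it back to the column.
df["age"] = df["age"].fillna(df["age"].mean())
df["salary"] = df["salary"].fillna(df["salary"].median())

print(df)
```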
