I have a pandas DataFrame with 10 columns, and I want to fill missing values in every column except one (let's say that column is called test). Currently, if I do this:

df.fillna(df.median(), inplace=True)

it replaces the NA values in all columns with each column's median. How do I exclude specific column(s) without listing ALL the other columns?
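Why every column gets filled: df.median() returns a Series indexed by column label, and when fillna is given a Series it fills each column whose label appears in that index. The pandas documentation states that columns not present in the Series are not filled. A minimal sketch, using a hypothetical three-column frame (A, B and test are illustrative names), showing that behaviour:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'test' is the column that should keep its NaN
df = pd.DataFrame({
    'A': [np.nan, 1.0, 3.0],
    'B': [2.0, np.nan, 6.0],
    'test': [np.nan, 5.0, 7.0],
})

medians = df.median()      # Series indexed by column label: A, B, test
print(medians.to_dict())   # {'A': 2.0, 'B': 4.0, 'test': 6.0}

# fillna(Series) fills column by column using the Series labels,
# so every column named in 'medians' is filled, including 'test'
all_filled = df.fillna(medians)

# Columns absent from the Series are left untouched, so dropping 'test'
# from the medians keeps that column's NaN in place
df = df.fillna(medians.drop('test'))
print(df)
```

This only relies on documented fillna behaviour for Series values; it is not taken from either answer below.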

2 Answers

You can use pd.DataFrame.drop to help out:

df.drop(columns='unwanted_column').fillna(df.median())
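A small sketch of the drop approach, assuming a hypothetical frame where test plays the role of unwanted_column. It uses the keyword form drop(columns=...), because pandas 2.0 no longer accepts the axis as a positional argument. Note that drop returns a new frame without the excluded column, so the filled columns have to be written back into df:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'test' plays the role of 'unwanted_column'
df = pd.DataFrame({
    'A': [np.nan, 5, 2, np.nan, 3],
    'B': [np.nan, 4, 3, 5, np.nan],
    'test': [np.nan, 4, 3, 2, 1],
})

# drop() returns a new frame without 'test'; fillna runs on that copy
filled = df.drop(columns='test').fillna(df.median())
print(list(filled.columns))  # ['A', 'B']

# Write the filled columns back; 'test' keeps its original NaN
df[filled.columns] = filled
print(df)
```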

Or pd.Index.difference

df.loc[:, df.columns.difference(['unwanted_column'])].fillna(df.median())
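A sketch of the Index.difference variant on a hypothetical frame. It shows that difference takes a list-like, that a bare string is rejected with a TypeError, and that the remaining labels come back sorted by default (here A before B, regardless of the frame's column order):

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 'test' is the column to exclude
df = pd.DataFrame({
    'test': [np.nan, 1.0, 2.0],
    'B': [np.nan, 4.0, 6.0],
    'A': [1.0, np.nan, 3.0],
})

# difference() expects a list-like and returns the remaining labels, sorted
keep = df.columns.difference(['test'])
print(list(keep))  # ['A', 'B']

# A bare string is not list-like, so pandas rejects it
try:
    df.columns.difference('test')
except TypeError as exc:
    print('rejected:', exc)

# Fill only the kept columns and assign them back into df
df[keep] = df[keep].fillna(df[keep].median())
print(df)
```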

Or just

df.loc[:, df.columns != 'unwanted_column'].fillna(df.median())

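The argument to difference must be list-like (for example ['unwanted_column']), not a bare string. Each of these expressions returns a new DataFrame without the excluded column, so to update df itself, assign the filled values back to those columns.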
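The boolean-mask version extends to several excluded columns by using Index.isin instead of !=. A minimal sketch, where the exclude list and the column names are hypothetical, that fills the remaining columns and writes them back with .loc:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [np.nan, 2.0, 4.0],
    'test': [np.nan, 1.0, 1.0],
    'id': [np.nan, 7.0, 8.0],
})

# Hypothetical list of columns to leave alone
exclude = ['test', 'id']

# Boolean mask over the columns: True for every column NOT in 'exclude'
mask = ~df.columns.isin(exclude)

# Fill only the masked columns and write the result back into df
df.loc[:, mask] = df.loc[:, mask].fillna(df.median())
print(df)
```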

Just select whatever columns you want using pandas' column indexing:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [np.nan, 5, 2, np.nan, 3], 'B': [np.nan, 4, 3, 5, np.nan], 'C': [np.nan, 4, 3, 2, 1]})
>>> df
     A    B    C
0  NaN  NaN  NaN
1  5.0  4.0  4.0
2  2.0  3.0  3.0
3  NaN  5.0  2.0
4  3.0  NaN  1.0
>>> cols = ['A', 'B']
>>> df[cols] = df[cols].fillna(df[cols].median())
>>> df
     A    B    C
0  3.0  4.0  NaN
1  5.0  4.0  4.0
2  2.0  3.0  3.0
3  3.0  5.0  2.0
4  3.0  4.0  1.0
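Since the question asks to avoid spelling out every other column, cols can also be built from the exclusion list rather than typed by hand. A sketch on the same frame as above, assuming C is the column to skip; Index.drop keeps the remaining columns in their original order:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 5, 2, np.nan, 3],
                   'B': [np.nan, 4, 3, 5, np.nan],
                   'C': [np.nan, 4, 3, 2, 1]})

# Derive the columns to fill from what should be excluded
cols = df.columns.drop(['C'])

# Same fill-and-assign step as above, now without hardcoding ['A', 'B']
df[cols] = df[cols].fillna(df[cols].median())
print(df)
```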
