Suppose we have a dataframe with the columns 'Age', 'Name', and 'Sex', where 'Age' and 'Sex' contain missing values. I want to drop all columns with missing values except for 'Age', so that I end up with a df containing the two columns 'Name' and 'Age'. How can I do this?
1 Answer
This should do what you need:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'Age' : [5,np.nan,12,43],
'Name' : ['Alice','Bob','Charly','Dan'],
'Sex' : ['F','M','M',np.nan]})
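# keep columns that have no missing values, plus 'Age' unconditionally
df_filt = df.loc[:, (~df.isnull().any()) | (df.columns.isin(['Age']))]
Explanation:
df.isnull().any() checks, for each column, whether any value is None or NaN; the ~ inverts that result, so only columns without missing values are selected. (Use ~ for boolean negation here; a unary - on a boolean Series is not supported in recent versions of pandas.)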
df.columns.isin(['Age']) checks for all columns if their name is 'Age', so that this column is selected in any case.
Both conditions are connected by an OR (|) so that if either condition applies the column is selected.
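If you need this in more than one place, the same mask can be wrapped in a small helper. This is only a sketch: the function name drop_na_columns_except and its keep parameter are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd

def drop_na_columns_except(df, keep):
    """Drop columns that contain missing values, but always retain those named in `keep`.

    Hypothetical helper for illustration; not a pandas function.
    """
    has_na = df.isna().any()            # boolean Series, one entry per column
    protected = df.columns.isin(keep)   # boolean array, True for columns to keep regardless
    return df.loc[:, ~has_na | protected]

df = pd.DataFrame({
    'Age': [5, np.nan, 12, 43],
    'Name': ['Alice', 'Bob', 'Charly', 'Dan'],
    'Sex': ['F', 'M', 'M', np.nan],
})

df_filt = drop_na_columns_except(df, ['Age'])
print(list(df_filt.columns))  # ['Age', 'Name']
```

The columns come back in their original order, and the NaN in 'Age' is left untouched, since only whole columns are dropped.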