
I have a DataFrame in pandas, and several of its columns contain only null values. Is there a built-in function that will let me remove those columns?

    could you maybe accept the answer? This will mark the question as resolved and help other users as well. Commented Nov 1, 2016 at 9:16



Yes, dropna. See http://pandas.pydata.org/pandas-docs/stable/missing_data.html and the DataFrame.dropna docstring:

Definition: DataFrame.dropna(self, axis=0, how='any', thresh=None, subset=None)
Docstring:
Return object with labels on given axis omitted where alternately any
or all of the data are missing

Parameters
----------
axis : {0, 1}
how : {'any', 'all'}
    any : if any NA values are present, drop that label
    all : if all values are NA, drop that label
thresh : int, default None
    int value : require that many non-NA values
subset : array-like
    Labels along other axis to consider, e.g. if you are dropping rows
    these would be a list of columns to include

Returns
-------
dropped : DataFrame
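A short sketch, on an invented DataFrame, of how `axis` and `subset` interact as the docstring describes: with the default `axis=0` (dropping rows), `subset` names columns, and with `axis=1` (dropping columns) it names row labels.

```python
import numpy as np
import pandas as pd

# Invented example data
df = pd.DataFrame({
    "a": [1, np.nan, np.nan],
    "b": [np.nan, np.nan, 5],
    "c": [7, 8, 9],
})

# Drop rows in which both "a" and "b" are NA, ignoring column "c"
rows = df.dropna(how="all", subset=["a", "b"])

# Drop columns that are NA in row 1 (here subset holds row labels)
cols = df.dropna(axis=1, how="all", subset=[1])

print(list(rows.index))    # → [0, 2]
print(list(cols.columns))  # → ['c']
```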

The specific command to run would be:

df=df.dropna(axis=1,how='all')
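A minimal, self-contained sketch of that command on a small made-up DataFrame (the column names and values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Column "b" is entirely NaN; column "c" is only partly NaN
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, np.nan, 6.0],
})

# axis=1 drops columns; how='all' drops a column only if every value is NA
df = df.dropna(axis=1, how="all")

print(list(df.columns))  # → ['a', 'c']
```

Column "c" survives because it still has non-missing values; its single NaN is left in place.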


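Can you specify the value that dropna treats as missing? For example, could you drop rows that are all zeros?
You could either tell the pandas I/O parsers that 0 is the NaN value in the given input tables (e.g. via the na_values parameter of read_csv), OR you could prepare your step like this: df[df == 0] = np.nan; df = df.dropna(axis=1, how='all'). Note that axis=1 drops columns that are all zeros; use axis=0 to drop rows.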
For in-place: df.dropna(axis=1, how='all', inplace=True)
I used df=df.dropna(axis=1,how='all') but it removed all my df columns. Other columns are not entirely empty.
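A sketch, on an invented DataFrame, of the zeros-as-missing idea from the comments above. It assumes you want to keep the original zeros in the surviving columns, so it uses `df.replace(0, np.nan)` on a copy rather than `df[df == 0] = np.nan` (the in-place assignment would also turn zeros in the kept columns into NaN), and then selects the surviving labels from the untouched frame. `axis=1` drops all-zero columns; `axis=0` drops all-zero rows.

```python
import numpy as np
import pandas as pd

# Invented example data
df = pd.DataFrame({
    "a": [0, 0, 0, 0],  # all zeros
    "b": [0, 5, 0, 0],  # some zeros
    "c": [1, 2, 3, 0],  # together with "a" and "b", row 3 is all zeros
})

# Treat 0 as missing in a copy; df itself is left unchanged
zeros_as_nan = df.replace(0, np.nan)

# Drop all-zero columns (axis=1), keeping the original values, zeros included
no_zero_cols = df[zeros_as_nan.dropna(axis=1, how="all").columns]

# Drop all-zero rows (axis=0)
no_zero_rows = df.loc[zeros_as_nan.dropna(axis=0, how="all").index]

print(list(no_zero_cols.columns))  # → ['b', 'c']
print(list(no_zero_rows.index))    # → [0, 1, 2]
```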

Another solution would be to create a boolean dataframe with True values at not-null positions and then take the columns having at least one True value. This removes columns with all NaN values.

df = df.loc[:,df.notna().any(axis=0)]

If you want to remove columns having at least one missing (NaN) value:

df = df.loc[:,df.notna().all(axis=0)]

This approach is particularly useful for removing columns containing empty strings, zeros, or basically any given value. For example:

df = df.loc[:,(df!='').all(axis=0)]

removes columns having at least one empty string.
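A small sketch on invented DataFrames (column names and values are assumptions for illustration) that checks the two NaN masks against dropna and shows variants of the empty-string mask. Note that NaN != '' evaluates to True, so a column holding only NaN and empty strings survives a test for empty strings alone; combining notna() with the string test handles both.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "full": [1.0, 2.0, 3.0],            # no missing values
    "partial": [1.0, np.nan, 3.0],      # some missing values
    "empty": [np.nan, np.nan, np.nan],  # only missing values
})

# Columns with at least one non-NaN value: same as dropna(axis=1, how='all')
some_values = df.loc[:, df.notna().any(axis=0)]
assert some_values.equals(df.dropna(axis=1, how="all"))

# Columns with no NaN at all: same as dropna(axis=1, how='any')
no_missing = df.loc[:, df.notna().all(axis=0)]
assert no_missing.equals(df.dropna(axis=1, how="any"))

text = pd.DataFrame({
    "name": ["x", "y", "z"],    # no empty strings
    "notes": ["a", "", "c"],    # one empty string
    "blank": ["", "", ""],      # only empty strings
    "mixed": ["", np.nan, ""],  # only empty strings and NaN
})

# Drop columns with at least one empty string
no_empty = text.loc[:, (text != "").all(axis=0)]

# Drop columns made up entirely of empty strings ("mixed" survives, since NaN != "")
not_all_empty = text.loc[:, (text != "").any(axis=0)]

# Drop columns made up entirely of NaN and/or empty strings
has_content = text.loc[:, (text.notna() & (text != "")).any(axis=0)]

print(list(no_empty.columns))       # → ['name']
print(list(not_all_empty.columns))  # → ['name', 'notes', 'mixed']
print(list(has_content.columns))    # → ['name', 'notes']
```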


Here is a simple function that you can use directly by passing a DataFrame and a threshold (in percent):

df
'''
     pets   location     owner     id
0     cat  San_Diego     Champ  123.0
1     dog        NaN       Ron    NaN
2     cat        NaN     Brick    NaN
3  monkey        NaN     Champ    NaN
4  monkey        NaN  Veronica    NaN
5     dog        NaN      John    NaN
'''

def rmissingvaluecol(dff, threshold):
    # Percentage of missing values in each column
    pct_missing = 100 * dff.isnull().sum() / len(dff.index)
    # Columns to keep: at most `threshold` percent missing values
    l = list(pct_missing[pct_missing <= threshold].index)
    dropped = sorted(set(dff.columns) - set(l))
    print("# Columns having more than %s percent missing values:" % threshold, len(dropped))
    print("Columns:\n", dropped)
    return l


rmissingvaluecol(df, 1)  # Here threshold is 1%, which means we are going to drop columns having more than 1% missing values

#output
'''
# Columns having more than 1 percent missing values: 2
Columns:
 ['id', 'location']
'''

Now create a new DataFrame excluding these columns:

l = rmissingvaluecol(df,1)
df1 = df[l]

PS: You can change the threshold to suit your requirements.

Bonus step

You can find the percentage of missing values for each column (optional)

def missing(dff):
    print (round((dff.isnull().sum() * 100/ len(dff)),2).sort_values(ascending=False))

missing(df)

#output
'''
id          83.33
location    83.33
owner        0.00
pets         0.00
dtype: float64
'''
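An equivalent sketch that relies on the mean of a boolean mask being the fraction of True values (`isna` is an alias of `isnull`). It rebuilds the example DataFrame shown earlier in this answer so that it runs on its own:

```python
import numpy as np
import pandas as pd

# Same data as the example DataFrame above
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "monkey", "dog"],
    "location": ["San_Diego", np.nan, np.nan, np.nan, np.nan, np.nan],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "John"],
    "id": [123.0, np.nan, np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values per column, expressed as a rounded percentage
pct_missing = (df.isna().mean() * 100).round(2).sort_values(ascending=False)
print(pct_missing)
```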


This answer is inferior to df.dropna(..., thresh=...), which already implements this; we just need to calculate the right value. And you don't need to create a new DataFrame; you can just do df.dropna(..., inplace=True).
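A sketch of what this comment suggests, assuming the same rule as the answer above (drop a column when more than `threshold` percent of its values are missing). `thresh` is the minimum number of non-NA values a column needs in order to be kept, so it has to be derived from the row count. The current pandas documentation states that `thresh` cannot be combined with `how`, so only `thresh` is passed here.

```python
import math

import numpy as np
import pandas as pd

# Same data as the example DataFrame in the answer above
df = pd.DataFrame({
    "pets": ["cat", "dog", "cat", "monkey", "monkey", "dog"],
    "location": ["San_Diego", np.nan, np.nan, np.nan, np.nan, np.nan],
    "owner": ["Champ", "Ron", "Brick", "Champ", "Veronica", "John"],
    "id": [123.0, np.nan, np.nan, np.nan, np.nan, np.nan],
})

threshold = 1  # percent: drop columns with MORE than this share of missing values
n_rows = len(df)

# A column is kept when it has at most floor(threshold% of n_rows) missing values,
# i.e. at least n_rows - max_missing non-NA values
max_missing = math.floor(threshold * n_rows / 100)
df.dropna(axis=1, thresh=n_rows - max_missing, inplace=True)

print(list(df.columns))  # → ['pets', 'owner']
```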

Function for removing all null columns from the data frame:

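import pandas as pd

def Remove_Null_Columns(df):
    dff = pd.DataFrame()
    # Copy over every column that has at least one non-null value
    for cl in df.columns:
        if df[cl].isnull().sum() == len(df[cl]):
            pass  # every value is null: skip this column
        else:
            dff[cl] = df[cl]
    return dff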

This function will remove all null columns from df.


Please, if you answer something, at least follow a proper style guide like PEP 8... Also, pandas offers the dropna() function, so this is not a good answer...
