I have a dataframe like this:

df = pd.DataFrame([[1,2,np.nan,np.nan,5],[3,4,np.nan,np.nan,6]],columns=['a','b','c','Unnamed: 4','Unnamed: 5'])

df
Out[16]: 
   a  b   c  Unnamed: 4  Unnamed: 5
0  1  2 NaN         NaN           5
1  3  4 NaN         NaN           6

I want to drop columns that are BOTH entirely NaN AND have 'Unnamed: ' in the name (such columns often appear when importing a dataframe from a file whose header leaves some columns unnamed). Desired output:

   a  b   c  Unnamed: 5
0  1  2 NaN           5
1  3  4 NaN           6
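
For context, a minimal sketch of how such columns can arise. It assumes a CSV whose header has two empty trailing fields; the CSV text below is made up for illustration and produces 'Unnamed: 3' and 'Unnamed: 4' rather than the labels in my example:

```python
import io

import pandas as pd

# Hypothetical CSV: the header names only the first three columns.
csv_text = "a,b,c,,\n1,2,,,5\n3,4,,,6\n"

# read_csv labels header fields that are empty as 'Unnamed: <position>'.
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_csv.columns.tolist())
```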

I can do:

df[[col for col in df.columns if 'Unnamed: ' not in col]]
Out[18]: 
   a  b   c
0  1  2 NaN
1  3  4 NaN

or:

df.dropna(how='all',axis=1)

Out[19]: 
   a  b  Unnamed: 5
0  1  2           5
1  3  4           6

Is there a Pythonic way to apply both conditions at once (combined with AND, not OR)?

1 Answer

filter + isnull + drop

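First restrict the dataframe to the columns whose labels contain 'Unnamed', then check which of those columns are entirely null. Indexing the resulting boolean Series by itself (nulls[nulls]) keeps only the True entries, and its index gives the labels to drop: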

nulls = df.filter(like='Unnamed').isnull().all()

df = df.drop(nulls[nulls].index, axis='columns')

print(df)

   a  b   c  Unnamed: 5
0  1  2 NaN           5
1  3  4 NaN           6
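
An alternative that may read closer to the "AND" in the question is to build one boolean mask per condition and combine them. This is only a sketch: the function name drop_empty_unnamed and its marker parameter are hypothetical, and labels are cast to str on the assumption that a substring match is what you want.

```python
import numpy as np
import pandas as pd


def drop_empty_unnamed(frame, marker='Unnamed: '):
    """Drop columns whose label contains `marker` AND whose values are all NaN.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Condition 1: the label contains the marker (plain substring, no regex).
    is_unnamed = frame.columns.astype(str).str.contains(marker, regex=False)
    # Condition 2: every value in the column is NaN.
    is_all_nan = frame.isna().all().to_numpy()
    # Keep every column that does not satisfy both conditions.
    return frame.loc[:, ~(is_unnamed & is_all_nan)]


df = pd.DataFrame(
    [[1, 2, np.nan, np.nan, 5], [3, 4, np.nan, np.nan, 6]],
    columns=['a', 'b', 'c', 'Unnamed: 4', 'Unnamed: 5'],
)

result = drop_empty_unnamed(df)
print(result.columns.tolist())
```

Column 'c' survives because it fails the name condition, and 'Unnamed: 5' survives because it contains data. Unlike the drop call above, this returns a new dataframe and leaves df untouched.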