
I have a pandas dataframe like:

import pandas as pd

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'First_Name': ['John', None, 'Bill'],
                   'Age': [35, 45, None]})

I can manually filter it using:

df[df.Last_Name.isnull() & df.First_Name.isnull()]

but this is tedious: it requires duplicate code for each column/condition and is not maintainable with a large number of columns. Is it possible to write a function which generates this filter for me?

Some background: my pandas dataframe is based on an initial SQL-based multi-dimensional aggregation (grouping sets) https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-multi-dimensional-aggregation.html, so a different set of columns is NULL in each group. Now I want to efficiently select these different groups and analyze them separately in pandas.
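To illustrate what "different groups by NULL columns" means in pandas terms, here is a minimal sketch (not part of the question) that partitions the rows of `df` by their null pattern, i.e. by which columns are NULL in each row:

```python
import pandas as pd

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'First_Name': ['John', None, 'Bill'],
                   'Age': [35, 45, None]})

# Each row's "null pattern" is a tuple of booleans, one per column.
# Grouping on that tuple separates the grouping-set levels.
for pattern, group in df.groupby(df.isna().apply(tuple, axis=1)):
    null_cols = [c for c, is_null in zip(df.columns, pattern) if is_null]
    print(null_cols, len(group))
```

Each iteration yields one aggregation level together with the columns that are NULL for it, so the groups can then be analyzed separately.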

1 Answer


Use filter:

df[df.filter(like='_Name').isna().all(axis=1)]

  Last_Name First_Name   Age
1      None       None  45.0

Or, for more flexibility, specify a list of column names:

cols = ['First_Name', 'Last_Name']
df[df[cols].isna().all(axis=1)]

  Last_Name First_Name   Age
1      None       None  45.0
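Since the question asks for a reusable function rather than hand-written filters, the second approach wraps naturally into a small helper; the name `rows_where_null` is my own, not from the question:

```python
import pandas as pd

df = pd.DataFrame({'Last_Name': ['Smith', None, 'Brown'],
                   'First_Name': ['John', None, 'Bill'],
                   'Age': [35, 45, None]})

def rows_where_null(frame, cols):
    """Return the rows where every column in `cols` is NULL."""
    return frame[frame[cols].isna().all(axis=1)]

subset = rows_where_null(df, ['First_Name', 'Last_Name'])
```

This avoids writing a separate boolean expression per column: the column list is the only thing that varies between groups.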