45

I have a data frame with some columns with empty lists and others with lists of strings:

       donation_orgs                              donation_context
0            []                                           []
1   [the research of Dr. ...]   [In lieu of flowers , memorial donations ...]

I'm trying to return a data set without any of the rows where there are empty lists.

I've tried just checking for null values:

dfnotnull = df[df.donation_orgs != []]
dfnotnull

and

dfnotnull = df[df.notnull().any(axis=1)]
pd.options.display.max_rows=500
dfnotnull

And I've tried looping through and checking for values that exist, but I think the lists aren't returning Null or None like I thought they would:

dfnotnull = pd.DataFrame(columns=('donation_orgs', 'donation_context'))
for i in range(0,len(df)):
    if df['donation_orgs'].iloc(i):
        dfnotnull.loc[i] = df.iloc[i]

All three of the above methods simply return every row in the original data frame.

    In my experience it is quite perilous to keep data in lists within data frames. It can make grouping and aggregation functions go wrong. If you must do it, consider using tuples instead; they seem to work better. Commented Dec 8, 2015 at 18:50

5 Answers

86

To avoid converting to str and actually use the lists, you can do this:

df[df['donation_orgs'].map(len) > 0]

It maps the donation_orgs column to the length of the lists of each row and keeps only the ones that have at least one element, filtering out empty lists.

It returns

Out[1]: 
                            donation_context          donation_orgs
1  [In lieu of flowers , memorial donations]  [the research of Dr.]

as expected.
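
If every list column should be checked rather than only donation_orgs, the same idea extends to several columns. This is a minimal sketch, assuming the two columns from the question; the sample values and the name list_cols are made up for illustration:

```python
import pandas as pd

# Sample frame modelled on the question (values shortened for illustration).
df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']],
})

# Hypothetical name: the columns that hold lists.
list_cols = ['donation_orgs', 'donation_context']

# Map every list to its length per column and test for "> 0";
# a row is kept only if all of its list columns are non-empty.
non_empty = df[list_cols].apply(lambda col: col.map(len) > 0).all(axis=1)
dfnotnull = df[non_empty]
```

Swapping .all(axis=1) for .any(axis=1) would instead keep rows where at least one of the columns has content.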


3 Comments

This should be the accepted answer. It's more elegant.
Use df[df['donation_orgs'].map(len) > 0], or even df[df['donation_orgs'].map(bool)].
df[df['donation_orgs'].map(bool)] works best, as it can even handle null values.
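
One caveat on null handling: len raises a TypeError on None or NaN, while bool does not. However, bool(None) is False and bool(float('nan')) is True, so map(bool) would likely keep a NaN entry. The sketch below uses an explicit check instead, assuming the column may mix lists with missing values; the helper name is_non_empty_list and the sample data are hypothetical:

```python
import numpy as np
import pandas as pd

# Illustrative column mixing lists with missing values.
df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.'], None, np.nan],
})

def is_non_empty_list(value):
    """Return True only for list values with at least one element."""
    return isinstance(value, list) and len(value) > 0

# Build a boolean mask; anything that is not a non-empty list is dropped.
mask = df['donation_orgs'].map(is_non_empty_list).astype(bool)
dfnotnull = df[mask]
```
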
38

You could try slicing as though the data frame were strings instead of lists:

import pandas as pd

df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']],
})

df[df.astype(str)['donation_orgs'] != '[]']

Out[9]: 
                            donation_context          donation_orgs
1  [In lieu of flowers , memorial donations]  [the research of Dr.]
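
The same string comparison can be applied to every column at once. This is only a sketch, assuming all columns hold lists and relying on str([]) being '[]'. It compares printed representations, so it is a workaround rather than a real length check:

```python
import pandas as pd

df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.']],
    'donation_context': [[], ['In lieu of flowers , memorial donations']],
})

# Convert every cell to its string form; an empty list becomes '[]'.
as_text = df.astype(str)

# Keep a row only if none of its cells is the string '[]'.
keep = (as_text != '[]').all(axis=1)
dfnotnull = df[keep]
```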

12

You can use the following one-liner, which keeps the rows where at least one of the two columns is non-empty:

df[(df['donation_orgs'].str.len() != 0) | (df['donation_context'].str.len() != 0)]
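
Whether | or & is the right operator depends on which rows should survive. The sketch below contrasts the two on a made-up three-row frame (the middle row, with only one empty column, is an assumption added for illustration):

```python
import pandas as pd

# Row 0: both empty, row 1: only donation_context empty, row 2: both filled.
df = pd.DataFrame({
    'donation_orgs': [[], ['the research of Dr.'], ['the research of Dr.']],
    'donation_context': [[], [], ['In lieu of flowers , memorial donations']],
})

# .str.len() also works on list elements and returns each list's length.
orgs_filled = df['donation_orgs'].str.len() != 0
context_filled = df['donation_context'].str.len() != 0

# "|" keeps rows where at least one column is non-empty.
either = df[orgs_filled | context_filled]

# "&" keeps only rows where every column is non-empty.
both = df[orgs_filled & context_filled]
```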

5

Assuming that you read data from a CSV, the other possible solution could be this:

import pandas as pd

df = pd.read_csv('data.csv', na_filter=True, na_values='[]')
df.dropna()

na_values defines additional strings to recognize as NaN; na_filter, which is already True by default, only turns missing-value detection on or off. I tested this on pandas-0.24.2.
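
To try this without a file on disk, here is a sketch that reads the CSV from an in-memory string. The CSV content is invented for illustration and stands in for data.csv. Note that cells read from a CSV are strings, not lists, so the last step (an addition, not part of the answer above) parses them back with ast.literal_eval:

```python
import ast
import io

import pandas as pd

# Stand-in for data.csv; the cell containing a comma is quoted.
csv_text = (
    "donation_orgs,donation_context\n"
    "[],[]\n"
    "['the research of Dr.'],\"['In lieu of flowers , memorial donations']\"\n"
)

# Treat the literal text '[]' as a missing value, then drop those rows.
df = pd.read_csv(io.StringIO(csv_text), na_values='[]')
dfnotnull = df.dropna()

# The remaining cells are still strings; turn them back into Python lists.
parsed = dfnotnull.apply(lambda col: col.map(ast.literal_eval))
```

dropna() with its defaults removes a row if any column is missing; dropna(how='all') would remove only rows where every column is missing.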

1

The data type is probably different from what you expect. This will probably help:

df[df.astype(str)['donation_orgs'] != '[]']
