4

I know that similar questions have been asked before, but I literally tried every solution listed here and none of them worked.

I have a DataFrame that contains dates, strings, empty values, and empty lists. It is very large: 8 million rows.

I want to replace every empty list value with NaN, that is, only cells that contain [] and nothing else. Nothing seems to work.

I tried this:

df = df.apply(lambda y: np.nan if (type(y) == list and len(y) == 0) else y)

as suggested in a similar question, "replace empty list with NaN in pandas dataframe", but it doesn't change anything in my DataFrame.

Any help would be appreciated.

2
  • I think the problem may not be in your code. Check the actual data type of your columns; they may default to object. Commented May 4, 2017 at 17:09
  • Are your empty lists the string '[]' or actual empty lists? Commented May 4, 2017 at 17:19

2 Answers

13

Assuming the OP wants to convert actual empty lists, the string '[]', and any other object whose string form is '[]' to NaN, here is a solution.

Setup

# Borrowed from piRSquared's answer.
import numpy as np
import pandas as pd

df = pd.DataFrame([
        [1, 'hello', np.nan, None, 3.14],
        ['2017-06-30', 2, 'a', 'b', []],
        [pd.to_datetime('2016-08-14'), 'x', '[]', 'z', 'w']
    ])

df
Out[1062]: 
                     0      1    2     3     4
0                    1  hello  NaN  None  3.14
1           2017-06-30      2    a     b    []
2  2016-08-14 00:00:00      x   []     z     w

Solution:

# Convert every element to a string, compare it with '[]', then use mask to replace the matching cells with NaN.
df.mask(df.applymap(str).eq('[]'))
Out[1063]: 
                     0      1    2     3     4
0                    1  hello  NaN  None  3.14
1           2017-06-30      2    a     b   NaN
2  2016-08-14 00:00:00      x  NaN     z     w
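
If converting every cell to a string is too slow on a large frame (such as the 8 million rows mentioned in the question), one variant worth trying is to skip numeric and datetime columns, since they cannot hold lists or the string '[]', and to test the remaining cells directly. This is only a sketch and has not been benchmarked. The helper name `is_empty_list_like` is made up for illustration, and unlike the `str` comparison above it matches only real empty lists and the exact string '[]'.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def is_empty_list_like(value):
    """Hypothetical helper: True for an actual empty list or the string '[]'."""
    if isinstance(value, list):
        return len(value) == 0
    return isinstance(value, str) and value == '[]'


df = pd.DataFrame([
    [1, 'hello', np.nan, None, 3.14],
    ['2017-06-30', 2, 'a', 'b', []],
    [pd.to_datetime('2016-08-14'), 'x', '[]', 'z', 'w'],
])

# Assumption: purely numeric or datetime columns never contain lists or '[]',
# so only the remaining columns are scanned.
candidate_columns = [
    column for column in df.columns
    if not (is_numeric_dtype(df[column]) or is_datetime64_any_dtype(df[column]))
]

result = df.copy()
for column in candidate_columns:
    # Series.map applies the helper to each cell of this one column.
    hits = df[column].map(is_empty_list_like).astype(bool)
    result[column] = df[column].mask(hits)
```

In the small example every column is mixed (object dtype), so all five are scanned; the saving would only show up on frames that also have purely numeric or datetime columns.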

1 Comment

Thank you, yes I wanted to convert all of them to NaN's. Do you have any advice for performance? It is slow on 8 million rows. Could it be improved?
3

I'm going to assume that you want to mask only actual empty lists.

  • pd.DataFrame.mask replaces every cell whose corresponding mask value is True with np.nan.
  • I want to find actual list values, so I'll use df.applymap(type) to get the type of every cell and check whether it is equal to list.
  • I know that [] evaluates to False in a boolean context, so I'll use df.astype(bool) to find the cells that are falsy.
  • I'll end up masking the cells that are both of type list and evaluate to False.

Consider the dataframe df

import numpy as np
import pandas as pd

df = pd.DataFrame([
        [1, 'hello', np.nan, None, 3.14],
        ['2017-06-30', 2, 'a', 'b', []],
        [pd.to_datetime('2016-08-14'), 'x', '[]', 'z', 'w']
    ])

df

                     0      1    2     3     4
0                    1  hello  NaN  None  3.14
1           2017-06-30      2    a     b    []
2  2016-08-14 00:00:00      x   []     z     w

Solution

df.mask(df.applymap(type).eq(list) & ~df.astype(bool))

                     0      1    2     3     4
0                    1  hello  NaN  None  3.14
1           2017-06-30      2    a     b   NaN
2  2016-08-14 00:00:00      x   []     z     w
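
For comparison, the attempt in the question uses DataFrame.apply, which hands the lambda a whole column (a Series) rather than a single cell, so the `type(y) == list` check never matches and the frame comes back unchanged. The sketch below shows that, then builds the same empty-list mask cell by cell with Series.map, which also avoids applymap (deprecated in newer pandas releases in favour of DataFrame.map). The helper name `is_empty_list` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [1, 'hello', np.nan, None, 3.14],
    ['2017-06-30', 2, 'a', 'b', []],
    [pd.to_datetime('2016-08-14'), 'x', '[]', 'z', 'w'],
])

# The question's attempt: y is an entire column (a Series), never a list,
# so every column is returned as-is.
unchanged = df.apply(lambda y: np.nan if (type(y) == list and len(y) == 0) else y)


def is_empty_list(value):
    """Hypothetical helper: True only for an actual empty list."""
    return isinstance(value, list) and len(value) == 0


# Apply the check to every cell, one column at a time.
empty_list_mask = df.apply(lambda column: column.map(is_empty_list).astype(bool))

# Cells flagged True become NaN; the string '[]' is left alone.
masked = df.mask(empty_list_mask)
```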

1 Comment

This is great and it works, but @Allen was right: I need to convert all of them to NaN, so I will accept his answer.
