Replace NaN with empty list in a pandas dataframe

Question

I'm trying to replace some NaN values in my data with an empty list []. However, the list ends up represented as a str, so len() does not return the right result. Is there any way to replace a NaN value with an actual empty list in pandas?

In [28]: d = pd.DataFrame({'x' : [[1,2,3], [1,2], np.nan, np.nan], 'y' : [1,2,3,4]})

In [29]: d
Out[29]:
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2        NaN  3
3        NaN  4

In [32]: d.x.replace(np.nan, '[]', inplace=True)

In [33]: d
Out[33]:
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2         []  3
3         []  4

In [34]: d.x.apply(len)
Out[34]:
0    3
1    2
2    2
3    2
Name: x, dtype: int64

EdChum · Accepted Answer · 2015-07-22 15:49:30Z

45

This works by using isnull and loc to mask the Series (here d is a Series of lists and NaN values):

In [90]:
d.loc[d.isnull()] = d.loc[d.isnull()].apply(lambda x: [])
d

Out[90]:
0    [1, 2, 3]
1       [1, 2]
2           []
3           []
dtype: object

In [91]:
d.apply(len)

Out[91]:
0    3
1    2
2    0
3    0
dtype: int64

You have to use apply here so that the list object is not interpreted as an array-like to assign back to the df; otherwise pandas will try to align its shape with the original series.
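The masking approach above can be sketched end to end as follows. This is a minimal example, assuming an object-dtype Series like the one in the question; the helper name `nan_to_empty_lists` is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def nan_to_empty_lists(s):
    """Hypothetical helper: return a copy of `s` with each NaN replaced by its own empty list."""
    out = s.copy()
    mask = out.isna()
    # apply() returns a Series indexed like the masked rows, so the assignment
    # aligns row by row instead of treating [] as an array-like of values.
    out.loc[mask] = out.loc[mask].apply(lambda _: [])
    return out


s = pd.Series([[1, 2, 3], [1, 2], np.nan, np.nan])
filled = nan_to_empty_lists(s)
lengths = filled.apply(len).tolist()
print(lengths)
```

Because the lambda runs once per masked row, each NaN gets a separate list object rather than a shared one.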

EDIT

Using your updated sample the following works:

In [100]:
d.loc[d['x'].isnull(),['x']] = d.loc[d['x'].isnull(),'x'].apply(lambda x: [])
d

Out[100]:
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2         []  3
3         []  4

In [102]:    
d['x'].apply(len)

Out[102]:
0    3
1    2
2    0
3    0
Name: x, dtype: int64

edited Jul 22, 2015 at 15:49

answered Jul 22, 2015 at 15:18

EdChum


1 Comment

pranav nerurkar Over a year ago

What if we want to extend this to multiple columns of the df?
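One possible way to cover several columns is to repeat the same mask-and-apply step per column. This is only a sketch: the helper name `fill_columns_with_empty_lists` and the sample column names are made up for illustration, and it assumes the target columns are object dtype (lists mixed with NaN).

```python
import numpy as np
import pandas as pd


def fill_columns_with_empty_lists(df, columns):
    """Hypothetical helper: replace NaN with [] in each listed column of a copy of `df`."""
    out = df.copy()
    for col in columns:
        mask = out[col].isna()
        # Same technique as the answer above, applied one column at a time.
        out.loc[mask, col] = out.loc[mask, col].apply(lambda _: [])
    return out


df = pd.DataFrame({
    'a': [[1, 2], np.nan, [3]],
    'b': [np.nan, [4, 5, 6], np.nan],
    'y': [1, 2, 3],
})
result = fill_columns_with_empty_lists(df, ['a', 'b'])
print(result)
```

Columns not named in `columns` (here `y`) are left untouched.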

ieaves · Accepted Answer · 2022-02-03 18:06:00Z

12

To extend the accepted answer: apply calls can be particularly expensive. The same task can be accomplished without apply by constructing a NumPy array from scratch.

isna = df['x'].isna()
df.loc[isna, 'x'] = pd.Series([[]] * isna.sum()).values
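One caveat worth knowing: in Python, `[[]] * n` repeats a reference to a single list, so every filled cell points at the same object. That is harmless if the lists are never mutated, but an in-place `append` on one row would show up in all of them. A hedged sketch of an alternative, assuming the same `df` layout as in the question, builds independent lists with a comprehension:

```python
import numpy as np
import pandas as pd

# Plain-Python illustration of the aliasing: all three slots are the same list.
shared = [[]] * 3
shared[0].append('oops')

df = pd.DataFrame({'x': [[1, 2, 3], [1, 2], np.nan, np.nan], 'y': [1, 2, 3, 4]})
isna = df['x'].isna()

# One fresh list per missing row, indexed like the rows being filled.
fresh = pd.Series([[] for _ in range(isna.sum())],
                  index=df.index[isna], dtype=object)
df.loc[isna, 'x'] = fresh

# Mutating one filled cell leaves the other filled cell empty.
df.loc[2, 'x'].append(99)
print(df)
```

The comprehension costs more than list multiplication, so this trades some of the speedup for safety when the lists will later be modified in place.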

A quick timing comparison:

def empty_assign_1(s):
    s[s.isna()].apply(lambda x: [])

def empty_assign_2(s):
    [[]] * s.isna().sum()

series = pd.Series(np.random.choice([1, 2, np.nan], 1000000))

%timeit empty_assign_1(series)
>>> 61 ms ± 964 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit empty_assign_2(series)
>>> 2.17 ms ± 70.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

With these numbers, that is roughly 28 times faster (61 ms vs. 2.17 ms)!

EDIT: Fixed a bug pointed out by @valentin

You have to be somewhat careful with data types when performing assignment in this case. In the example above, the test series is float; however, adding [] elements coerces the entire series to object. Pandas will handle that for you if you do something like:

idx = series.isna()
series[idx] = series[idx].apply(lambda x: [])

This works because the output of apply is itself a Series. You can test live performance with assignment overhead like so (I've added a string value so the series will be object dtype; to avoid coercion you could instead use a number as the replacement value rather than an empty list).

def empty_assign_1(s):
    idx = s.isna()
    s[idx] = s[idx].apply(lambda x: [])

def empty_assign_2(s):
    idx = s.isna()
    s.loc[idx] = [[]] * idx.sum()

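# dtype=object keeps np.nan as a real NaN; a plain mixed list would be coerced to strings by NumPy
series = pd.Series(np.random.choice(np.array([1, 2, np.nan, '2'], dtype=object), 1000000))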

%timeit empty_assign_1(series.copy())
>>> 45.1 ms ± 386 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit empty_assign_2(series.copy())
>>> 24 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

About 4 ms of each timing is due to the copy. Once assignment is included, the speedup shrinks to roughly 2x, which is still pretty great.

edited Feb 3, 2022 at 18:06

answered May 21, 2020 at 21:30

ieaves


2 Comments

valentin Over a year ago

This answer is misleading since the implementation of the first function empty_assign_1() seems incorrect. It applies the lambda function on every element in the series instead of only on those where the value is actually NaN. It should be s[s.isna()].apply(...). Performing the timing comparison after this fix actually reverses the results so that the first function becomes faster.

ieaves Over a year ago

Hah! You actually did catch an error, I seem to have forgotten that isna is not the reciprocal of dropna. That being said, the original post is still correct. The reason you're observing a reversal is because of the unnecessary constructor call to pd.Series (which is also quite slow). Just use [[]]*s.isna().sum() and you'll be back in business. The context of this specific question is complicated by replacing nans with a list because of the way pandas interprets list inputs so you'll need to create series with dtype='object' and .loc for assignment (or replace with a non list).

icpp-pro · Accepted Answer · 2020-08-04 13:13:14Z

9

You can also use a list comprehension for this:

d['x'] = [[] if x is np.nan else x for x in d['x']]
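Note that `x is np.nan` is an identity check, so it only matches the `np.nan` singleton and would miss `None` or a NaN created some other way, such as `float('nan')`. A slightly more defensive sketch, assuming every valid entry in the column is a `list`, keeps the lists and replaces everything else:

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], np.nan, float('nan'), None],
                  'y': [1, 2, 3, 4, 5]})

# Keep real lists; anything else (NaN, None) becomes a new empty list.
d['x'] = [x if isinstance(x, list) else [] for x in d['x']]

lengths = d['x'].apply(len).tolist()
print(lengths)
```

The comprehension evaluates `[]` once per row, so each replaced cell gets its own list object.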

answered Aug 4, 2020 at 13:13

icpp-pro


Ramineni Ravi Teja · Accepted Answer · 2023-09-08 10:42:21Z

0

import pandas as pd
import numpy as np

data = {'column1': [[1, 2], [2, 3], np.nan, [4, 5], np.nan],
        'column2': [np.nan, "Hi", "Hello", np.nan, "H"]}

df = pd.DataFrame(data)

def replace_none_with_empty_list(x):
    if x is np.nan:
        return []
    else:
        return x

df = df.applymap(replace_none_with_empty_list)

print(df)

Wherever a NaN appears, this replaces it with an empty list; otherwise it returns the value unchanged.

  column1 column2
0  [1, 2]      []
1  [2, 3]      Hi
2      []   Hello
3  [4, 5]      []
4      []       H
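On recent pandas versions, `DataFrame.applymap` is deprecated in favour of `DataFrame.map` (the rename happened in pandas 2.1). The sketch below is one way to stay compatible with both. The helper name `nan_to_list` is made up for illustration, and it swaps the `is np.nan` identity test for a check that also catches NaN floats that are not the `np.nan` singleton.

```python
import numpy as np
import pandas as pd

data = {'column1': [[1, 2], [2, 3], np.nan, [4, 5], np.nan],
        'column2': [np.nan, "Hi", "Hello", np.nan, "H"]}
df = pd.DataFrame(data)


def nan_to_list(x):
    # A float that is not equal to itself is NaN; lists and strings pass through.
    if isinstance(x, float) and x != x:
        return []
    return x


# DataFrame.map exists from pandas 2.1; fall back to applymap on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(nan_to_list)
print(result)
```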

answered Sep 8, 2023 at 10:42

Ramineni Ravi Teja
