How to fill DataFrame NaN values with an empty list [] in pandas?

Question

This is my dataframe:

          date                          ids
0     2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
1     2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
2     2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
3     2011-04-26  NaN
4     2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...
5     2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,...

I want to replace NaN with []. How can I do that? .fillna([]) did not work. I also tried replace(np.nan, []), but it gives this error:

 TypeError('Invalid "to_replace" type: \'float\'',)
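A minimal reproduction of the first failure, sketched with made-up data and assuming a recent pandas version: `fillna` checks its `value` argument up front and raises a `TypeError` for a list, so the `[]` never reaches the data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: lists plus one missing value, like the 'ids' column above
ids = pd.Series([[0, 1, 2], np.nan, [3, 4]])

try:
    ids.fillna([])
except TypeError as exc:
    # pandas rejects list (and tuple) fill values before looking at the data
    print("fillna([]) failed:", exc)
```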

Empty list cannot be assigned; would df.ix[df['ids'].isnull(), 'ids'] = set() work?
– Zero, Commented Oct 18, 2015 at 14:38
Note that one reason this is so hard is that you're not really meant to store nonscalar values in DataFrame cells. You can do it, and it's sometimes handy as an intermediate step (a number of built-in methods generate lists as elements), but there isn't strong support for it yet.
– DSM, Commented Oct 18, 2015 at 17:03
Interestingly, I managed to trigger an infinite loop (ending in a RecursionError) using: df.ids.where(df.ids.isnull(), [[]]).
– PlasmaBinturong, Commented Oct 30, 2019 at 18:10

Nick Edgar · Accepted Answer · 2017-05-10 18:02:08Z

77

My approach is similar to @hellpanderr's, but it tests for list-ness instead of using isnan:

df['ids'] = df['ids'].apply(lambda d: d if isinstance(d, list) else [])

I originally tried using pd.isnull (or pd.notnull) but, when given a list, that returns the null-ness of each element.
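To illustrate the point about pd.isnull: given a list, it answers element by element, so it cannot distinguish "this cell is missing" from "this cell is a list". A small sketch with made-up data, assuming the column holds only lists and NaN:

```python
import numpy as np
import pandas as pd

# Made-up column: lists mixed with one missing value
df = pd.DataFrame({"ids": [[0, 1], np.nan, [2, None]]})

# pd.isnull on a list answers per element, not "is this cell missing?"
print(pd.isnull([2, None]))

# Testing the type yields exactly one decision per cell
df["ids"] = df["ids"].apply(lambda d: d if isinstance(d, list) else [])
print(df["ids"].tolist())  # [[0, 1], [], [2, None]]
```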

answered May 10, 2017 at 18:02

Nick Edgar

1,388 reputation · 1 gold badge · 10 silver badges · 7 bronze badges


2 Comments

John Sandall Over a year ago

If you need to do it across a whole dataframe, this worked for me: df = df.applymap(lambda d: d if isinstance(d, list) else [])
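A variant of the whole-DataFrame idea, sketched under the assumption of a recent pandas: DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so this version maps each column with Series.map instead. The helper name fill_non_lists and the sample frame are made up for illustration. Like the comment's one-liner, it replaces every non-list cell (numbers included), not only NaN.

```python
import numpy as np
import pandas as pd


def fill_non_lists(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: replace every non-list cell with a fresh empty list."""
    # The inner lambda runs once per cell, so each [] is a distinct object
    return frame.apply(lambda col: col.map(lambda d: d if isinstance(d, list) else []))


df = pd.DataFrame({"a": [[1], np.nan], "b": [np.nan, [2, 3]]})
print(fill_non_lists(df))
```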

Free Palestine May 12 at 13:07

Clever solution! Thank you

ronkov · Accepted Answer · 2021-12-14 16:15:20Z

53

A simple solution would be:

df['ids'].fillna("").apply(list)

As noted by @timgeb, this requires df['ids'] to contain lists or nan only.
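Why the fillna("") trick works, as a small sketch with made-up data: list("") is [], while list() applied to an existing list just returns a shallow copy. The second print shows the caveat from the comments: a string cell would be split into its characters.

```python
import numpy as np
import pandas as pd

ids = pd.Series([[0, 1], np.nan, [2]])

# NaN -> "" -> list("") == [];  existing lists -> list(lst), a copy of lst
filled = ids.fillna("").apply(list)
print(filled.tolist())  # [[0, 1], [], [2]]

# Caveat: any string cell is exploded into characters
print(pd.Series(["ab", np.nan]).fillna("").apply(list).tolist())  # [['a', 'b'], []]
```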

edited Dec 14, 2021 at 16:15

answered Oct 5, 2020 at 11:36

ronkov

1,653 reputation · 12 silver badges · 20 bronze badges

5 Comments

timgeb Over a year ago

Cool! Note that this requires df['ids'] to contain lists only, apart from missing values (this is the case in OP's example).

Memin Over a year ago

I compared @Nick Edgar's method with yours: yours is almost 2x faster. Thanks...

Thomas LESIEUR Over a year ago

Thanks for the solution. Do you know how to do it for an empty list of 4 elements?

ronkov Over a year ago

@ThomasLESIEUR you could try .replace({"": whatever}) instead of .apply(list)

Walter Tross Over a year ago

For anyone wondering how this works: list('') == [] because list('abc') == ['a', 'b', 'c']

PlasmaBinturong · Accepted Answer · 2017-03-22 17:36:39Z

42

After a lot of head-scratching I found this method that should be the most efficient (no looping, no apply), just assigning to a slice:

isnull = df.ids.isnull()

df.loc[isnull, 'ids'] = [ [[]] * isnull.sum() ]

The trick was to construct your list of [] of the right size (isnull.sum()), and then enclose it in a list: the value you are assigning is a 2D array (1 column, isnull.sum() rows) containing empty lists as elements.

answered Mar 22, 2017 at 17:36

PlasmaBinturong

2,314 reputation · 24 silver badges · 27 bronze badges

4 Comments

HaPsantran Over a year ago

This is the most efficient answer.

timgeb Over a year ago

Note that [[]] * isnull.sum() does not create isnull.sum() separate empty lists; it creates exactly one empty list with multiple references to it.
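A quick plain-Python illustration of that comment: multiplying a one-element list repeats the reference, not the object, so mutating one "empty list" mutates them all, while a comprehension creates independent lists. This only matters if cells are later modified in place (for example with append); if they are only ever replaced, the sharing is harmless.

```python
n = 3

shared = [[]] * n                     # n references to ONE list object
shared[0].append("x")
print(shared)                         # [['x'], ['x'], ['x']]

independent = [[] for _ in range(n)]  # n distinct list objects
independent[0].append("x")
print(independent)                    # [['x'], [], []]
```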

Khris Over a year ago

For some reason that didn't work for me, but a simple df.loc[isnull, 'ids'] = [[]] does the trick. Might have changed with newer pandas versions.

low_ghost Over a year ago

I like this answer a lot due to avoiding the potentially costly apply, but I get the error 'must have equal len keys and value when setting with an ndarray'. Simply doing [[]] as @Khris suggests gives me the same error. However, stackoverflow.com/a/61944174/4345899 seems to work, so isna = df[col].isna(); df.loc[isna, [col]] = pd.Series([[]] * isna.sum()).values in pandas==1.2.2

Alexander · Accepted Answer · 2015-10-18 17:00:03Z

26

You can first use loc to locate all rows that have a nan in the ids column, and then loop through these rows using at to set their values to an empty list:

for row in df.loc[df.ids.isnull(), 'ids'].index:
    df.at[row, 'ids'] = []

>>> df
        date                                             ids
0 2011-04-23  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
1 2011-04-24  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
2 2011-04-25  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
3 2011-04-26                                              []
4 2011-04-27  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
5 2011-04-28  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
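A self-contained version of the loop above, sketched with a shortened, made-up ids column. It assumes ids is already an object-dtype column, as it is in the question when the column holds lists.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2011-04-25", "2011-04-26", "2011-04-27"]),
    "ids": [[0, 1, 2], np.nan, [0, 1, 2]],  # object column: lists plus one NaN
})

# Visit only the rows whose 'ids' is missing and set each cell individually;
# every iteration stores its own new [] object
for row in df.index[df["ids"].isna().to_numpy()]:
    df.at[row, "ids"] = []

print(df)
```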

answered Oct 18, 2015 at 17:00

Alexander

111k reputation · 32 gold badges · 212 silver badges · 208 bronze badges


timgeb · Accepted Answer · 2020-07-02 08:27:20Z

13

Surprisingly, passing a dict with empty lists as values seems to work for Series.fillna, but not DataFrame.fillna - so if you want to work on a single column you can use this:

>>> df
     A    B    C
0  0.0  2.0  NaN
1  NaN  NaN  5.0
2  NaN  7.0  NaN
>>> df['C'].fillna({i: [] for i in df.index})
0    []
1     5
2    []
Name: C, dtype: object

The solution can be extended to DataFrames by applying it to every column.

>>> df.apply(lambda s: s.fillna({i: [] for i in df.index}))
    A   B   C
0   0   2  []
1  []  []   5
2  []   7  []

Note: for large Series/DataFrames with few missing values, this might create an unreasonable amount of throwaway empty lists.

Tested with pandas 1.0.5.

edited Jul 2, 2020 at 8:27

answered Jul 2, 2020 at 5:35

timgeb

79.2k reputation · 20 gold badges · 129 silver badges · 150 bronze badges

1 Comment

DannyDannyDanny Over a year ago

Someone's going to kill me for using this :) Nice find!

Allen Qin · Accepted Answer · 2019-11-15 02:25:50Z

4

Another solution using numpy:

df.ids = np.where(df.ids.isnull(), pd.Series([[]]*len(df)), df.ids)

Or using combine_first:

df.ids = df.ids.combine_first(pd.Series([[]]*len(df)))
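Both one-liners build the replacement with pd.Series([[]]*len(df)), which has a default 0..n-1 index. combine_first aligns on index labels (its result index is the union of both indexes), so with a non-default index the fill would not line up with df. A sketch of the np.where idea that stays purely positional and gives every row its own list, assuming ids holds only lists and NaN; the sample frame, with a deliberately non-default index, is made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ids": [[1, 2], np.nan, [3]]}, index=[10, 20, 30])

n = len(df)
fill = np.empty(n, dtype=object)  # object array so each slot can hold a list
for i in range(n):
    fill[i] = []                  # a distinct empty list per row

# Positional selection: no index alignment is involved
df["ids"] = np.where(df["ids"].isna().to_numpy(), fill, df["ids"].to_numpy())
print(df["ids"].tolist())  # [[1, 2], [], [3]]
```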

answered Nov 15, 2019 at 2:25

Allen Qin

20k reputation · 9 gold badges · 55 silver badges · 68 bronze badges


Gian Arauz · Accepted Answer · 2023-08-25 08:14:42Z

4

Maybe not the shortest or most optimized solution, but I think it is pretty readable:

# Masking-in nans
mask = df['ids'].isna()

# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(eval)

EDIT

Following the advice from Swier's comment:

# Packages
import ast

# Masking-in nans
mask = df['ids'].isna()

# Filling nans with a list-like string and literally-evaluating such string
df.loc[mask, 'ids'] = df.loc[mask, 'ids'].fillna('[]').apply(ast.literal_eval)

edited Aug 25, 2023 at 8:14

answered Jun 25, 2020 at 10:47

Gian Arauz

456 reputation · 2 gold badges · 8 silver badges · 16 bronze badges

1 Comment

Swier Over a year ago

Instead of eval, please use ast.literal_eval, which has far fewer security ramifications. Though this specific code is secure (I think), it's only a small mismatch in the mask away from arbitrary code execution.
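To make the comment concrete: ast.literal_eval accepts only Python literal syntax, so a string like '[]' parses fine, while anything that would call a function is rejected with a ValueError instead of being executed. A small sketch:

```python
import ast

print(ast.literal_eval("[]"))  # []

try:
    # eval() would run this; literal_eval refuses because it is a function call
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as exc:
    print("rejected:", exc)
```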

hellpanderr · Accepted Answer · 2015-10-18 19:28:45Z

3

Without assignments:

1) Assuming we have only floats and integers in our dataframe

import math
df.apply(lambda x:x.apply(lambda x:[] if math.isnan(x) else x))

2) For any dataframe

import math

def isnan(x):
    # Only real numbers can be NaN; lists, strings, etc. are never treated as missing
    return isinstance(x, (int, float)) and math.isnan(x)

df.apply(lambda col: col.apply(lambda x: [] if isnan(x) else x))

answered Oct 18, 2015 at 19:28

hellpanderr

5,956 reputation · 3 gold badges · 42 silver badges · 50 bronze badges

1 Comment

Ravaging Care Over a year ago

considering that numpy is already imported as np, the following line would be adequate ... df.apply(lambda x: x.apply(lambda x: [] if x is np.nan else x))

Иван Рычков · Accepted Answer · 2022-05-04 12:42:48Z

2

You can try this:

df.fillna(df.notna().applymap(lambda x: x or []))

answered May 4, 2022 at 12:42

Иван Рычков

21 reputation · 3 bronze badges


toto_tico · Accepted Answer · 2022-07-03 09:01:18Z

2

Another solution that is explicit:

# use apply to only replace the nulls with the list  
df.loc[df.ids.isnull(), 'ids'] = df.loc[df.ids.isnull(), 'ids'].apply(lambda x: [])

edited Jul 3, 2022 at 9:01

answered Aug 17, 2021 at 22:02

toto_tico

19.2k reputation · 10 gold badges · 102 silver badges · 121 bronze badges


keramat · Accepted Answer · 2019-06-03 08:34:47Z

1

Maybe more concise:

df['ids'] = [[] if type(x) != list else x for x in df['ids']]

answered Jun 3, 2019 at 8:34

keramat

4,613 reputation · 8 gold badges · 29 silver badges · 42 bronze badges


botivegh · Accepted Answer · 2020-04-01 15:56:21Z

1

A one-liner that is probably faster:

df['ids'].fillna('DELETE').apply(lambda x : [] if x=='DELETE' else x)

answered Apr 1, 2020 at 15:56

botivegh

500 reputation · 7 silver badges · 17 bronze badges


mx0 · Accepted Answer · 2017-12-04 19:48:29Z

0

Create a function that checks your condition and, if the condition is not met, returns an empty list, an empty set, etc.

Then apply that function to the column and assign the result back to the original column (or to a new column, if you prefer).

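import numpy as np
import pandas as pd

aa = pd.DataFrame({'d': [1, 1, 2, 3, 3, np.nan], 'r': [3, 5, 5, 5, 5, 'e']})


def check_condition(x):
    # NaN > 0 is False, so missing values fall through to the else branch
    if x > 0:
        return x
    else:
        return list()

aa['d'] = aa.d.apply(check_condition)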

edited Dec 4, 2017 at 19:48

mx0

7,298 reputation · 12 gold badges · 57 silver badges · 57 bronze badges

answered Dec 4, 2017 at 17:55

TICH

17 reputation · 1 bronze badge


Yi Tang · Accepted Answer · 2023-04-18 15:21:22Z

I have solved a more complex case and want to share the solution here.

Each cell of the DataFrame holds a nested list with 100 sublists [a, b]. Some values in the columns bids_aggr3 and asks_aggr3 are np.nan. In general the number of columns containing NAs is unknown; the example shows only 2.

The goal is to use DataFrame.explode() to expand each row into 100 rows, each containing one sublist of the original cell for every column. However, this requires the values in each cell (the nested lists) to have the same length, so I need to fill the NAs with a nested list like [[nan, nan], [nan, nan], ..., [nan, nan]] of length 100.

After some research I came up with a generic solution that replaces NAs without specifying the columns.

nan_cell = [[np.nan, np.nan]]*100
rows = df.loc[df.isna().any(axis=1)].index
columns = df.columns[df.isna().any(axis=0)]
df.loc[rows, columns] = pd.Series([nan_cell]*len(rows))

Line 1 generates the new nested list. Lines 2 and 3 locate the rows and columns that contain NAs. Line 4 uses loc to set the nested list as the value of each NA cell; loc requires a scalar or a Series of the same length as the input.

(The resulting DataFrame and a quick check of the value in the first row were shown as screenshots in the original answer and are not included here.)

HOWEVER! If the NAs do not occur as one contiguous run starting at the top or the bottom of the column, but somewhere in the middle of the DataFrame and in different rows for each column, the solution above will not work.

In this case you can use the [index, column] pairs to directly access each cell to modify the values.

Example:

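import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, np.nan, np.nan, 5, 6], 'b': [22, 23, 2, 1, 0, np.nan, 99]})

# Use object dtype so the cells can hold Python lists
df = df.astype(object)

mask = df.isna().stack()
cells = mask.loc[mask].index.tolist()  # list of (index, column) pairs
# Alternative way to build the same pairs:
# idx, idy = np.where(pd.isnull(df))
# cells = np.column_stack([df.index[idx], df.columns[idy]])
for row, col in cells:
    df.at[row, col] = [np.nan, np.nan]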

The for loop could be a performance bottleneck if the DataFrame is large. If someone knows a more Pythonic/vectorized way, please share it.

Hopefully someone will find this helpful. Cheers!
