replace empty list with NaN in pandas dataframe

Question

I'm trying to replace the empty lists in my data with NaN values. But how do I represent an empty list in the expression?

import numpy as np
import pandas as pd
d = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
d

    x           y
0   [1, 2, 3]   1
1   [1, 2]      2
2   [text]      3
3   []          4



d.loc[d['x'] == [],['x']] = d.loc[d['x'] == [],'x'].apply(lambda x: np.nan)
d

ValueError: Arrays were different lengths: 4 vs 0

Also, I want to select [text] using d[d['x'] == ["text"]], but that raises ValueError: Arrays were different lengths: 4 vs 1, whereas selecting 3 with d[d['y'] == 3] works. Why?

How about d.x = d.x.apply(lambda y: np.nan if len(y)==0 else y)?
– Abdou, Commented Nov 26, 2016 at 13:59

Abdou · Accepted Answer · 2016-11-26 14:22:05Z

42

If you wish to replace empty lists in the column x with NumPy NaNs, you can do the following:

d.x = d.x.apply(lambda y: np.nan if len(y)==0 else y)

If you want to subset the dataframe on rows equal to ['text'], try the following:

d[[y==['text'] for y in d.x]]

I hope this helps.
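As for why the original comparison fails: when the right-hand side of == is a list, pandas treats it as a column of values to compare position by position, so a list of length 0 or 1 cannot line up with 4 rows. Comparing cell by cell avoids that. The following is a minimal sketch of both steps on the question's data; the names is_text, text_rows and cleaned are only illustrative, and the isinstance check is an added guard that is not part of the answer above.

```python
import numpy as np
import pandas as pd

d = pd.DataFrame({"x": [[1, 2, 3], [1, 2], ["text"], []], "y": [1, 2, 3, 4]})

# Compare each cell to the list one at a time; d["x"] == ["text"] would
# instead try to align the list against the whole column.
is_text = d["x"].apply(lambda cell: cell == ["text"])
text_rows = d[is_text]

# Replace empty lists with NaN and leave every other cell unchanged.
# The isinstance check (an assumption added here) keeps len() away from
# cells that are not lists, such as NaN from an earlier run.
cleaned = d.copy()
cleaned["x"] = cleaned["x"].apply(
    lambda cell: np.nan if isinstance(cell, list) and len(cell) == 0 else cell
)
```

Working on a copy keeps the original frame intact, which makes it easier to compare before and after.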

answered Nov 26, 2016 at 14:22

Abdou


Shawn Mark · Accepted Answer · 2022-07-05 16:17:29Z

You can use the apply function to match a specific cell value, whether it is a string, a list, or something else.

For example, in your case:

import pandas as pd
d = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
d
    x           y
0   [1, 2, 3]   1
1   [1, 2]      2
2   [text]      3
3   []          4

If you use d == 3 to select the cells whose value is 3, it works fine:

      x       y
0   False   False
1   False   False
2   False   True
3   False   False

However, if you use == to match a list, the result may not be what you expect, for example with d == [text], d == ['text'], or d == '[text]'.

There are some solutions:

Use apply() on the specific Series in your DataFrame, as in the top answer.

A more general method, applymap() on the whole DataFrame, can be used as a preprocessing step:

d.applymap(lambda x: x == [])

      x       y
0   False   False
1   False   False
2   False   False
3   True    False

I hope this helps you and later readers. It would be better to add a type check in your applymap function, which could otherwise raise exceptions.
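The type check mentioned above could look something like this. It is only a sketch: is_empty_list is a hypothetical helper name, and it goes column by column with Series.map rather than calling applymap, since applymap is deprecated in recent pandas releases (2.1 and later, as far as I know) in favour of DataFrame.map.

```python
import pandas as pd

d = pd.DataFrame({"x": [[1, 2, 3], [1, 2], ["text"], []], "y": [1, 2, 3, 4]})


def is_empty_list(cell):
    """Return True only for list cells of length zero (hypothetical helper)."""
    return isinstance(cell, list) and len(cell) == 0


# Run the check on every cell, one column at a time.
# The result is a boolean DataFrame with the same shape as d.
empty_mask = d.apply(lambda col: col.map(is_empty_list))
```

Because the helper checks the type first, integer or string cells simply give False instead of reaching len().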

Alex · Accepted Answer · 2016-11-26 19:14:50Z

1

To answer your main question, just leave out the empty lists altogether. If you use pandas.concat instead of building the dataframe from a dictionary, NaNs are filled in automatically wherever one column has a value and the other does not.

>>> import pandas as pd
>>> ser1 = pd.Series([[1,2,3], [1,2], ["text"]], name='x')
>>> ser2 = pd.Series([1,2,3,4], name='y')
>>> result = pd.concat([ser1, ser2], axis=1)
>>> result
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2     [text]  3
3        NaN  4

About your second question, it seems that you can't search inside of an element. Perhaps you should make that a separate question since it's not really related to your main question.

answered Nov 26, 2016 at 19:14

Alex


Jean-Francois T. · Accepted Answer · 2023-05-29 02:23:54Z

0

There is a way to do it without using apply (which might be slow on big DataFrames).

You can use the little trick of calling .str.len() on lists: it was originally designed to compute the length of strings, but it also works on lists.

Combined with the .loc[<condition>, <column>] = np.nan, that will do the trick: df.loc[df.x.str.len() == 0, "x"] = np.nan

With your example, that would give:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
>>> df
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2     [text]  3
3         []  4

>>> df.loc[df.x.str.len() == 0, "x"] = np.nan
>>> df
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2     [text]  3
3        NaN  4
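One thing that may be worth checking is how this behaves when the column already contains NaN. As far as I can tell, .str.len() returns NaN for such cells, so the comparison with 0 is False and they are left alone. A small sketch with made-up data (the three-row frame and the name empty_mask are only for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative data: one non-empty list, one empty list, one cell already NaN.
df = pd.DataFrame({"x": [[1], [], np.nan], "y": [1, 2, 3]})

# Lengths are 1, 0 and NaN, so only the empty list matches.
empty_mask = df["x"].str.len() == 0

df.loc[empty_mask, "x"] = np.nan
```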

answered May 29, 2023 at 2:23

Jean-Francois T.
