20

I'm trying to replace the empty lists in my data with NaN values. How do I represent an empty list in the expression?

import numpy as np
import pandas as pd
d = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
d

    x           y
0   [1, 2, 3]   1
1   [1, 2]      2
2   [text]      3
3   []          4



d.loc[d['x'] == [],['x']] = d.loc[d['x'] == [],'x'].apply(lambda x: np.nan)
d

ValueError: Arrays were different lengths: 4 vs 0

I also want to select [text] with d[d['x'] == ["text"]], but that raises ValueError: Arrays were different lengths: 4 vs 1, while selecting 3 with d[d['y'] == 3] works correctly. Why?

How about d.x = d.x.apply(lambda y: np.nan if len(y)==0 else y)? Commented Nov 26, 2016 at 13:59

4 Answers

42

If you wish to replace the empty lists in column x with NumPy NaN values, you can do the following:

d.x = d.x.apply(lambda y: np.nan if len(y)==0 else y)
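A slightly more defensive variant of the line above, offered only as a sketch: it checks the type before calling len, so it can be rerun on a column that already contains NaN (calling len on a float raises a TypeError). The helper name empty_list_to_nan is made up for this illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd


def empty_list_to_nan(value):
    """Return NaN for an empty list; pass every other value through unchanged."""
    if isinstance(value, list) and len(value) == 0:
        return np.nan
    return value


d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})
d['x'] = d['x'].apply(empty_list_to_nan)

# Applying it a second time is harmless, because NaN is not a list.
d['x'] = d['x'].apply(empty_list_to_nan)
print(d)
```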

If you want to subset the dataframe on rows equal to ['text'], try the following:

d[[y==['text'] for y in d.x]]
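The same selection can also be written with a boolean Series built by apply, which keeps the mask aligned to the index. This is a sketch on the question's data; the names mask and selected are only illustrative. Each cell is compared to the target list one at a time in plain Python, whereas d['x'] == ["text"] appears to treat the right-hand list as a sequence of values to compare row by row, which would explain the length mismatch error.

```python
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

# Compare each cell of column x to the target list individually.
mask = d['x'].apply(lambda cell: cell == ["text"])

# Use the boolean Series to keep only the matching rows.
selected = d[mask]
print(selected)
```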

I hope this helps.


4

You can use the apply function to match a given cell value regardless of whether it is a string, a list, or something else.

For example, in your case:

import pandas as pd
d = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
d
    x           y
0   [1, 2, 3]   1
1   [1, 2]      2
2   [text]      3
3   []          4

If you use d == 3 to select the cells whose value is 3, it works fine:

      x       y
0   False   False
1   False   False
2   False   True
3   False   False

However, if you use the equality operator to match a list, the result may not be what you expect, as with d == [text], d == ['text'], or d == '[text]'.

There are some solutions:

  1. Use the apply() function on the specific Series in your DataFrame, as in the top answer.

  2. A more general method is to use the applymap() function on the whole DataFrame, for example as a preprocessing step:

    d.applymap(lambda x: x == [])

            x      y
    0   False  False
    1   False  False
    2   False  False
    3    True  False

I hope this helps you and later readers. It would be better to add a type check in the function you pass to applymap, since it could otherwise raise exceptions on some values.
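The type check mentioned above could look like the following sketch. The helper name is_empty_list is hypothetical. It uses DataFrame.apply with Series.map instead of applymap, on the assumption that this runs on both older and newer pandas versions; as far as I know, applymap has been deprecated in recent releases in favor of DataFrame.map.

```python
import pandas as pd

d = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})


def is_empty_list(cell):
    # Only lists can match; any other type returns False without a comparison.
    return isinstance(cell, list) and len(cell) == 0


# Series.map runs the check on each cell of each column.
empty_mask = d.apply(lambda column: column.map(is_empty_list))
print(empty_mask)
```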


1

To answer your main question, just leave out the empty lists altogether. If you use pandas.concat instead of building a DataFrame from a dictionary, NaN is filled in automatically wherever one column has a value and the other does not.

>>> import pandas as pd
>>> ser1 = pd.Series([[1,2,3], [1,2], ["text"]], name='x')
>>> ser2 = pd.Series([1,2,3,4], name='y')
>>> result = pd.concat([ser1, ser2], axis=1)
>>> result
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2     [text]  3
3        NaN  4

As for your second question, it seems that you can't search inside an element. Perhaps you should ask that as a separate question, since it's not really related to your main one.


0

There is a way to do it without using apply (which might be slow on big DataFrames).

You can use a little trick: .str.len() was designed to compute the length of strings, but it also works on lists.

Combined with .loc[<condition>, <column>] = np.nan, that does the job: df.loc[df.x.str.len() == 0, "x"] = np.nan

With your example, that would give:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'x' : [[1,2,3], [1,2], ["text"], []], 'y' : [1,2,3,4]})
>>> df
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2     [text]  3
3         []  4

>>> df.loc[df.x.str.len() == 0, "x"] = np.nan
>>> df
           x  y
0  [1, 2, 3]  1
1     [1, 2]  2
2     [text]  3
3        NaN  4
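To make the intermediate step visible, here is a small sketch that stores the lengths in a separate variable and then repeats the check after the replacement. The names lengths, lengths_after and rows_still_empty are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [[1, 2, 3], [1, 2], ["text"], []], 'y': [1, 2, 3, 4]})

# Length of each list in column x: 3, 2, 1, 0.
lengths = df['x'].str.len()
df.loc[lengths == 0, 'x'] = np.nan

# On a second pass the NaN row has length NaN, and NaN == 0 is False,
# so no row is selected and nothing changes.
lengths_after = df['x'].str.len()
rows_still_empty = int((lengths_after == 0).sum())
print(rows_still_empty)
```

Because the NaN row no longer matches the condition, running the replacement again on the same DataFrame should be safe.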
