
I am trying to populate a dataframe with the following code:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.choice([1, np.nan], size=5))


0     1  
1     1  
2   NaN  
3     1  
4     1   

Then:

df[df[0].isnull()]

2   NaN

So far, so good. But if I change the 1 to '1', things get strange (IMO).

df = pd.DataFrame(data=np.random.choice(['1', np.nan], size=5))

0    1  
1    1  
2    1  
3    1  
4  nan  

The problems come with isnull:

df[df[0].isnull()]

Empty DataFrame  
Columns: [0]  
Index: []

How can I get the nan (which is a string) to behave like a NaN? I want to be able to filter quickly on all null/non-null values within my dataframe.

Thanks.

  • The issue here is that the NaN is being converted to the str nan, which is surprising. AFAIK a real NaN requires a float dtype; in this case you'd have to compare with the str nan, which is weird IMO. Commented Dec 1, 2015 at 13:55
  • Thanks. For the moment I am converting the str nan back to the "normal" NaN. Commented Dec 1, 2015 at 14:59
  • I kinda feel this is a bug, though, as I wouldn't expect 'nan' when I've explicitly passed NaN. Commented Dec 1, 2015 at 15:01
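A minimal sketch of the workaround described in the second comment (turning the string 'nan' back into a real NaN), assuming 'nan' is the only placeholder string that needs undoing. A fixed array stands in for the random data so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Fixed stand-in for np.random.choice(['1', np.nan], size=5); NumPy turns
# np.nan into the string 'nan' here, just as in the question.
df = pd.DataFrame(data=np.array(['1', '1', np.nan, '1', '1']))

# Replace the string 'nan' with a real float NaN so pandas treats it as missing.
df = df.replace('nan', np.nan)

null_rows = df[df[0].isnull()]
non_null_rows = df[df[0].notnull()]
print(null_rows)
print(non_null_rows)
```

After the replace, isnull and notnull filter the column the same way they do for the numeric version in the question.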

1 Answer


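NaN is a concept that makes sense when working with numbers, not strings. When you create the data with '1's, the type of that column is inferred as str, which IMO is correct. Strictly speaking, this inference happens in NumPy: np.random.choice first turns ['1', np.nan] into an array with a single string dtype, so np.nan is converted to its string representation 'nan' before pandas ever sees it.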
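A quick way to see the conversion is to look at the array NumPy builds from the list, before pandas is involved at all. This only inspects dtypes and values; it is a sketch for illustration, not a fix.

```python
import numpy as np

# np.random.choice converts its first argument to an array before sampling.
# Mixing a str and a float makes NumPy pick one Unicode string dtype,
# so np.nan is stored as the three-character string 'nan'.
arr = np.array(['1', np.nan])
print(arr.dtype.kind)   # U  (fixed-width Unicode string)
print(arr.tolist())     # ['1', 'nan']

# The sampled values inherit that string dtype.
sample = np.random.choice(['1', np.nan], size=5)
print(sample.dtype.kind)
```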

Note that if, for example, you say:

df = pd.DataFrame(data=np.random.choice(['1', 2], size=5))

The 2 will be converted to a string as well because, again, the string type is inferred for the whole column.
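The same check for the ['1', 2] case: every value that ends up in the column is a str, and the integer 2 no longer appears as an int.

```python
import numpy as np
import pandas as pd

# ['1', 2] is coerced to a single string dtype before sampling,
# so the integer 2 becomes the string '2'.
df = pd.DataFrame(data=np.random.choice(['1', 2], size=5))

print(all(isinstance(v, str) for v in df[0]))   # True
print(2 in set(df[0]))                           # False: only '1' and '2' strings
```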

However, you can still filter easily with your proposed dataframe:

df = pd.DataFrame(data=np.random.choice(['1', np.nan], size=5))
df[df[0] == 'nan']
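If a column may contain real NaN values and the string 'nan' at the same time, one boolean mask can cover both, which gives the quick null/non-null filter asked about in the question. A minimal sketch; treating exactly the string 'nan' as missing is an assumption about this data, and `null_like` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Fixed data mixing the string '1', the string 'nan' and a real NaN.
df = pd.DataFrame({0: ['1', 'nan', np.nan, '1']})

# True where the value is a real missing value OR the string 'nan'.
null_like = df[0].isnull() | (df[0] == 'nan')

print(df[null_like])    # index labels 1 and 2
print(df[~null_like])   # index labels 0 and 3
```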

3 Comments

I see the automatic conversion, but it doesn't happen when using None: pd.DataFrame(data=np.random.choice(['1', None], size=5)) generates a dataframe with some None values in it.
Actually, playing with this, None works with the isnull method.
@Extratoro: Interesting. Note that None is a different type, a basic Python type, which may be treated differently. NaN, however, is a float; see type(numpy.nan). So I guess any number (int or float) is simply converted to a string when the str type is inferred for the column, and NaN is just a float.
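The type difference discussed in these comments can be checked directly. The last part is a possible workaround that does not come from the thread: passing the choices as an object array (an assumption about how you build the data) keeps np.nan as a float NaN, so isnull sees it.

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float.
print(type(np.nan))                  # <class 'float'>

# None has no NumPy string or numeric equivalent, so NumPy falls back to an
# object array and the None survives (which is why isnull works for it).
print(np.array(['1', None]).dtype)   # object

# Hypothetical workaround: build the choices as an object array so np.nan
# stays a float NaN instead of being turned into the string 'nan'.
choices = np.array(['1', np.nan], dtype=object)
df = pd.DataFrame(data=np.random.choice(choices, size=5))
print(df[df[0].isnull()])
```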
