
So I was trying to replace np.nan values in my dataframe with None and noticed that in the process the datatype of the float columns in the dataframe changed to object even when they don't contain any missing data.

As an example:

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
data.replace(to_replace={np.nan:None}, inplace=True)

Calling data.dtypes before and after the call to replace shows that the dtype of column B changed from float to object, whereas that of C stayed int. If I remove column A from the original data, that does not happen. Why does the dtype change, and how can I avoid this effect?
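A minimal sketch of the before/after comparison described above. It uses the non-inplace form of replace so that both sets of dtypes stay available; the names `before`, `after` and `comparison` are only illustrative. The dtypes reported in the `after` column depend on the pandas version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': np.nan, 'B': 1.096, 'C': 1}, index=[0])

# Record the dtypes before touching anything.
before = data.dtypes

# The non-inplace replace returns a new frame and leaves `data` as it was.
after = data.replace(to_replace={np.nan: None}).dtypes

# Put both side by side to see which columns changed dtype.
comparison = pd.DataFrame({'before': before, 'after': after})
print(comparison)
```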

  • @yatu: Why the instant close? The linked answer says nothing about why otherwise unrelated columns see a change in their dtype; the behavior in the question does not appear if A is dropped prior to the replacement. Commented Dec 27, 2019 at 12:30
  • Yes, you're right, reopened @fuglede. Commented Dec 27, 2019 at 12:30
  • Looks like a bug. Could you report it here? github.com/pandas-dev/pandas/issues Commented Dec 27, 2019 at 12:32
  • Looks buggy to me. I can't see why replacing NaNs should also affect float columns with no missing values. I'd suggest reporting it as @ignoring_gravity suggests if you cannot find related issues. Commented Dec 27, 2019 at 12:35
  • Pure speculation, but I assume that None is treated as a string purely because the np.nan value exists; there is no clear definition of None in a string column or a numeric column, so it's treated as an object by default. Commented Dec 27, 2019 at 12:37

2 Answers


I've come across this many times, and there is a fix: precede your call to replace with astype(object) and it will preserve the dtypes. I've had to use this for merge issues, combine issues, etc. I'm not sure why it preserves the types when used this way, but it does, and it's useful once you find out about it.

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])

# dtypes of the original frame
data.info()

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null float64
#B    1 non-null float64
#C    1 non-null int64
#dtypes: float64(2), int64(1)
#memory usage: 32.0 bytes

# replace directly on the frame
data.replace(to_replace={np.nan:None}, inplace=True)

data.info()

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null object
#B    1 non-null object
#C    1 non-null int64
#dtypes: int64(1), object(2)
#memory usage: 32.0+ bytes

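import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
# Note: astype(object) returns a new DataFrame, so the in-place replace
# below acts on that temporary object and `data` itself is not modified.
data.astype(object).replace(to_replace={np.nan:None}, inplace=True)

data.info()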

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null float64
#B    1 non-null float64
#C    1 non-null int64
#dtypes: float64(2), int64(1)
#memory usage: 32.0 bytes

1 Comment

You're not actually modifying data in the second example; you simply called .info() on the original df.
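To illustrate the comment above: astype(object) returns a new DataFrame, so an in-place replace on it never reaches `data`. A minimal sketch, in which the names `converted` and `result` are only illustrative; the replacement has to be assigned to be kept, and the dtypes of `result` may vary by pandas version, so they are printed rather than stated.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': np.nan, 'B': 1.096, 'C': 1}, index=[0])

# astype returns a new object; `data` keeps its original dtypes.
converted = data.astype(object)
print(converted is data)   # False
print(data.dtypes)         # still float64, float64, int64

# To actually keep the replacement, assign the returned frame.
result = converted.replace(to_replace={np.nan: None})
print(result.dtypes)
```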

It works fine if you replace column by column, calling replace on each pd.Series rather than on the pd.DataFrame.

Note that, as mentioned in the comments, None cannot be cast to float (or int, or any numeric dtype; for those you would use NaN instead), so a column that receives None is automatically cast to object.
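A small illustration of that point, as a sketch: a float Series has no way to store None, so pandas stores it as NaN on construction, and only an object Series keeps None as is. The names `as_float` and `as_object` are only illustrative.

```python
import pandas as pd

# With a numeric dtype inferred, None is stored as NaN.
as_float = pd.Series([1.096, None])
print(as_float.dtype)            # float64
print(as_float.isna().tolist())  # [False, True]

# With dtype=object, the None object itself is kept.
as_object = pd.Series([1.096, None], dtype=object)
print(as_object.dtype)           # object
print(as_object[1] is None)      # True
```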

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
print(data)
print(data.dtypes)
for col in data.columns:
    # Assign the result back; an in-place replace on data[col] may act on a temporary copy.
    data[col] = data[col].replace(to_replace={np.nan: None})
print(data)
print(data.dtypes)

Output:

      A      B  C
0 NaN  1.096  1

A    float64
B    float64
C      int64
dtype: object
      A      B  C
0  None  1.096  1

A     object
B    float64
C      int64
dtype: object
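
Building on the per-column idea, one possible helper converts only the columns that actually contain missing values, so the others keep their dtype. The name `nan_to_none` is hypothetical, not a pandas function. This is a sketch that assumes unique column names, scalar cell values, and that object dtype is acceptable for the affected columns; it rebuilds those columns explicitly instead of relying on replace.

```python
import numpy as np
import pandas as pd

def nan_to_none(df):
    """Return a copy of df where missing values are None (hypothetical helper)."""
    out = df.copy()
    for col in df.columns:
        if df[col].isna().any():
            # Rebuild the column as object dtype so that it can hold None.
            values = [None if pd.isna(v) else v for v in df[col]]
            out[col] = pd.Series(values, index=df.index, dtype=object)
    return out

data = pd.DataFrame({'A': np.nan, 'B': 1.096, 'C': 1}, index=[0])
result = nan_to_none(data)
print(result.dtypes)
```

Columns without missing values are never touched here, so B and C are expected to keep float64 and int64, while A becomes object.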
