pandas.DataFrame.replace change dtype of columns

Question

I was trying to replace np.nan values in my DataFrame with None and noticed that, in the process, the dtype of the float columns changed to object, even for columns that don't contain any missing data.

As an example:

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
data.replace(to_replace={np.nan:None}, inplace=True)

Calling data.dtypes before and after the replace shows that the dtype of column B changes from float64 to object, whereas column C stays int64. If I remove column A from the original data, that does not happen. Why does B's dtype change, and how can I avoid this effect?
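A minimal script for the comparison described above, using the frame from the question but returning new frames instead of using inplace=True. It asserts only what should not depend on the pandas version: A ends up as object holding None, C stays int64, and B stays float64 when A is dropped first. Whether B also turns into object in the whole-frame replace is the behaviour the question reports; it may differ between pandas versions, so the sketch just prints it.

```python
import numpy as np
import pandas as pd

# Same one-row frame as in the question: A is an all-NaN float64 column,
# B is float64 without missing values, C is int64.
data = pd.DataFrame({'A': np.nan, 'B': 1.096, 'C': 1}, index=[0])
print(data.dtypes)

# Whole-frame replacement, returning a new frame instead of using inplace=True.
replaced = data.replace(to_replace={np.nan: None})
print(replaced.dtypes)  # the question reports B as object here

# The comparison from the question: drop A first, then replace.
without_a = data.drop(columns='A').replace(to_replace={np.nan: None})
print(without_a.dtypes)
```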

@yatu: Why the instant close? The linked answer says nothing about why otherwise unrelated columns see a change in their dtype; the behavior in the OP does not appear if A is dropped prior to the replacement. – fuglede, commented Dec 27, 2019 at 12:30
Looks like a bug; could you report it here? github.com/pandas-dev/pandas/issues – ignoring_gravity, commented Dec 27, 2019 at 12:32
Looks buggy to me. I can't see why replacing NaNs should also affect float columns with no missing values. I'd suggest reporting it, as @ignoring_gravity suggests, if you cannot find related issues. – yatu, commented Dec 27, 2019 at 12:35
Pure speculation, but I assume that None is treated as a string purely because the np.nan value exists: there is no clear definition of None in either a string column or a numeric column, so it's treated as an object by default. – Umar.H, commented Dec 27, 2019 at 12:37

oppressionslayer · Accepted Answer · 2019-12-29 05:29:39Z

I've come across this many times, and there is a fix: precede your call to replace with astype(object) and it will preserve the dtypes. I've had to use this for merge issues, combine issues, etc. I'm not sure why it preserves the types when used this way, but it does, and it's useful once you know about it.

data.info()    

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null float64
#B    1 non-null float64
#C    1 non-null int64
#dtypes: float64(2), int64(1)
#memory usage: 32.0 bytes

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
data.replace(to_replace={np.nan:None}, inplace=True)

data.info()

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null object
#B    1 non-null object
#C    1 non-null int64
#dtypes: int64(1), object(2)
#memory usage: 32.0+ bytes

import pandas as pd 
import numpy as np 
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0]) 
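# note: inplace=True only modifies the temporary copy returned by astype(object), so data itself is unchanged
data.astype(object).replace(to_replace={np.nan:None}, inplace=True)

data.info()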

#<class 'pandas.core.frame.DataFrame'>
#Int64Index: 1 entries, 0 to 0
#Data columns (total 3 columns):
#A    0 non-null float64
#B    1 non-null float64
#C    1 non-null int64
#dtypes: float64(2), int64(1)
#memory usage: 32.0 bytes

You're not actually setting data in the second example; you simply called .info() on the original df.
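The point in the comment above can be checked directly. A minimal sketch, reusing the frame from the question: the in-place call leaves data untouched, while assigning the result does replace the NaN but leaves every column, including B and C, with the object dtype.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': np.nan, 'B': 1.096, 'C': 1}, index=[0])

# inplace=True acts on the temporary frame returned by astype(object),
# so `data` keeps its original dtypes and its NaN.
data.astype(object).replace(to_replace={np.nan: None}, inplace=True)
print(data.dtypes)

# Assigning the result does put None in A, but all columns are now object.
converted = data.astype(object).replace(to_replace={np.nan: None})
print(converted.dtypes)
```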

Georgina Skibinski · Accepted Answer · 2019-12-27 13:46:16Z

It works fine when you replace per column, i.e. call replace on each pd.Series rather than on the whole pd.DataFrame.

The exception, as mentioned in the comments, is that None cannot be cast to float (or int, or any numeric dtype; you would use NaN there instead), so a column that actually contains missing values will automatically be cast to object.
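A small illustration of the point about None and numeric columns (a sketch, not part of the original answer): a float64 Series stores None as NaN, and forcing a real None into it through replace produces an object Series.

```python
import numpy as np
import pandas as pd

# A float64 Series cannot hold None: pandas stores it as NaN instead.
s = pd.Series([1.0, None])
print(s.dtype, s.tolist())

# Putting an actual None into the Series requires the object dtype.
s_none = pd.Series([1.0, np.nan]).replace(to_replace={np.nan: None})
print(s_none.dtype, s_none.tolist())
```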

import pandas as pd
import numpy as np
data = pd.DataFrame({'A':np.nan,'B':1.096, 'C':1}, index=[0])
print(data)
print(data.dtypes)
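for col in data.columns:
    # Assign the result back instead of data[col].replace(..., inplace=True):
    # chained in-place calls like that are deprecated in pandas 2.x and no
    # longer modify data under copy-on-write.
    data[col] = data[col].replace(to_replace={np.nan: None})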
print(data)
print(data.dtypes)

Output:

      A      B  C
0 NaN  1.096  1

A    float64
B    float64
C      int64
dtype: object
      A      B  C
0  None  1.096  1

A     object
B    float64
C      int64
dtype: object
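A variation that combines the two answers' ideas (an assumption of this edit, not something either answer proposes): cast only the columns that actually contain NaN to object, then do a single frame-level replace. Columns without missing values should keep their dtype; since replace has changed across pandas versions, treat this as a sketch to check against your own version.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': np.nan, 'B': 1.096, 'C': 1}, index=[0])

# Only the columns that actually contain missing values need the object dtype.
cols_with_nan = [col for col in data.columns if data[col].isna().any()]

# Cast just those columns to object, then replace NaN with None frame-wide.
result = data.astype({col: object for col in cols_with_nan}).replace(
    to_replace={np.nan: None}
)
print(result)
print(result.dtypes)
```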
