I have the following dataframe:

     Daily_KWH_System      year     month  day  hour  minute  second
 0         4136.900384      2016      9    7     0       0       0
 1         3061.657187      2016      9    8     0       0       0
 2         4099.614033      2016      9    9     0       0       0
 3         3922.490275      2016      9   10     0       0       0
 4         3957.128982      2016      9   11     0       0       0
 5         4177.014316      2016      9   12     0       0       0
 6         3077.103445      2016      9   13     0       0       0
 7         4123.103795      2016      9   14     0       0       0
..                ...       ...      ...  ...   ...     ...     ...
551               NaN       2016      11  23     0       0       0
552               NaN       2016      11  24     0       0       0
553               NaN       2016      11  25     0       0       0
..                ...       ...      ...  ...   ...     ...     ...
579               NaN       2016      11  27     0       0       0
580               NaN       2016      11  28     0       0       0

The column dtypes are as follows:

print(df.dtypes)

Daily_KWH_System    object
year                 int32
month                int32
day                  int32
hour                 int32
minute               int32
second               int32

I need to convert "Daily_KWH_System" to float so that I can use it in a linear regression model.

I tried the code below, which worked fine:

 df['Daily_KWH_System'] = pd.to_numeric(df['Daily_KWH_System'], errors='coerce')

Then I replaced the NaN values with a blank space so that I could use the column in my model, using the following code:

 df = df.replace(np.nan,' ', regex=True)

But the column "Daily_KWH_System" is converted back to object as soon as I replace the NaN values.

Please let me know how to go about this.
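For anyone trying to reproduce this, here is a minimal sketch of the behaviour described above. The small dataframe is a hypothetical stand-in for the real data (the "bad" entry is invented to show what errors='coerce' does). As far as I can tell, a column that holds both floats and the string ' ' can only be stored as object, whereas NaN is itself a float value, so leaving it in place keeps the column float64.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the column in the question:
# numeric strings, one missing value, and one unparseable entry.
df = pd.DataFrame({"Daily_KWH_System": ["4136.900384", "3061.657187", None, "bad"]})

# errors='coerce' turns anything that cannot be parsed into NaN.
df["Daily_KWH_System"] = pd.to_numeric(df["Daily_KWH_System"], errors="coerce")
dtype_after_to_numeric = df["Daily_KWH_System"].dtype

# Replacing NaN with a string mixes floats and str in one column.
blanked = df["Daily_KWH_System"].replace(np.nan, " ")
dtype_after_blank = blanked.dtype

print(dtype_after_to_numeric)  # numeric dtype, NaN kept as a float value
print(dtype_after_blank)       # no longer numeric
```

So the conversion itself is not being undone; the replacement step is what changes the dtype.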

  • I do not want NaN in my dataframe, because the model only accepts int/float. Hence I need those values to be blank, so that I can predict them. Commented Feb 9, 2017 at 6:59
  • Hmm, judging by this question, you need to remove the NaN values with Daily_KWH_System = df.loc[df.Daily_KWH_System.notnull(), 'Daily_KWH_System']. But maybe you need something else... Commented Feb 9, 2017 at 7:13
  • But judging by this, the NaN values are not a problem: they are removed by dropna(). Commented Feb 9, 2017 at 7:15
  • So a second possible solution is Daily_KWH_System = df.Daily_KWH_System.dropna(). Please check how it works. Commented Feb 9, 2017 at 7:17
  • Or, if you need to remove all rows that contain at least one NaN: df = df.dropna() Commented Feb 9, 2017 at 7:17
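The suggestions in the comments could be combined roughly as follows. This is only a sketch on a hypothetical four-row sample shaped like the dataframe in the question. The names known, train, to_predict and train_alt are made up for illustration. It assumes the goal is to fit on the rows where Daily_KWH_System is present and predict the rest.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with the same layout as the question's dataframe.
df = pd.DataFrame({
    "Daily_KWH_System": [4136.900384, 3061.657187, np.nan, np.nan],
    "year": [2016, 2016, 2016, 2016],
    "month": [9, 9, 11, 11],
    "day": [7, 8, 23, 24],
})

# Boolean mask: True where the target value is present.
known = df["Daily_KWH_System"].notna()

train = df[known]        # rows that can be used to fit the model
to_predict = df[~known]  # rows whose target is missing and is to be predicted

# dropna restricted to the target column selects the same rows as `train`.
train_alt = df.dropna(subset=["Daily_KWH_System"])

print(train.dtypes["Daily_KWH_System"])  # the column stays numeric
```

Splitting this way leaves the NaN values where they are instead of overwriting them with blanks, so the column never leaves its float dtype.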
