Python: ValueError: could not convert string to float: 'D'

Question

I am loading a train.csv file to fit a RandomForestClassifier on it. Loading and processing the .csv file works fine, and I am able to play around with my dataframe.

When I try:

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=150, min_samples_split=2, n_jobs=-1)
rf.fit(train, target)

I get this:

ValueError: could not convert string to float: 'D'

I have tried:

train=train.astype(float)

Replacing all 'D' with another value.

train.convert_objects(convert_numeric=True)

But the issue still persists.

I also tried printing all the ValueErrors in my CSV file, but cannot find a reference to 'D'.

This is my trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-9d8e309c06b6> in <module>()
----> 1 rf.fit(train, target)

\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
    222 
    223         # Convert data
--> 224         X, = check_arrays(X, dtype=DTYPE, sparse_format="dense")
    225 
    226         # Remap output

\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_arrays(*arrays, **options)
    279                     array = np.ascontiguousarray(array, dtype=dtype)
    280                 else:
--> 281                     array = np.asarray(array, dtype=dtype)
    282                 if not allow_nans:
    283                     _assert_all_finite(array)

\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    460 
    461     """
--> 462     return array(a, dtype, copy=False, order=order)
    463 
    464 def asanyarray(a, dtype=None, order=None):

ValueError: could not convert string to float: 'D'

How should I approach this problem?

You need to show us the file you're reading in. That's where the D comes from. Just a line or two should be fine, and the code where you load in your dataframe. The dataframe is not what you think it is.
– Slater Victoroff, Commented Aug 8, 2015 at 19:54
This is what I've done: cols=['colname1','colname2'.....] train = pd.read_csv("C://Train//Train.csv", names=cols, delimiter=',') This is a single row: 5 146408P0015 34.856928 -82.439238 SA01 Greenville SC 29611 HXYF Greenville 0 0 0 0 0 HAXXF 0 0 Literacy Literacy & Language ESL Literacy & Language Books G61 AA B 266 0
– swamoch, Commented Aug 8, 2015 at 20:10
Why are you setting a comma as your delimiter when there are no commas?
– Slater Victoroff, Commented Aug 9, 2015 at 23:06
Slater, I am reading from a csv file, hence the comma. When I don't use the comma, the values are read with the default separator as tab, which is wrong. An interesting note here: when I use the comma, the dtype of all the columns is object (which I am unable to convert to float). When I don't use a comma as the separator, the columns are float64 by default, which is what I am trying to achieve.
– swamoch, Commented Aug 11, 2015 at 4:33
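
The effect described in this exchange can be reproduced on a small scale. The following is a minimal sketch, assuming the file is actually tab-separated (the sample row above suggests this but does not confirm it). The two-row `raw` string is made-up data, not taken from the real Train.csv.

```python
import io

import pandas as pd

# Hypothetical two-row sample, tab-separated, with no commas anywhere.
raw = "5\t34.856928\t-82.439238\n7\t35.1\t-81.0\n"

# Parsed with a comma separator: no commas are found, so every line
# collapses into one text column that cannot be cast to float.
as_comma = pd.read_csv(io.StringIO(raw), sep=",", header=None)

# Parsed with a tab separator: three columns, all inferred as numeric.
as_tab = pd.read_csv(io.StringIO(raw), sep="\t", header=None)

print(as_comma.shape, as_comma.dtypes.tolist())
print(as_tab.shape, as_tab.dtypes.tolist())
```

Comparing `shape` and `dtypes` for both parses is a quick way to check whether the separator matches the file before any model fitting is attempted.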

zom-pro · Accepted Answer · 2015-08-08 20:53:26Z

2

RandomForestClassifier is not (as far as I could find) part of the Python standard library; it comes from scikit-learn, as the import in the question shows. That makes it difficult to know exactly what's going on in your case. However, what's really happening is that at some point a string 'D' is being converted to a float. I can reproduce your error by doing:

float('D')

Now, to be able to debug this problem, I recommend catching the exception:

try:
    rf.fit(train, target)
except ValueError as e:
    print(e)
    # Do something clever with train and target here, like pprint them.

Then you can look into what's really going on. I couldn't find much about that random forest classifier except for this, which might help (though it is a JavaScript package on npm, not the scikit-learn class imported in the question): https://www.npmjs.com/package/random-forest-classifier
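
Catching the exception shows that a conversion failed, but not where. One way to narrow it down is to apply the same `float()` call to every cell and record the failures. This is a minimal sketch using only the standard library. The helper `find_non_numeric` and the three-line sample are hypothetical and stand in for the real file.

```python
import csv
import io


def find_non_numeric(rows, header):
    """Return (row_index, column_name, value) for each cell float() rejects."""
    bad = []
    for i, row in enumerate(rows):
        for name, value in zip(header, row):
            try:
                float(value)
            except ValueError:
                bad.append((i, name, value))
    return bad


# Hypothetical sample; replace with open("Train.csv", newline="") for a real file.
sample = io.StringIO("a,b,c\n1.5,2,D\n3,x,4\n")
reader = csv.reader(sample)
header = next(reader)  # first line holds the column names

offenders = find_non_numeric(reader, header)
for row_idx, col, val in offenders:
    print(f"row {row_idx}, column {col!r}: {val!r}")
```

The row indices count data rows from zero, so the header line is not included.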

edited Aug 8, 2015 at 20:53

answered Aug 8, 2015 at 20:04

zom-pro


Claude COULOMBE · Accepted Answer · 2015-11-13 06:36:13Z

0

You should explore and clean your data. You probably have a 'D' somewhere in your data that your code tries to convert to a float. Tracing the failure within a "try-except" block is a good idea.
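
As a rough illustration of that exploring and cleaning step, the sketch below flags columns that contain non-numeric text and one-hot encodes them so that the whole frame can be cast to float. The small `train` frame, its column names, and the helper `non_numeric_columns` are all hypothetical. One-hot encoding with `pd.get_dummies` is only one possible treatment and may not suit the asker's data.

```python
import pandas as pd

# Hypothetical frame standing in for the real training data.
train = pd.DataFrame({
    "lat": ["34.856928", "35.1"],
    "grade": ["D", "B"],
    "count": ["266", "0"],
})


def non_numeric_columns(df):
    """Return names of columns holding at least one value that is not a number."""
    cols = []
    for name in df.columns:
        converted = pd.to_numeric(df[name], errors="coerce")
        # A cell that was present but became NaN could not be parsed.
        if (converted.isna() & df[name].notna()).any():
            cols.append(name)
    return cols


text_cols = non_numeric_columns(train)

# Numeric columns are converted directly; text columns are one-hot encoded.
numeric_part = train.drop(columns=text_cols).apply(pd.to_numeric)
encoded = pd.get_dummies(train[text_cols])
clean = pd.concat([numeric_part, encoded], axis=1).astype(float)

print(text_cols)
print(clean.dtypes)
```

Once every column in `clean` is float, a call like `rf.fit(clean, target)` should no longer fail on string conversion.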

answered Nov 13, 2015 at 6:36

Claude COULOMBE
