3

I am loading a train.csv file to fit it with a RandomForestClassifier. Loading and processing the .csv file works fine, and I am able to play around with my dataframe.

When I try:

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=150, min_samples_split=2, n_jobs=-1)
rf.fit(train, target)

I get this:

ValueError: could not convert string to float: 'D'

I have tried:

train=train.astype(float)

Replacing all 'D' with another value.

train.convert_objects(convert_numeric=True)

But the issue still persists.

I also tried printing all the valueErrors in my csv file, but cannot find a reference to 'D'.

This is my trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-9d8e309c06b6> in <module>()
----> 1 rf.fit(train, target)

\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
    222 
    223         # Convert data
--> 224         X, = check_arrays(X, dtype=DTYPE, sparse_format="dense")
    225 
    226         # Remap output

\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_arrays(*arrays, **options)
    279                     array = np.ascontiguousarray(array, dtype=dtype)
    280                 else:
--> 281                     array = np.asarray(array, dtype=dtype)
    282                 if not allow_nans:
    283                     _assert_all_finite(array)

\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
    460 
    461     """
--> 462     return array(a, dtype, copy=False, order=order)
    463 
    464 def asanyarray(a, dtype=None, order=None):

ValueError: could not convert string to float: 'D'

How should I approach this problem?

4
  • You need to show us the file you're reading in. That's where the D comes from. Just a line or two should be fine, plus the code where you load your dataframe. The dataframe is not what you think it is. Commented Aug 8, 2015 at 19:54
  • This is what I've done: cols=['colname1','colname2'.....] train = pd.read_csv("C://Train//Train.csv", names=cols, delimiter=',') This is a single row: 5 146408P0015 34.856928 -82.439238 SA01 Greenville SC 29611 HXYF Greenville 0 0 0 0 0 HAXXF 0 0 Literacy Literacy & Language ESL Literacy & Language Books G61 AA B 266 0 Commented Aug 8, 2015 at 20:10
  • Why are you setting a comma as your delimiter when there are no commas? Commented Aug 9, 2015 at 23:06
  • Slater, I am reading from a csv file, hence the comma. When I don't use the comma, the values are read with tab as the default separator, which is wrong. An interesting note here: when I use a comma, the dtype of all the columns is object (which I am unable to convert to float). When I don't use a comma as the separator, the columns are float64 by default, which is what I am trying to achieve. Commented Aug 11, 2015 at 4:33
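The delimiter question raised in these comments can be checked on the raw text before pandas is involved. Below is a minimal sketch, assuming a hypothetical helper `guess_delimiter` and made-up sample lines (not the asker's actual file). It reports which candidate delimiter splits every line into the same number of fields, with more than one field per line. It uses a naive `str.split`, so quoted fields that contain the delimiter are not handled.

```python
def guess_delimiter(lines, candidates=(",", "\t", ";")):
    """Return the first candidate that splits every line into the same
    number of fields (more than one), or None if no candidate does."""
    for delim in candidates:
        # Collect the distinct field counts this delimiter produces.
        counts = {len(line.split(delim)) for line in lines}
        if len(counts) == 1 and counts.pop() > 1:
            return delim
    return None


# Hypothetical sample rows (assumption: tab-separated, for illustration only).
sample = [
    "5\t146408P0015\t34.856928\t-82.439238\tSA01",
    "7\t146408P0016\t35.100000\t-81.900000\tSA02",
]
print(repr(guess_delimiter(sample)))
```

If the helper returns something other than the delimiter passed to `pd.read_csv`, each row is probably being read as a single string field, which would explain columns with dtype object.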

2 Answers

2

Since RandomForestClassifier is not (as far as I could find) part of the Python standard library, it's difficult to know exactly what's going on in your case. However, what is really happening is that, at some point, something tries to convert the string 'D' into a float. I can reproduce your error with:

float('D')

To debug this problem, I recommend catching the exception:

try:
    rf.fit(train, target)
except ValueError as e:
    print(e)
    # Do something clever with train and target here,
    # such as pretty-printing them to inspect their contents.

Then you can look into what's really going on. I couldn't find much about that random forest classifier except for this that might help: https://www.npmjs.com/package/random-forest-classifier
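Catching the exception only shows the message again, so it can help to scan the data for every value that `float()` rejects. The following is a minimal sketch using only the standard library. The helper name `find_non_numeric`, the column names, and the sample rows are all hypothetical stand-ins, not the asker's data.

```python
import csv
import io


def find_non_numeric(rows, header):
    """Map each column name to the distinct values in it that float() rejects."""
    bad = {}
    for row in rows:
        for name, value in zip(header, row):
            try:
                float(value)
            except ValueError:
                # Remember the offending value under its column name.
                bad.setdefault(name, set()).add(value)
    return bad


# Hypothetical stand-in for train.csv (assumption: made-up columns and values).
raw = "lat,lon,grade\n34.85,-82.43,D\n35.10,-81.90,B\n"
reader = csv.reader(io.StringIO(raw))
header = next(reader)
print(find_non_numeric(reader, header))
```

Any column that appears in the result holds strings that the classifier cannot convert, so it needs to be dropped or encoded before calling `fit`.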


0

You should explore and clean your data. You probably have a 'D' somewhere in your data that your code tries to convert to a float. Tracing the failure inside a try-except block is a good idea.
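One possible cleaning step, assuming the 'D' turns out to be a legitimate categorical value rather than a typo, is to replace each distinct string with an integer code before fitting. This is only a sketch: the helper `encode_categories` and the sample values are hypothetical, and integer codes impose an arbitrary ordering on the categories, which may or may not suit a given model.

```python
def encode_categories(values):
    """Replace each distinct string with an integer code,
    assigned in order of first appearance."""
    codes = {}
    # setdefault returns the existing code, or stores and returns a new one.
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return encoded, codes


# Hypothetical column of string labels (assumption: made-up values).
grades = ["D", "B", "D", "AA"]
encoded, mapping = encode_categories(grades)
print(encoded)
print(mapping)
```

The returned mapping should be kept so that the same codes can be applied to any test data later.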
