I am loading a train.csv file to fit a RandomForestClassifier on it. Loading and processing the .csv file works fine, and I am able to work with the resulting dataframe.
When I try:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=150, min_samples_split=2, n_jobs=-1)
rf.fit(train, target)
I get this:
ValueError: could not convert string to float: 'D'
I have tried:
train=train.astype(float)
Replacing all 'D' with another value.
train.convert_objects(convert_numeric=True)
But the issue still persists.
I also tried printing every value in my CSV file that raises a ValueError on conversion, but could not find any reference to 'D'.
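One way to narrow this down is to check, column by column, which values fail numeric conversion. The following is only a minimal sketch: the small dataframe and its column names (`lat`, `state`, `grade`) are made up for illustration, and `non_numeric_values` is a hypothetical helper, not part of pandas. It relies on `pd.to_numeric(..., errors="coerce")`, which turns unparseable entries into NaN instead of raising.

```python
import pandas as pd

# Hypothetical stand-in for the real training frame; column names are made up.
train = pd.DataFrame({
    "lat": [34.856928, 35.1],
    "state": ["SC", "NC"],
    "grade": ["D", "B"],
})

def non_numeric_values(df):
    """Map each column to the distinct values that cannot be parsed as numbers."""
    report = {}
    for col in df.columns:
        converted = pd.to_numeric(df[col], errors="coerce")
        # Values that were present originally but became NaN failed to convert.
        bad = df.loc[converted.isna() & df[col].notna(), col]
        if not bad.empty:
            report[col] = sorted(bad.astype(str).unique())
    return report

report = non_numeric_values(train)
print(report)
```

Run against the real dataframe, a report like this should show which column holds 'D' and whether other string values are present as well.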
This is my trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-9d8e309c06b6> in <module>()
----> 1 rf.fit(train, target)
\Anaconda3\lib\site-packages\sklearn\ensemble\forest.py in fit(self, X, y, sample_weight)
222
223 # Convert data
--> 224 X, = check_arrays(X, dtype=DTYPE, sparse_format="dense")
225
226 # Remap output
\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_arrays(*arrays, **options)
279 array = np.ascontiguousarray(array, dtype=dtype)
280 else:
--> 281 array = np.asarray(array, dtype=dtype)
282 if not allow_nans:
283 _assert_all_finite(array)
\Anaconda3\lib\site-packages\numpy\core\numeric.py in asarray(a, dtype, order)
460
461 """
--> 462 return array(a, dtype, copy=False, order=order)
463
464 def asanyarray(a, dtype=None, order=None):
ValueError: could not convert string to float: 'D'
How should I approach this problem?
cols=['colname1','colname2'.....]
train = pd.read_csv("C://Train//Train.csv", names=cols, delimiter=',')
This is a single row:
5 146408P0015 34.856928 -82.439238 SA01 Greenville SC 29611 HXYF Greenville 0 0 0 0 0 HAXXF 0 0 Literacy Literacy & Language ESL Literacy & Language Books G61 AA B 266 0
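The sample row above contains several string values (for example 'SA01', 'Greenville', 'SC'), and the traceback shows the data being passed to `np.asarray` with a float dtype, so any remaining string column would presumably trigger the same error. A possible approach, sketched here under assumptions, is to replace each non-numeric column with integer codes before fitting. The dataframe, its column names, and the `encode_strings` helper are all hypothetical; `pd.factorize` assigns codes in order of first appearance.

```python
import numpy as np
import pandas as pd

# Made-up rows loosely modelled on the sample row; column names are hypothetical.
train = pd.DataFrame({
    "latitude": [34.856928, 35.227085],
    "city": ["Greenville", "Charlotte"],
    "state": ["SC", "NC"],
    "grade": ["D", "B"],
})

def encode_strings(df):
    """Return a copy in which every non-numeric column is replaced by integer codes."""
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            # factorize returns (codes, uniques); only the codes are kept here.
            codes, _ = pd.factorize(out[col])
            out[col] = codes
    return out

encoded = encode_strings(train)

# The same kind of conversion the traceback shows failing now succeeds.
X = np.asarray(encoded, dtype=float)
print(X.shape)
```

Integer codes impose an arbitrary ordering on the categories. If that is a concern, one-hot encoding with `pd.get_dummies` is an alternative worth considering.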