I am following this tutorial to write a Naive Bayes classifier: http://machinelearningmastery.com/naive-bayes-classifier-scratch-python/

I keep getting this error:

dataset[i] = [float(x) for x in dataset[i]]
ValueError: could not convert string to float: 

Here is the part of my code where the error occurs:

def loadDatasetNB(filename):
    lines = csv.reader(open(filename, "rt"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

And here is how the file is called:

def NB_Analysis():
    filename = 'fvectors.csv'
    splitRatio = 0.67
    dataset = loadDatasetNB(filename)
    trainingSet, testSet = splitDatasetNB(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows').format(len(dataset), len(trainingSet), len(testSet))
    # prepare model
    summaries = summarizeByClassNB(trainingSet)
    # test model
    predictions = getPredictionsNB(summaries, testSet)
    accuracy = getAccuracyNB(testSet, predictionsNB)
    print('Accuracy: {0}%').format(accuracy)

NB_Analysis()

My file fvectors.csv looks like this

What is going wrong here and how do I fix it?

4 Answers

5

Try skipping the header row: an empty header cell in the first column is causing the issue.

>>> float(' ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float:

(1) If you want to skip the header, you can do it with:

def loadDatasetNB(filename):
    lines = csv.reader(open(filename, "rt"))
    next(lines, None)  # <<- skip the header row
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

(2) Or you can just ignore the exception:

try:
    float(element)
except ValueError:
    pass

If you decide to go with option (2), make sure it skips only the first row, or only rows that you know for certain contain text.
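As a rough sketch of option (2) applied to the whole loader, the version below drops any row containing a value that float() rejects and counts how many rows were dropped, so silent data loss is easier to spot. The name load_numeric_rows, the choice to take an open file object instead of a filename, and the sample data are assumptions made for illustration; none of them come from the tutorial.

```python
import csv
import io


def load_numeric_rows(fp):
    """Return (rows of floats, number of rows skipped as non-numeric)."""
    dataset = []
    skipped = 0
    for row in csv.reader(fp):
        if not row:
            continue  # csv.reader yields [] for a blank line
        try:
            dataset.append([float(x) for x in row])
        except ValueError:
            skipped += 1  # header or text row; counted so the loss is visible
    return dataset, skipped


# Hypothetical sample standing in for fvectors.csv (empty first header cell)
sample = io.StringIO(",a,b\n1,2.5,3\n4,5,6\n")
rows, skipped = load_numeric_rows(sample)
print(rows, skipped)
```

If the skipped count is larger than the number of header rows you expect, the file probably contains text in the data rows as well.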


1

There is an empty value (an empty string) in the data.

>>> float('')
ValueError: could not convert string to float:

You can check the value before casting it:

dataset[i] = [float(x) for x in dataset[i] if x != '']
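A small check of what csv.reader produces may help here. With the made-up sample below, a blank line comes back as an empty list, which the comprehension handles without error, while an empty field comes back as ''. Note that filtering out '' shortens that row, so the later columns shift left.

```python
import csv
import io

# Made-up sample: a blank line, then a row with an empty middle field
sample = io.StringIO("1,2,3\n\n4,,6\n")
rows = list(csv.reader(sample))

# Same filter as above, applied to every row
cleaned = [[float(x) for x in row if x != ''] for row in rows]

print(rows)
print(cleaned)
```

Because the third row ends up with two values instead of three, this filter is only safe when a missing value can really be dropped rather than replaced.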

1

Looking at the image of your data, Python cannot convert the values square and circle in the last column to floats. You also have a header row that you need to skip.

Try using this code:

def loadDatasetNB(filename):
    with open(filename, 'r', newline='') as fp:
        reader = csv.reader(fp)
        # skip the header line
        header = next(reader)
        # save the features and the labels as different lists
        data_features = []
        data_labels = []
        for row in reader:
            # convert everything except the label to a float
            data_features.append([float(x) for x in row[:-1]])
            # save the labels separately
            data_labels.append(row[-1])
    return data_features, data_labels
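If the rest of the tutorial code expects every row to end with a numeric class value (an assumption about the downstream functions, not something shown in the question), one possible approach is to map each text label to a number and append it back onto the feature row. The helper name encode_labels and the sample values are hypothetical.

```python
def encode_labels(features, labels):
    """Append a numeric code for each label to its feature row."""
    codes = {}  # label -> numeric code, assigned in order of first appearance
    rows = []
    for feats, label in zip(features, labels):
        code = codes.setdefault(label, float(len(codes)))
        rows.append(feats + [code])
    return rows, codes


# Hypothetical values in the shape returned by the loader above
features = [[0.5, 1.0], [2.0, 3.5], [0.7, 1.2]]
labels = ['square', 'circle', 'square']
rows, codes = encode_labels(features, labels)
print(rows)
print(codes)
```

Keeping the codes dictionary lets you translate numeric predictions back to square and circle afterwards.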

0

You are passing strings to the float constructor here, which raises a ValueError unless the string is a valid number:

dataset[i] = [float(x) for x in dataset[i]]

Instead of using a list comprehension, perhaps it would be better to use a for loop so you can more easily handle this case:

data = []
for x in dataset[i]:
    try:
        value = float(x)
    except ValueError:
        value = x
    data.append(value)
dataset[i] = data
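If you would rather keep the list comprehension, the same fallback can be moved into a small helper. This is only a sketch: the name to_float_or_str and the sample row are made up for illustration.

```python
def to_float_or_str(x):
    """Return float(x) when x is numeric text, otherwise return x unchanged."""
    try:
        return float(x)
    except ValueError:
        return x


# Made-up row mixing numbers, an empty field, and a text label
row = ['1.5', '', 'square', '2']
converted = [to_float_or_str(x) for x in row]
print(converted)
```

Be aware that any strings left in a row this way will still break later arithmetic such as computing a mean, so this mainly helps you find out which values are not numeric.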

See more about catching exceptions here:

Try/Except in Python: How do you properly ignore Exceptions?
