Why can't this program convert a string to float in Python?

Question

What's wrong with this?

from sklearn.preprocessing import Normalizer
from pandas import read_csv
from numpy import set_printoptions

namaFile = 'dataset.csv'
nama = ['rt', 'niagak', 'niagab', 'sosum', 'soskhus', 'p', 'tni', 'ik', 'ib', 'TARGET']
dataFrame = read_csv(namaFile, names=nama)
array = dataFrame.values

# split the array
X = array[:,0:9]
Y = array[:,9]

skala = Normalizer().fit(X)
normalisasiX = skala.transform(X)

# resulting data
set_printoptions(precision = 3)
print(normalisasiX[0:10,:])

And when I run this program, I get:

File "C:\Users\Dini\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 433, in check_array

array = np.array(array, dtype=dtype, order=order, copy=copy)

ValueError: could not convert string to float: 'ib'

Please help me.

It sounds like read_csv is attempting to parse data from the first row, which holds the headings in your file. The documentation for this function gives details on how to specify header row(s): pandas.pydata.org/pandas-docs/stable/reference/api/…
– OliverRadini, Commented Mar 6, 2019 at 15:59
I was looking at that, and the same page says of header: "... if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file...". You're defining the names in code, so you shouldn't also include the header in the file. Either do one (put the headers in the CSV data) or the other (define the column names in code). Don't do both.
– David Culbreth, Commented Mar 6, 2019 at 16:01
This line, "nama = ['rt', 'niagak', 'niagab', 'sosum', 'soskhus', 'p', 'tni', 'ik', 'ib', 'TARGET']", is the first row in the CSV file @OliverRadini
– user9389057, Commented Mar 6, 2019 at 16:04
@DavidCulbreth is pointing you in the right direction, I think; you need to either get the headers from the file or define them in the code.
– OliverRadini, Commented Mar 6, 2019 at 16:06

David Culbreth · Accepted Answer · 2019-03-06 16:35:30Z

I was looking at the docs (the same ones that @OliverRadini referred to), and that page says the following:

header : int, list of int, default ‘infer’

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file

You're defining the names in code, so you shouldn't also include the header in the file. Either do one (put the headers in the CSV data) or the other (define the column names in code). Don't do both.

EDIT: My answer remains the same, but here's one way you could have discovered this yourself:

With the following csv data (what you showed in the picture):

BULAN,rt,nigak,niagab,sosum,soskhus,p,tni,ik,ib,TARGET
13-Jan,84876,902,1192,2098,3623,169,39,133,1063,94095
13-Feb,79194,902,1050,2109,3606,153,39,133,806,87992
13-Mar,75836,902,1060,1905,3166,161,39,133,785,83987
13-Apr,75571,902,112,1878,3190,158,39,133,635,82618
13-May,83797,1156,134,1900,3518,218,39,133,709,91604
13-Jun,91648,1291,127,2220,3596,249,39,133,659,99967
13-Jul,79063,1346,107,1844,3428,247,39,133,951,86798

Running this code...

from pandas import read_csv
from numpy import set_printoptions

namaFile = 'dataset.csv'
nama = ['rt', 'niagak', 'niagab', 'sosum', 'soskhus', 'p', 'tni', 'ik', 'ib', 'TARGET']

dataFrame = read_csv(namaFile, names=nama)
array = dataFrame.values

print("with names=nama...")
print(array)

dataFrame = read_csv(namaFile)
array = dataFrame.values

print("with no names...")
print(array)

dataFrame = read_csv(namaFile, names=nama, header=0)
array = dataFrame.values

print("with names=nama and header=0...")
print(array)

You get this output:

with names=nama...
[['rt' 'nigak' 'niagab' 'sosum' 'soskhus' 'p' 'tni' 'ik' 'ib' 'TARGET']
 ['84876' '902' '1192' '2098' '3623' '169' '39' '133' '1063' '94095']
 ['79194' '902' '1050' '2109' '3606' '153' '39' '133' '806' '87992']
 ['75836' '902' '1060' '1905' '3166' '161' '39' '133' '785' '83987']
 ['75571' '902' '112' '1878' '3190' '158' '39' '133' '635' '82618']
 ['83797' '1156' '134' '1900' '3518' '218' '39' '133' '709' '91604']
 ['91648' '1291' '127' '2220' '3596' '249' '39' '133' '659' '99967']
 ['79063' '1346' '107' '1844' '3428' '247' '39' '133' '951' '86798']]

with no names...
[['13-Jan' 84876 902 1192 2098 3623 169 39 133 1063 94095]
 ['13-Feb' 79194 902 1050 2109 3606 153 39 133 806 87992]
 ['13-Mar' 75836 902 1060 1905 3166 161 39 133 785 83987]
 ['13-Apr' 75571 902 112 1878 3190 158 39 133 635 82618]
 ['13-May' 83797 1156 134 1900 3518 218 39 133 709 91604]
 ['13-Jun' 91648 1291 127 2220 3596 249 39 133 659 99967]
 ['13-Jul' 79063 1346 107 1844 3428 247 39 133 951 86798]]

with names=nama and header=0...
[[84876   902  1192  2098  3623   169    39   133  1063 94095]
 [79194   902  1050  2109  3606   153    39   133   806 87992]
 [75836   902  1060  1905  3166   161    39   133   785 83987]
 [75571   902   112  1878  3190   158    39   133   635 82618]
 [83797  1156   134  1900  3518   218    39   133   709 91604]
 [91648  1291   127  2220  3596   249    39   133   659 99967]
 [79063  1346   107  1844  3428   247    39   133   951 86798]]

We can clearly see that when you pass names=nama while the file also has a header line, the header ends up as the first row of data, which is not what we want. When you remove names=nama, you get all of the data from the file, with the header line used for the column names. When you explicitly overwrite the names with names=nama and header=0, you also get the desired result. HOWEVER, I would also like to note that the names in your code are missing the BULAN column, so be careful with that.
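As a rough sketch of one way to deal with the BULAN column: name all eleven columns and ask pandas to use BULAN as the index, so only numeric columns end up in the array. The inlined CSV text (two rows of the sample data) and the index_col choice are assumptions made for illustration; the original program reads 'dataset.csv'.

```python
from io import StringIO

from pandas import read_csv

# Two rows of the sample data, inlined so the sketch runs on its own.
# In the real program this would be the 'dataset.csv' file.
csv_text = """BULAN,rt,nigak,niagab,sosum,soskhus,p,tni,ik,ib,TARGET
13-Jan,84876,902,1192,2098,3623,169,39,133,1063,94095
13-Feb,79194,902,1050,2109,3606,153,39,133,806,87992
"""

# All eleven columns are named here, including BULAN.
nama = ['BULAN', 'rt', 'niagak', 'niagab', 'sosum', 'soskhus',
        'p', 'tni', 'ik', 'ib', 'TARGET']

# header=0 tells pandas the first line is a header to be replaced by nama;
# index_col='BULAN' keeps the month labels out of the numeric values.
dataFrame = read_csv(StringIO(csv_text), names=nama, header=0, index_col='BULAN')
array = dataFrame.values

X = array[:, 0:9]   # the nine feature columns
Y = array[:, 9]     # the TARGET column

print(dataFrame.dtypes)
print(X.shape, Y.shape)
```

With the data loaded this way, X holds only numbers and can be handed to Normalizer().fit(X) as in the question.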

print() is your friend. Use it. It will tell you what your problems are.
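Along the same lines, checking the column dtypes can reveal the problem without printing the whole array: a column that should be numeric but is reported as non-numeric usually means some text (here, the header line) was read as data. A minimal sketch, using a trimmed three-column version of the sample data inlined as text; the variable names are hypothetical.

```python
from io import StringIO

from pandas import read_csv
from pandas.api.types import is_numeric_dtype

# Trimmed, inlined stand-in for 'dataset.csv' (assumption for illustration).
csv_text = """BULAN,rt,TARGET
13-Jan,84876,94095
13-Feb,79194,87992
"""

nama = ['BULAN', 'rt', 'TARGET']

# names given, header left at its default: the header line is read as data.
with_header_row = read_csv(StringIO(csv_text), names=nama)

# names given and header=0: the header line is replaced by nama.
without_header_row = read_csv(StringIO(csv_text), names=nama, header=0)

# Report, per column, whether pandas parsed it as numeric.
for label, frame in [('header read as data', with_header_row),
                     ('header replaced', without_header_row)]:
    numeric = {col: is_numeric_dtype(frame[col]) for col in frame.columns}
    print(label, len(frame), numeric)
```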

I'm sorry, I don't understand what that means. If you have another question, please write it up in a new post and allow the whole community to try to answer it.
I mean, you've seen my data. In my other programs I have successfully loaded the CSV file into the database. As the next step, I want to normalize the data I have entered in the database and insert the normalized data into another table in my database with Python. Can you help?
That is beyond the scope of the question you have asked here. If you have another question, please write it up in a new post and allow the whole community to try to answer it. When I see the new question, I would be happy to take a look at it, if someone else doesn't solve your problem first.
