Here is my test_data.csv:
A,1,2,3,4,5
B,6,7,8,9,10
C,11,12,13,14,15
A,16,17,18,19,20
And I am reading it into a numpy array using the code below:
import csv
import numpy

def readCSVToNumpyArray(dataset):
    with open(dataset) as f:
        values = [i for i in csv.reader(f)]
    data = numpy.array(values)
    return data
In the main code, I have:
numpyArray = readCSVToNumpyArray('test_data.csv')
print(numpyArray)
which gives me the output:
(array([['A', '1', '2', '3', '4', '5'],
['B', '6', '7', '8', '9', '10'],
['C', '11', '12', '13', '14', '15'],
['A', '16', '17', '18', '19', '20']],
dtype='|S2'))
But all the numbers in the array are treated as strings. Is there a good way to store them as floats without going through each element and assigning the type?
Thanks!
numpy.ndarrays are homogeneous. That's part of why they have improved performance. Maybe you could have two separate arrays, one for numbers and one for strings? Or a list of strings and an array of numbers? Otherwise, you need to look into numpy records or some other data structure. Have you considered pandas DataFrames?
DataFrame actually) to numpy array easily just by asarray(table).
np.fromfile or np.genfromtxt are also good utils for reading a text file; in your case you have to define a data type and pass it to these functions. Go and see their docstrings, and also take a look at np.dtype (structured arrays).
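A rough sketch of two of the suggestions above, in case it helps: keeping the labels and the numbers in two separate arrays, and reading everything into one structured array with np.genfromtxt. The names CSV_TEXT and read_labels_and_values, and the field names label, v1 ... v5, are made up for illustration; the data is inlined so the snippet runs without test_data.csv on disk.

```python
import csv
import io

import numpy as np

# Same rows as test_data.csv in the question, inlined for a self-contained run.
CSV_TEXT = "A,1,2,3,4,5\nB,6,7,8,9,10\nC,11,12,13,14,15\nA,16,17,18,19,20\n"


def read_labels_and_values(f):
    """Split each row into a string label (first column) and float values (the rest)."""
    rows = list(csv.reader(f))
    labels = np.array([row[0] for row in rows])
    # dtype=float converts the numeric strings in one step, no per-element loop.
    values = np.array([row[1:] for row in rows], dtype=float)
    return labels, values


# Option 1: two homogeneous arrays, one of strings and one of floats.
labels, values = read_labels_and_values(io.StringIO(CSV_TEXT))

# Option 2: one structured array, with a per-column dtype passed to genfromtxt.
# The field names below are hypothetical; pick whatever suits the data.
record_dtype = [("label", "U1")] + [(f"v{i}", "f8") for i in range(1, 6)]
records = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", dtype=record_dtype)

print(values.dtype, values.shape)
print(records["label"], records["v1"])
```

With the two-array version, ordinary numeric operations such as values.mean(axis=1) work directly, and labels can be used for masking, e.g. values[labels == "A"]. The structured array keeps everything in one object, but columns are accessed by field name rather than by a 2-D index.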