I'm using numpy and Python 3.4 to read data from a .csv file.
Here is a sample of the CSV file:
"05/27/2016 09:45:37.816","187666432","7921470.8554087048","0","95.202655176457412","82.717061054954783","1.4626657999999999","158","5"
"05/27/2016 09:45:38.819","206884864","10692185.668858336","0","101.33018029563618","93.535551042125718","2.4649584999999998","158","5"
And here is my code sample used to extract data from the CSV above:
import os
import numpy as np

path = os.path.abspath('sample.csv')
csv_contents = np.genfromtxt(path, dtype=None, delimiter=',', autostrip=True, skip_header=0,
                             usecols=(1, 2, 3, 4, 5, 6, 7, 8))
num_cols = csv_contents.shape[1]
for x in np.nditer(csv_contents):
    print('Original value: {0}'.format(x))
    print('Decoded value: {0}'.format(x.tostring().decode('utf-8')))
    val = x.tostring().decode('utf-8').replace('\x00', '').replace('"', '')
    print('Without hex and ": {0}'.format(val))
    try:
        print('Float value:\t{0}\n'.format(float(val)))
    except ValueError as e:
        raise e
Sample output:
Original value: b'"187666432"'
Decoded value: "187666432"���������
Without hex and ": 187666432
Float value: 187666432.0
Original value: b'"7921470.8554087048"'
Decoded value: "7921470.8554087048"
Without hex and ": 7921470.8554087048
Float value: 7921470.855408705
Original value: b'"0"'
Decoded value: "0"�����������������
Without hex and ": 0
Float value: 0.0
In my for loop, to convert the x value to a float, I've had to do this:
val = x.tostring().decode('utf-8').replace('\x00', '').replace('"', '')
This is not particularly elegant and is error-prone.
Question 1: Is there a better way to do this?
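One possible approach, offered as a minimal sketch rather than a definitive answer: let Python's standard csv module strip the quotes, then convert each field with float(). It assumes every column after the timestamp is numeric, as in the sample. The helper name read_numeric_rows is hypothetical, and io.StringIO stands in for the real file.

```python
import csv
import io

# Two rows copied from the sample above; io.StringIO stands in for the real file.
sample = (
    '"05/27/2016 09:45:37.816","187666432","7921470.8554087048","0",'
    '"95.202655176457412","82.717061054954783","1.4626657999999999","158","5"\n'
    '"05/27/2016 09:45:38.819","206884864","10692185.668858336","0",'
    '"101.33018029563618","93.535551042125718","2.4649584999999998","158","5"\n'
)


def read_numeric_rows(handle):
    """Return columns 1..8 of every row as floats (hypothetical helper)."""
    rows = []
    for record in csv.reader(handle):
        # csv.reader has already removed the surrounding double quotes;
        # record[0] is the timestamp, which is skipped here.
        rows.append([float(field) for field in record[1:]])
    return rows


rows = read_numeric_rows(io.StringIO(sample))
print(rows[0][:3])
```

With a real file, the handle would come from open(path, newline=''), and the resulting list of lists could be passed to np.array(rows, dtype=float) if a NumPy array is needed.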
Question 2: Why does x.tostring().decode('utf-8') evaluate to something like "158"��������������� when dealing with integers? Where are the hexadecimal characters coming from in x.tostring()?
list(b'"187666432"') etc. for these values (perhaps that will explain the �s).
\0 or something like that? All three decoded values have the same length: "187666432"���������, "0"�����������������, "7921470.8554087048"
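Regarding Question 2, a likely explanation is that the array holds fixed-width byte strings, so every element occupies as many bytes as the longest field and shorter values are padded with null bytes (\x00). The sketch below illustrates this with three fields from the sample. It builds the array directly with np.array instead of genfromtxt, which is an assumption made to keep the example self-contained, and it uses tobytes(), the newer name for tostring().

```python
import numpy as np

# Three quoted fields from the sample, stored as a fixed-width bytes array.
fields = np.array([b'"187666432"', b'"7921470.8554087048"', b'"0"'])
print(fields.dtype)  # fixed-width bytes dtype sized for the longest field

for i in range(fields.size):
    # Slicing keeps a 1-element array, so tobytes() exposes the raw,
    # null-padded buffer instead of the trimmed Python bytes object.
    raw = fields[i:i + 1].tobytes()
    print(len(raw), raw)
```

Every raw buffer has the same length, and the trailing \x00 bytes are what show up as � once the buffer is decoded and printed.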