Converting a numpy.ndarray value from bytes to float

Question

I'm using numpy and Python 3.4 to read data from a .csv file.

Here is a sample of the CSV file:

"05/27/2016 09:45:37.816","187666432","7921470.8554087048","0","95.202655176457412","82.717061054954783","1.4626657999999999","158","5"
"05/27/2016 09:45:38.819","206884864","10692185.668858336","0","101.33018029563618","93.535551042125718","2.4649584999999998","158","5"

And here is my code sample used to extract data from the CSV above:

import os
import numpy as np

path = os.path.abspath('sample.csv')
csv_contents = np.genfromtxt(path, dtype=None, delimiter=',', autostrip=True, skip_header=0,
                             usecols=(1, 2, 3, 4, 5, 6, 7, 8))

num_cols = csv_contents.shape[1]

for x in np.nditer(csv_contents):
    print('Original value: {0}'.format(x))
    print('Decoded value: {0}'.format(x.tostring().decode('utf-8')))
    val = x.tostring().decode('utf-8').replace('\x00', '').replace('"', '')
    print('Without hex and ": {0}'.format(val))

    try:
        print('Float value:\t{0}\n'.format(float(val)))
    except ValueError as e:
        raise e

Sample output:

Original value: b'"187666432"'
Decoded value: "187666432"���������
Without hex and ": 187666432
Float value:    187666432.0

Original value: b'"7921470.8554087048"'
Decoded value: "7921470.8554087048"
Without hex and ": 7921470.8554087048
Float value:    7921470.855408705

Original value: b'"0"'
Decoded value: "0"�����������������
Without hex and ": 0
Float value:    0.0

In my for loop, to convert the x value to a float, I've had to do this:

val = x.tostring().decode('utf-8').replace('\x00', '').replace('"', '')

This is not particularly elegant and is error-prone.

Question 1: Is there a better way to do this?

Question 2: Why does x.tostring().decode('utf-8') evaluate to something like "158"�� when dealing with integers? Where do the extra bytes in x.tostring() come from?

Which version of numpy are you using? Can you print the output of list(b'"187666432"') etc. for these values? (Perhaps that will explain the �s.)
– Andy Hayden, Commented May 27, 2016 at 20:42
I'm on numpy 1.11.0. For your other request, I'll check once I'm back on my laptop! :)
– HEADLESS_0NE, Commented May 27, 2016 at 20:47
Perhaps it's a fixed-length value, filled with some \0 or something like that? All three decoded values have the same length: "187666432"�� "0"�� "7921470.8554087048"
– Luis, Commented May 27, 2016 at 20:47
@HEADLESS_0NE Running on Python 3.4.3, Ubuntu, numpy 1.11.0. I ran it on IPython, but just checked that it runs in python directly as well. Maybe some OSX-EndOfLine-Stuff? (ain't got no idea about OSX :P)
– Luis, Commented May 28, 2016 at 12:42

Andy Hayden · Accepted Answer · 2016-05-27 20:54:20Z

To answer the first question:

I strongly recommend using pandas to read in csv files:

In [11]: pd.read_csv(path, header=None)
Out[11]:
                         0          1             2  3           4          5         6    7  8
0  05/27/2016 09:45:37.816  187666432  7.921471e+06  0   95.202655  82.717061  1.462666  158  5
1  05/27/2016 09:45:38.819  206884864  1.069219e+07  0  101.330180  93.535551  2.464958  158  5

It "sniffs out" whether the fields are quoted or unquoted, though this can be made explicit.
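As a rough sketch of making the quoting explicit: the two sample rows from the question are inlined through io.StringIO so the snippet runs without a file, and the names sample, df and values are illustrative only, not part of the original answer.

```python
import io

import pandas as pd

# Two rows from the question, inlined so the example runs without a file.
sample = (
    '"05/27/2016 09:45:37.816","187666432","7921470.8554087048","0",'
    '"95.202655176457412","82.717061054954783","1.4626657999999999","158","5"\n'
    '"05/27/2016 09:45:38.819","206884864","10692185.668858336","0",'
    '"101.33018029563618","93.535551042125718","2.4649584999999998","158","5"\n'
)

# quotechar='"' is already the default; it is spelled out here for clarity.
df = pd.read_csv(io.StringIO(sample), header=None, quotechar='"')

# Drop the timestamp column and convert the remaining columns to a float array.
values = df.iloc[:, 1:].to_numpy(dtype=float)

print(values.shape)   # (2, 8)
print(values[0, 0])   # 187666432.0
```

With a real file, passing the path instead of the StringIO object should behave the same way.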

To answer the second question:

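If you iterate over a.flatten() rather than np.nditer(a), you don't see the \x00s. They are padding: every element of the S20 dtype is stored as exactly 20 bytes, and tostring() on the zero-dimensional arrays that nditer yields returns that raw buffer: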

In [21]: a
Out[21]:
array([[b'"187666432"', b'"7921470.8554087048"', b'"0"',
        b'"95.202655176457412"', b'"82.717061054954783"',
        b'"1.4626657999999999"', b'"158"', b'"5"'],
       [b'"206884864"', b'"10692185.668858336"', b'"0"',
        b'"101.33018029563618"', b'"93.535551042125718"',
        b'"2.4649584999999998"', b'"158"', b'"5"']],
      dtype='|S20')

In [22]: [i.tostring() for i in np.nditer(a)]
Out[22]:
[b'"187666432"\x00\x00\x00\x00\x00\x00\x00\x00\x00',
 b'"7921470.8554087048"',
 b'"0"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
 b'"95.202655176457412"',
 b'"82.717061054954783"',
 b'"1.4626657999999999"',
 b'"158"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
 b'"5"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
 b'"206884864"\x00\x00\x00\x00\x00\x00\x00\x00\x00',
 b'"10692185.668858336"',
 b'"0"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
 b'"101.33018029563618"',
 b'"93.535551042125718"',
 b'"2.4649584999999998"',
 b'"158"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
 b'"5"\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']

In [23]: [i.tostring() for i in a.flatten()]
Out[23]:
[b'"187666432"',
 b'"7921470.8554087048"',
 b'"0"',
 b'"95.202655176457412"',
 b'"82.717061054954783"',
 b'"1.4626657999999999"',
 b'"158"',
 b'"5"',
 b'"206884864"',
 b'"10692185.668858336"',
 b'"0"',
 b'"101.33018029563618"',
 b'"93.535551042125718"',
 b'"2.4649584999999998"',
 b'"158"',
 b'"5"']
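The same behaviour can be reproduced in a small sketch. It assumes a numpy version that provides tobytes, the current name for tostring, and the array a here is a cut-down stand-in for the one above. The last step shows one possible way to get floats without decoding by hand; it is a suggestion, not something from the original answer.

```python
import numpy as np

# A cut-down stand-in for the array returned by genfromtxt.
a = np.array([b'"187666432"', b'"7921470.8554087048"', b'"0"'], dtype='S20')

# nditer yields 0-d arrays; tobytes() dumps the raw 20-byte buffer,
# including the null padding after short values.
raw = [x.tobytes() for x in np.nditer(a)]

# Iterating the flattened array yields numpy bytes scalars, which drop
# the trailing null padding.
stripped = [bytes(x) for x in a.flatten()]

# One way to get floats without decoding by hand: strip the quotes
# element-wise, then let astype parse the byte strings.
floats = np.char.strip(a, b'"').astype(float)

print(raw[2])       # b'"0"' followed by 17 null bytes
print(stripped[2])  # b'"0"'
print(floats)
```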

Thanks for flatten! I hadn't thought about changing the way I was iterating over my array. I'll look into pandas; seems pretty handy.
