Error reading arrays from csv files in Numpy Python

Question

I have a problem reading the first column of a csv file with numpy. All the values of the first column get returned as nan instead of [ 2. 4. 1120.] and so on.

import genfromtxt from numpy 
my_data = genfromtxt('input.csv', delimiter=',')
first_column = len(my_data[:,0]) - 1

Inside the csv file:

[   2.    4. 1120.],67.8,63.7,-676.1,-365.2,0.0,0.0,0.0,0.0,0.0,-608.3000000000001,-301.5
[  2.    4.5 100. ],0.0,0.0,-0.30000000000000004,-0.7,0.0,0.0,99.7002,0.0,0.0,-0.30000000000000004,-0.7
[   2.    4. 1130.],70.8,52.2,-672.7,-346.5,0.0,0.0,0.0,0.0,0.0,-601.9000000000001,-294.3
[  2.    4.5 110. ],0.0,0.2,-0.7,-0.1,0.0,0.0,99.3010995,0.0,0.0,-0.7,0.1

pazitos10 · Accepted Answer · 2021-04-10 02:10:39Z

First, your import statement is inverted. It should be:

from numpy import genfromtxt

Second, genfromtxt() cannot convert the string '[ 2. 4. 1120.]' to a float as it does with the other values in the array, which is why it returns nan. numpy.loadtxt() cannot parse that field either (it raises a ValueError rather than returning nan).
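To see the behaviour in isolation, here is a minimal sketch that feeds two shortened lines from the question's file to genfromtxt through io.StringIO. The csv_text variable and the truncation to three columns are assumptions made for illustration only.

```python
import io
import numpy as np

# Two lines from the question's file, truncated to three columns (illustrative).
csv_text = (
    "[   2.    4. 1120.],67.8,63.7\n"
    "[  2.    4.5 100. ],0.0,0.0\n"
)

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

print(data[:, 0])   # the bracketed field cannot be parsed as a float
print(data[:, 1:])  # the plain numeric fields parse normally
```

The first column comes back as nan because the whole bracketed string is treated as a single field, and that field is not a valid float literal.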

One way to avoid losing those values is to read the csv file with pandas:

import numpy as np
import pandas as pd

my_data = pd.read_csv('input.csv').to_numpy()

Where my_data contains:

array([['[  2.    4.5 100. ]', 0.0, 0.0, -0.30000000000000004, -0.7, 0.0,
        0.0, 99.7002, 0.0, 0.0, -0.30000000000000004, -0.7],
       ['[   2.    4. 1130.]', 70.8, 52.2, -672.7, -346.5, 0.0, 0.0, 0.0,
        0.0, 0.0, -601.9000000000002, -294.3],
       ['[  2.    4.5 110. ]', 0.0, 0.2, -0.7, -0.1, 0.0, 0.0,
        99.3010995, 0.0, 0.0, -0.7, 0.1]], dtype=object)
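Note that the array above has three rows while the file in the question has four. This is most likely because read_csv treats the first line as a header by default. A minimal sketch of the difference, assuming the file has no header row (csv_text is an illustrative stand-in for the real file):

```python
import io
import pandas as pd

# Two lines from the question's file, truncated to three columns (illustrative).
csv_text = (
    "[   2.    4. 1120.],67.8,63.7\n"
    "[  2.    4.5 100. ],0.0,0.0\n"
)

# Default behaviour: the first line is consumed as column names.
with_default = pd.read_csv(io.StringIO(csv_text)).to_numpy()

# header=None keeps every line as data.
with_header_none = pd.read_csv(io.StringIO(csv_text), header=None).to_numpy()

print(with_default.shape)
print(with_header_none.shape)
```

If the first row matters, passing header=None to the read_csv call should keep it in my_data.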

You will still need to parse every value in the first column to convert it to a numpy array. For that you can use np.fromstring, but you need to strip the bracket characters first for it to work as expected.

If you leave the brackets in, you will see a warning:

np.fromstring(my_data[:, 0], sep=' ')

<ipython-input-65-7d75c8d121f5>:1: DeprecationWarning: string or file could not be read to its end due to unmatched data; this will raise a ValueError in the future.
  np.fromstring(my_data[:, 0], sep=' ')

Unfortunately, to strip the brackets you need to loop over the array:

for i, row in enumerate(my_data[:, 0]):
    # row[1:-1] drops the surrounding '[' and ']' before parsing
    my_data[i, 0] = np.fromstring(row[1:-1], sep=' ').astype(np.float32)

Indexing with [1:-1] "removes" the bracket characters before the string is passed to np.fromstring.

After that, my_data will contain numpy arrays in the first column:

array([[array([  2. ,   4.5, 100. ], dtype=float32), 0.0, 0.0,
        -0.30000000000000004, -0.7, 0.0, 0.0, 99.7002, 0.0, 0.0,
        -0.30000000000000004, -0.7],
       [array([   2.,    4., 1130.], dtype=float32), 70.8, 52.2, -672.7,
        -346.5, 0.0, 0.0, 0.0, 0.0, 0.0, -601.9000000000002, -294.3],
       [array([  2. ,   4.5, 110. ], dtype=float32), 0.0, 0.2, -0.7,
        -0.1, 0.0, 0.0, 99.3010995, 0.0, 0.0, -0.7, 0.1]], dtype=object)

So the first column would have:

print(my_data[:, 0])

array([array([  2. ,   4.5, 100. ], dtype=float32),
       array([   2.,    4., 1130.], dtype=float32),
       array([  2. ,   4.5, 110. ], dtype=float32)], dtype=object)

Although it is an elaborate solution, it works. There may be a better or simpler way that does not require looping over the array to do the conversion.
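One possible alternative, sketched here rather than taken from the original answer, is to parse the file with the standard csv module and build two regular float arrays instead of one object array. It still loops over the rows, but the result supports normal numeric indexing. The names csv_text, vectors and scalars are hypothetical, and the sample is truncated to three columns.

```python
import csv
import io
import numpy as np

# Two lines from the question's file, truncated to three columns (illustrative).
csv_text = (
    "[   2.    4. 1120.],67.8,63.7\n"
    "[  2.    4.5 100. ],0.0,0.0\n"
)

vectors, scalars = [], []
for fields in csv.reader(io.StringIO(csv_text)):
    # Strip brackets and spaces from the first field, then split on whitespace.
    vectors.append([float(tok) for tok in fields[0].strip("[] ").split()])
    # The remaining fields are plain numbers.
    scalars.append([float(tok) for tok in fields[1:]])

first_column = np.array(vectors, dtype=np.float32)  # one row per csv line
other_columns = np.array(scalars)                   # remaining numeric columns

print(first_column)
print(other_columns)
```

For a real file, replace io.StringIO(csv_text) with an open file handle, for example open('input.csv', newline='').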

@tonyselcuk, Please update your question including the code you tried to implement and the error message. Otherwise no one can know what happened except you.
