IndexError on slicing array

Question

I assume I'm asking a newbie question but have spent too much time today searching for an answer. I get an IndexError: too many indices for array error when naively attempting to perform the same slice operation on a numpy array after saving and reloading with np.genfromtxt.

Note: I see that the shape has changed from (3,6) to (3,) upon reloading, but I was unable to convert the result back to shape (3,6). This is the part I assume must be obvious (or maybe I need to specify the type differently).

yo = np.arange(18)
yo = yo.reshape(3,6)

print(yo)
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]]

print(yo[:,:2])
[[ 0  1]
 [ 6  7]
 [12 13]]

np.savetxt("test_data.csv", yo, delimiter=",",  fmt='%1.4e')
yo_reloaded = np.genfromtxt("test_data.csv", dtype=(float, float, float, float, float, float), delimiter = ",")

#same as above but doesn't work
print(yo_reloaded[:,:2])
IndexError: too many indices for array

print(yo_reloaded)
[(  0.,   1.,   2.,   3.,   4.,   5.) (  6.,   7.,   8.,   9.,  10.,  11.)
 ( 12.,  13.,  14.,  15.,  16.,  17.)]

# shape changed
print(yo_reloaded.shape)
(3,)

Omit the dtype for genfromtxt. float is the default. Giving multiple dtypes tells it to load it as a structured array. Look at the dtype of the reload.
– hpaulj, Commented Apr 1, 2018 at 20:20

unutbu · Accepted Answer · 2018-04-01 20:23:58Z

Use dtype=None to tell genfromtxt to attempt to intelligently guess the dtype. In this case, since all values are floats, genfromtxt will assign a floating-point dtype to the array:

In [19]: yo_reloaded = np.genfromtxt("test_data.csv", dtype=None, delimiter = ",")
In [21]: yo_reloaded.dtype
Out[21]: dtype('float64')

and yo_reloaded will have shape (3,6).

In contrast, if you set dtype=(float, float, float, float, float, float):

yo_reloaded = np.genfromtxt("test_data.csv", dtype=(float, float, float, float, float, float), delimiter = ",")

then yo_reloaded.dtype becomes:

In [18]: yo_reloaded.dtype
Out[18]: dtype([('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])

which is the dtype of a structured array. The shape of the structured array is (3,) because NumPy views this array as consisting of 3 rows, with each row holding a single value made up of 6 fields of floating-point dtype. That is not what you want, but it is what you get when you set dtype to a tuple of types.
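If you already have the structured array and want the (3, 6) array back, here is a minimal sketch of one way to do it. It assumes NumPy 1.16 or later (for numpy.lib.recfunctions.structured_to_unstructured) and writes the CSV to a temporary directory instead of the working directory. The names structured, first_column and plain are only illustrative.

```python
import os
import tempfile

import numpy as np
from numpy.lib import recfunctions as rfn

yo = np.arange(18).reshape(3, 6)

# Write the CSV to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_data.csv")
    np.savetxt(path, yo, delimiter=",", fmt="%1.4e")
    # A tuple of types yields a structured array: one record per row.
    structured = np.genfromtxt(path, dtype=(float,) * 6, delimiter=",")

print(structured.shape)        # 1-D: one record per CSV row
print(structured.dtype.names)  # auto-generated field names f0 ... f5

# Fields are selected by name rather than by column index.
first_column = structured["f0"]

# Convert the records back into an ordinary 2-D float array.
plain = rfn.structured_to_unstructured(structured)
print(plain.shape)
print(plain[:, :2])            # the slice from the question now works
```

This only makes sense here because all six fields share the same dtype. With mixed field types the conversion would have to cast everything to a common type.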

Note you could also obtain the desired array using dtype=float:

In [24]: yo_reloaded = np.genfromtxt("test_data.csv", dtype=float, delimiter = ",")
In [25]: yo_reloaded.shape
Out[25]: (3, 6)
In [26]: yo_reloaded.dtype
Out[26]: dtype('float64')

Or, as hpaulj points out, you could simply omit the dtype parameter altogether, in which case it defaults to dtype=float.
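Putting it together, a sketch of the full round trip with the dtype argument omitted. The temporary directory is an assumption added to keep the example self-contained. The question writes test_data.csv to the working directory instead.

```python
import os
import tempfile

import numpy as np

yo = np.arange(18).reshape(3, 6)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_data.csv")
    np.savetxt(path, yo, delimiter=",", fmt="%1.4e")
    # No dtype argument: genfromtxt falls back to its default, float.
    yo_reloaded = np.genfromtxt(path, delimiter=",")

print(yo_reloaded.shape)   # 2-D again, same shape as yo
print(yo_reloaded.dtype)   # a plain floating-point dtype, not a structured one
print(yo_reloaded[:, :2])  # the slice from the question works
```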

edited Apr 1, 2018 at 20:23

answered Apr 1, 2018 at 20:20

1 Comment

Mateen Ulhaq Over a year ago

Your name spelled backwards is ubtunu... o_o

galmeriol · Accepted Answer · 2018-04-01 20:25:36Z

If you run print(yo_reloaded.shape) before print(yo_reloaded[:,:2]), you can see that your np.genfromtxt() call returns an array of shape (3,): a one-dimensional array of 3 records rather than a 3x6 array.

When you use dtype=(float, float, float, float, float, float), you are mapping every row in "test_data.csv" to a 6-tuple, so np.genfromtxt() returns each row as a single 6-field element.

To get the same results as before, change dtype=(float, float, float, float, float, float) to dtype=float.

answered Apr 1, 2018 at 20:25
