datatype conflicts - Strings and Floats in one NumpyArray

Question

I have the following arrays:

a = ['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)'] # strings
b = [0.0, 0.1] # floats
c = [0.0, 0.2] # floats
d = [0.0, 0.3] # floats
e = [0.0, 0.4] # floats

My goal is to create a final 2d array, such that the datatypes are preserved, with numpy:

final = [a, b, c, d, e] -> [ ['(0.0 | 0.0 | 0.0)', ...] , [0.0, 0.1], ... ]

When I tried to do this with

np.array([a, b, c, d, e])

the floats are converted to strings. So I looked at the NumPy dtype documentation and tried to create my own structured dtype with

dt = np.dtype([('f1', np.str), ('f2', np.float), ('f3', np.float), ('f4', np.float), ('f5', np.float)])
final = np.array([a, b, c, d, e], dtype=dt)

However it's trying to convert the string array into floats:

ValueError: could not convert string to float: '(0.0 | 0.0 | 0.0)'

Does anyone know what I'm doing wrong? This should be possible...

AFAICT, arrays in NumPy are homogeneous: all items in an array are strings, or floats, or some other single dtype.
– Oluwafemi Sule, Commented May 14, 2018 at 16:45
Yes, but you can construct dtype objects that accept structured, mixed data types; for example, a string followed by an integer. I think my problem here is that I have subarrays of different datatypes :S
– Luis Figueiredo, Commented May 14, 2018 at 16:47

hpaulj · Accepted Answer · 2018-05-14 17:17:38Z

In [256]: a = ['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)'] # strings
     ...: b = [0.0, 0.1] # floats
     ...: c = [0.0, 0.2] # floats
     ...: d = [0.0, 0.3] # floats
     ...: e = [0.0, 0.4] # floats
     ...: 
     ...: 

In [267]: dt = np.dtype([('f1', 'U20'), ('f2', float), ('f3', float), ('f4', float), ('f5', float)])

A structured array has to be initialized with a list of tuples, one tuple per record:

In [271]: [x for x in zip(a,b,c,d,e)]
Out[271]: 
[('(0.0 | 0.0 | 0.0)', 0.0, 0.0, 0.0, 0.0),
 ('(0.0 | 0.0 | 0.1)', 0.1, 0.2, 0.3, 0.4)]

In [273]: np.array([x for x in zip(a,b,c,d,e)],dtype=dt)
Out[273]: 
array([('(0.0 | 0.0 | 0.0)', 0. , 0. , 0. , 0. ),
       ('(0.0 | 0.0 | 0.1)', 0.1, 0.2, 0.3, 0.4)],
      dtype=[('f1', '<U20'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])
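
For reference, the zip approach can be written as a standalone script along the following lines. This is only a sketch: it uses the builtin float where the session above used np.float (an alias that NumPy 1.24 and later no longer provide), and the names records and final are just illustrative.

```python
import numpy as np

a = ['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)']  # strings
b = [0.0, 0.1]  # floats
c = [0.0, 0.2]  # floats
d = [0.0, 0.3]  # floats
e = [0.0, 0.4]  # floats

# 'U20' holds unicode strings of up to 20 characters;
# the builtin float maps to NumPy's float64.
dt = np.dtype([('f1', 'U20'), ('f2', float), ('f3', float),
               ('f4', float), ('f5', float)])

# zip transposes the five lists into one tuple per record,
# which is the layout a structured array expects.
records = list(zip(a, b, c, d, e))
final = np.array(records, dtype=dt)

print(final)
```

The 20 in 'U20' is a capacity, so longer strings would be silently truncated; pick a width that fits the longest label you expect.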

Or filled field by field:

In [268]: arr = np.empty(2, dtype=dt)
In [269]: for n, x in zip(arr.dtype.names, [a,b,c,d,e]):
     ...:     arr[n] = np.array(x)
     ...:     
In [270]: arr
Out[270]: 
array([('(0.0 | 0.0 | 0.0)', 0. , 0. , 0. , 0. ),
       ('(0.0 | 0.0 | 0.1)', 0.1, 0.2, 0.3, 0.4)],
      dtype=[('f1', '<U20'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8')])

Such an array can be accessed by field name or record number:

In [274]: arr['f1']
Out[274]: array(['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)'], dtype='<U20')
In [276]: arr['f3']
Out[276]: array([0. , 0.2])
In [277]: arr[0]
Out[277]: ('(0.0 | 0.0 | 0.0)', 0., 0., 0., 0.)

Note that this is a 1d array of records (shape (2,)), not a 2d array; the fields are not a second axis.
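
If a plain 2d float array of just the numeric fields is needed later, one possible way is to stack the individual fields as columns. The sketch below rebuilds the same two-record array so it runs on its own; numeric_names and numeric are illustrative names, and np.column_stack is only one of several ways to do this.

```python
import numpy as np

dt = np.dtype([('f1', 'U20'), ('f2', float), ('f3', float),
               ('f4', float), ('f5', float)])
arr = np.array([('(0.0 | 0.0 | 0.0)', 0.0, 0.0, 0.0, 0.0),
                ('(0.0 | 0.0 | 0.1)', 0.1, 0.2, 0.3, 0.4)], dtype=dt)

# A structured array has a single axis of records.
print(arr.shape, arr.ndim)

# Skip the string field and stack the float fields as columns,
# giving an ordinary 2d float64 array (a copy, not a view).
numeric_names = arr.dtype.names[1:]
numeric = np.column_stack([arr[name] for name in numeric_names])

print(numeric.shape)
```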

Another option is an object dtype array:

In [278]: np.array([a,b,c,d,e], dtype=object)
Out[278]: 
array([['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)'],
       [0.0, 0.1],
       [0.0, 0.2],
       [0.0, 0.3],
       [0.0, 0.4]], dtype=object)
In [279]: _.shape
Out[279]: (5, 2)
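
One caveat with the object dtype route, sketched below under the same inputs: the array only stores references to the original Python objects, so the float rows are not a float64 block and usually need an explicit conversion before numeric work. The names obj and nums are illustrative.

```python
import numpy as np

a = ['(0.0 | 0.0 | 0.0)', '(0.0 | 0.0 | 0.1)']
b = [0.0, 0.1]
c = [0.0, 0.2]
d = [0.0, 0.3]
e = [0.0, 0.4]

obj = np.array([a, b, c, d, e], dtype=object)

# Each element keeps its original Python type.
print(type(obj[0, 0]), type(obj[1, 1]))

# Convert the float rows (everything after the first row)
# into a regular float64 array for vectorized arithmetic.
nums = obj[1:].astype(float)
print(nums.dtype, nums.shape)
```

This keeps the 2d layout from the question, at the cost of the speed and type guarantees that a structured or plain float array gives.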

Wow. That's incredibly useful. Thank you so much. Both the zip method as well as dtype=object are intuitive for me.
