I would like to create a NumPy array with mixed types. The other SO questions I found create either an object-based array or a nested array.

I want neither.

What would the syntax look like for a NumPy array with one str column and two int columns?

This is my present code:

import numpy as np

b = np.empty((0, 3))
b = np.insert(b, b.shape[0], [[1, 2, 3]], axis=0)
b = np.insert(b, b.shape[0], [[1, 2, 3]], axis=0)

print(b)
print("---")

a = np.empty((0, 3), dtype='S4, int, int')
a = np.insert(a, a.shape[0], ("a", 2, 3), axis=0)
a = np.insert(a, a.shape[0], ("a", 2, 3), axis=0)

print(a)

The output:

[[1. 2. 3.]
 [1. 2. 3.]]
---
[[(b'a', 2, 3) (b'a', 2, 3) (b'a', 2, 3)]
 [(b'a', 2, 3) (b'a', 2, 3) (b'a', 2, 3)]]

EDIT:

And what I need for the array a is:

[["a" 2 3]
 ["a" 2 3]]
Comments:

  • Did you try pandas? Commented Oct 11, 2018 at 4:09
  • np.array([('a', 1, 2), ('b', 2, 3)], dtype=np.dtype('S4, int, int')) Commented Oct 11, 2018 at 4:16
  • Take a look at structured arrays. Commented Oct 11, 2018 at 4:20
  • Sorry, my question was completely misleading. I hope that it is clearer now. Commented Oct 11, 2018 at 7:25

1 Answer

Your second array is close, though I'd do it with indexing rather than insert (which is slower):

In [431]: a = np.zeros(3, dtype='S4, int, int')
In [432]: a[0] = ('a', 2, 3)
In [433]: a[1] = 1
In [434]: a
Out[434]: 
array([(b'a', 2, 3), (b'1', 1, 1), (b'', 0, 0)],
      dtype=[('f0', 'S4'), ('f1', '<i8'), ('f2', '<i8')])
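
As a minimal sketch of the indexing approach in a loop: preallocate one record per entry, then assign a tuple to each index. The `rows` list is a hypothetical input invented for illustration, and `'U4'` is used instead of `'S4'` only so the strings print without the `b''` prefix.

```python
import numpy as np

# Hypothetical input rows; any iterable of (str, int, int) tuples would do.
rows = [("a", 2, 3), ("b", 4, 5), ("c", 6, 7)]

# Preallocate one record per row, then fill each record by index.
a = np.zeros(len(rows), dtype="U4, int, int")
for i, row in enumerate(rows):
    a[i] = row  # a tuple fills one record, field by field

print(a)
```

Note that the value must be a tuple to set the fields individually; assigning a scalar, as in `a[1] = 1` above, writes that one value into every field of the record.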

A list of tuples is also a good way of constructing such an array:

In [436]: a = np.array([('a',2,3),('b',4,5)], dtype='S4, int, int')
In [437]: a
Out[437]: 
array([(b'a', 2, 3), (b'b', 4, 5)],
      dtype=[('f0', 'S4'), ('f1', '<i8'), ('f2', '<i8')])

Note that the shape is 1d (n,), with 3 fields. The fields don't count as a dimension.
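
A short check of that claim, rebuilding the same array so the snippet runs on its own:

```python
import numpy as np

a = np.array([("a", 2, 3), ("b", 4, 5)], dtype="S4, int, int")

print(a.shape)             # one axis: the number of records
print(a.ndim)              # fields do not add a dimension
print(a.dtype.names)       # auto-generated field names
print(len(a.dtype.names))  # number of fields per record
```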

Fields are accessed by name, not 'column' number:

In [438]: a['f1']
Out[438]: array([2, 4])
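
The default names `f0`, `f1`, `f2` can be replaced with descriptive ones by spelling out the dtype as a list of `(name, type)` pairs. The names `name`, `x` and `y` below are made up for illustration; this is a sketch of field access, not the only way to do it.

```python
import numpy as np

# Hypothetical field names chosen for this example.
dt = np.dtype([("name", "U4"), ("x", int), ("y", int)])
a = np.array([("a", 2, 3), ("b", 4, 5)], dtype=dt)

print(a["x"])         # one field across all records
print(a[0])           # one whole record
print(a[["x", "y"]])  # several fields at once

# Indexing a single field returns a view, so in-place updates change `a`.
a["y"] += 10
print(a)
```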

You started from a (0,3) array, so every 'row' you inserted holds 3 complete records, and np.insert filled all 3 with the same tuple. That's why your (2,3) result has repeats, while my 1d array doesn't.
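
To make the difference concrete, here is a sketch that starts from an empty 1d array of records, shape `(0,)`, instead of `(0, 3)`, and grows it with `np.concatenate`. The variable names are illustrative, and growing an array this way copies it each time, so it is meant to show the shapes rather than to recommend the pattern.

```python
import numpy as np

dt = np.dtype("U4, int, int")

# (0, 3): every row would hold three complete records.
wide = np.empty((0, 3), dtype=dt)
# (0,): every element is a single record with three fields.
flat = np.empty(0, dtype=dt)

# One record, wrapped in a length-1 array so it can be concatenated.
rec = np.array([("a", 2, 3)], dtype=dt)
flat = np.concatenate([flat, rec])
flat = np.concatenate([flat, rec])

print(wide.shape, flat.shape)
print(flat)
```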

With a unicode string dtype (default for Py3):

In [439]: a = np.array([('a',2,3),('b',4,5)], dtype='U4, int, int')
In [440]: a
Out[440]: 
array([('a', 2, 3), ('b', 4, 5)],
      dtype=[('f0', '<U4'), ('f1', '<i8'), ('f2', '<i8')])
In [441]: print(a)
[('a', 2, 3) ('b', 4, 5)]
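
A side-by-side sketch of the two string dtypes, assuming ASCII-only text: the `'S4'` field holds bytes, the `'U4'` field holds `str`, and a bytes field can be converted afterwards with `astype`.

```python
import numpy as np

s = np.array([("a", 2, 3), ("b", 4, 5)], dtype="S4, int, int")
u = np.array([("a", 2, 3), ("b", 4, 5)], dtype="U4, int, int")

print(s["f0"])  # bytes values, shown with the b'' prefix
print(u["f0"])  # str values

# Convert an existing bytes field to str (assumes ASCII content).
names = s["f0"].astype("U4")
print(names)
```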
4 Comments

Sorry, my question was completely misleading. I hope that it is clearer now.
What's changed? My answer gives you what you want, just replacing 'columns' with 'fields'.
If you used 'U4' instead of 'S4' you wouldn't get the b'a' notation.
Thanks for your input. My intention is to add entries to the array on demand, e.g. in a loop. The int parts of the array should be processable with numba (numba.pydata.org). Maybe it is better to leave this Stack Overflow question as it is for the moment and create a new question where I state my requirements more precisely at the beginning?