
Let's say I have three vectors a, b, and c:

import numpy as np

a = np.array([1,2,3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

What is the simplest way to turn these into a matrix d with differing data types and column labels, like this:

d = np.array([(1, 1.2, True),
              (2, 3.2, True),
              (3, 4.5, False)],
             dtype=[('aVals','i8'), ('bVals','f4'), ('cVals','bool')])

So that I can then save this matrix to a .npy file and access the data like this after loading it:

>>> d = np.load('dFile.npy')
>>> d['aVals']
array([1, 2, 3])

I have used a simple column_stack to create the matrix, but I am getting a headache trying to figure out how to include the data types and column names: column_stack does not accept a dtype argument, and I can't see a way to add field names and data types after the column_stack is performed. It is worth mentioning that the vectors a, b, and c have no explicit data types declared at creation; they are exactly as shown above.

  • By the way, if you are doing this simply to save the arrays, you could use np.savez(outfile, aVals=a, bVals=b, cVals=c) to save all three arrays to a single .npz file (np.savez_compressed takes the same arguments and writes a compressed one). Commented Aug 16, 2016 at 0:14
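A minimal sketch of the savez approach from that comment, assuming a temporary directory and the file name dFile.npz (both chosen only for illustration). Each array keeps its own dtype and is read back by keyword:

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

# np.savez writes an uncompressed .npz archive; np.savez_compressed takes the
# same arguments and compresses it. The path here is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "dFile.npz")
np.savez(path, aVals=a, bVals=b, cVals=c)

# np.load on an .npz returns an NpzFile that is indexed by the keyword names.
with np.load(path) as data:
    a_loaded = data["aVals"]
    c_loaded = data["cVals"]

print(a_loaded)  # [1 2 3]
```

Note this gives three separate arrays rather than one structured array, which is often enough when the goal is only saving and reloading.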

2 Answers


There's a little-known recarray function that constructs arrays like this. It was cited in a recent SO question:

Assigning field names to numpy array in Python 2.7.3

Allowing it to deduce everything from the input arrays:

In [19]: np.rec.fromarrays([a,b,c])
Out[19]: 
rec.array([(1, 1.2, True), (2, 3.2, True), (3, 4.5, False)], 
          dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '?')])

Specifying names:

In [26]: d=np.rec.fromarrays([a,b,c],names=['avals','bvals','cVals'])
In [27]: d
Out[27]: 
rec.array([(1, 1.2, True), 
           (2, 3.2, True), 
           (3, 4.5, False)], 
          dtype=[('avals', '<i4'), ('bvals', '<f8'), ('cVals', '?')])
In [28]: d['cVals']
Out[28]: array([ True,  True, False], dtype=bool)
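If you want the exact field types from the question (i8, f4, bool) rather than whatever fromarrays infers, you can pass a full dtype. A sketch, assuming the a, b, c from the question; the float64 values are cast to float32 on the way in:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

# An explicit dtype (instead of inference) pins both the field names
# and the field types.
d = np.rec.fromarrays(
    [a, b, c],
    dtype=[("aVals", "i8"), ("bVals", "f4"), ("cVals", "bool")],
)

print(d.dtype.names)     # ('aVals', 'bVals', 'cVals')
print(d["bVals"].dtype)  # float32
```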

After creating a target array of the right size and dtype, fromarrays does a field-by-field copy. This is typical of the functions in numpy.lib.recfunctions (even astype does this). The relevant lines of the fromarrays source:

# populate the record array (makes a copy)
for i in range(len(arrayList)):
    _array[_names[i]] = arrayList[i]
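Because of that field-by-field copy, the record array does not share memory with the input vectors. A small check, assuming the a, b, c from the question:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

d = np.rec.fromarrays([a, b, c], names=["aVals", "bVals", "cVals"])

# Changing a source vector afterwards does not affect d, because fromarrays
# copied the values into a freshly allocated array.
a[0] = 100
print(d["aVals"])              # [1 2 3]
print(np.shares_memory(d, a))  # False
```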

A 2011 reference: How to make a Structured Array from multiple simple array


3 Comments

To add to hpaulj's answer, there is another simple way: from numpy.lib._iotools import easy_dtype as easy, then easy((int, float, float), names="a,b,c") yields the dtype dtype([('a', '<i8'), ('b', '<f8'), ('c', '<f8')]).
easy_dtype is used by genfromtxt to translate the dtype parameter into the more formal dtype. fromarrays uses np.rec.format_parser to do this translation.
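Since numpy.lib._iotools is a private module, the public np.dtype constructor may be a safer way to get the same result. A sketch of two equivalent spellings of the easy((int, float, float), names="a,b,c") example; int maps to the platform's default integer type:

```python
import numpy as np

# Dict form: parallel lists of field names and formats.
dt = np.dtype({"names": ["a", "b", "c"], "formats": [int, float, float]})

# List-of-tuples form: one (name, type) pair per field.
dt2 = np.dtype([("a", int), ("b", float), ("c", float)])

print(dt == dt2)  # True
```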
Thank you, this is perfect and very simple.

d = np.empty(len(a), dtype=[('aVals',a.dtype), ('bVals',b.dtype), ('cVals',c.dtype)])
d['aVals'] = a
d['bVals'] = b
d['cVals'] = c
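Put together with the save and load steps from the question, a runnable sketch might look like this. The temporary directory and the name dFile are illustrative; np.save appends ".npy" when the name lacks it:

```python
import os
import tempfile

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

# Build the structured array field by field, reusing each input's own dtype.
d = np.empty(len(a), dtype=[("aVals", a.dtype), ("bVals", b.dtype), ("cVals", c.dtype)])
d["aVals"] = a
d["bVals"] = b
d["cVals"] = c

# np.save adds the ".npy" extension, so load with the full file name.
path = os.path.join(tempfile.mkdtemp(), "dFile")
np.save(path, d)
loaded = np.load(path + ".npy")

print(loaded["aVals"])  # [1 2 3]
```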

As a reusable function:

def column_stack_overflow(**kwargs):
    dtype = [(name, val.dtype) for name, val in kwargs.items()]
    # dict views are not indexable in Python 3, so take the first value via iter()
    arr = np.empty(len(next(iter(kwargs.values()))), dtype=dtype)
    for name, val in kwargs.items():
        arr[name] = val
    return arr

Then:

column_stack_overflow(aVals=a, bVals=b, cVals=c)

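But note that before Python 3.6, kwargs is an unordered dict, so you might not get the columns in the order you pass them. From Python 3.6 on (PEP 468), keyword arguments keep the order in which they were passed.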
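If you want the column order to be explicit regardless of Python version, one option is to pass (name, array) pairs positionally. A sketch; the helper name stack_named_columns is hypothetical, not a NumPy function:

```python
import numpy as np


def stack_named_columns(*columns):
    """Build a structured array from (name, array) pairs, in the given order.

    Hypothetical helper for illustration; all arrays must share one length.
    """
    # Field order follows the argument order, since *columns is a tuple.
    dtype = [(name, np.asarray(values).dtype) for name, values in columns]
    length = len(columns[0][1])
    arr = np.empty(length, dtype=dtype)
    for name, values in columns:
        arr[name] = values
    return arr


a = np.array([1, 2, 3])
b = np.array([1.2, 3.2, 4.5])
c = np.array([True, True, False])

d = stack_named_columns(("aVals", a), ("bVals", b), ("cVals", c))
print(d.dtype.names)  # ('aVals', 'bVals', 'cVals')
```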
