
I am trying to add column names to a NumPy array, basically turning it into a structured array even though the data types are all the same.

Pandas would be an easy solution, but the project I am working on will not include pandas as a dependency.

I tried:

signal = np.array([[1,2,3],[1,2,3],[1,2,3]])
col_names = ('left','right','center')
signal = np.array(signal, dtype = [(n, 'int16') for n in col_names])

but this returns:

array([[(1, 1, 1), (2, 2, 2), (3, 3, 3)],
       [(1, 1, 1), (2, 2, 2), (3, 3, 3)],
       [(1, 1, 1), (2, 2, 2), (3, 3, 3)]],
      dtype=[('left', '<i2'), ('right', '<i2'), ('center', '<i2')])

Basically, the array represents a multi-channel signal. I want to be able to subset the channels using column names:

signal['left'] == signal[:,0] # True
signal[['left','center']] == signal[:,[0,2]] # True

I also saw a post where someone advised against using structured arrays. Is there a potential downside to them? For example, do they make the array slower to access?

4 Answers

1

The correct data input form for a structured array is a list of tuples:

In [71]: signal = [(1,2,3),(2,3,1),(3,2,1)]
    ...: col_names = ('left','right','center')
    ...: signal = np.array(signal, dtype = [(n, 'int16') for n in col_names])
In [72]: signal
Out[72]: 
array([(1, 2, 3), (2, 3, 1), (3, 2, 1)],
      dtype=[('left', '<i2'), ('right', '<i2'), ('center', '<i2')])
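To make the difference concrete, here is a small sketch comparing the two input forms. The names `rows_as_array`, `rows_as_tuples`, `broadcast` and `structured` are only illustrative and do not come from the question.

```python
import numpy as np

col_names = ('left', 'right', 'center')
dt = np.dtype([(n, 'int16') for n in col_names])

# A 2D ndarray: every scalar is treated as its own record,
# so the result keeps the (3, 3) shape.
rows_as_array = np.array([[1, 2, 3], [2, 3, 1], [3, 2, 1]])
broadcast = np.array(rows_as_array, dtype=dt)

# A list of tuples: each tuple becomes one record with three fields.
rows_as_tuples = [(1, 2, 3), (2, 3, 1), (3, 2, 1)]
structured = np.array(rows_as_tuples, dtype=dt)

print(broadcast.shape)              # (3, 3)
print(structured.shape)             # (3,)
print(structured['left'].tolist())  # [1, 2, 3]
```

Only the second form gives one record per row, which is what makes `structured['left']` behave like a column.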

NumPy 1.16 added a couple of functions that make it easier to convert to and from structured arrays:

In [73]: import numpy.lib.recfunctions as rfn                                   
In [74]: signal = np.array([[1,2,3],[1,2,3],[1,2,3]])                           
In [75]: dt = np.dtype([(n, 'int16') for n in col_names])                       
In [76]: dt                                                                     
Out[76]: dtype([('left', '<i2'), ('right', '<i2'), ('center', '<i2')])
In [77]: rfn.unstructured_to_structured(signal, dt)                             
Out[77]: 
array([(1, 2, 3), (1, 2, 3), (1, 2, 3)],
      dtype=[('left', '<i2'), ('right', '<i2'), ('center', '<i2')])

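Applying this dt to signal with view raises an error, because signal has the default integer dtype (typically int64, 8 bytes per element) while a dt record is 6 bytes: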

In [82]: signal.view(dt)                                                        
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-82-f0fa01ce8128> in <module>
----> 1 signal.view(dt)

ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original dtype

We can get around that by first converting signal to a compatible dtype:

In [83]: signal.astype('i2').view(dt)                                           
Out[83]: 
array([[(1, 2, 3)],
       [(1, 2, 3)],
       [(1, 2, 3)]],
      dtype=[('left', '<i2'), ('right', '<i2'), ('center', '<i2')])

But note that the shape of Out[83] is (3,1), while the other structured arrays have shape (3,). view has always had this shape problem when converting to and from structured arrays. That's part of why the newer functions are easier to use.
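As a rough sketch of how the shapes line up, assuming NumPy 1.16 or later: the trailing axis left by view can be dropped by indexing, and `rfn.structured_to_unstructured` converts back to a plain 2D array when positional indexing is needed. The names `viewed`, `converted` and `plain` are just for illustration.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

col_names = ('left', 'right', 'center')
dt = np.dtype([(n, 'int16') for n in col_names])
signal = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])

# view() keeps a trailing axis of length 1; drop it with indexing.
viewed = signal.astype('i2').view(dt)[:, 0]

# unstructured_to_structured() gives the 1D shape directly.
converted = rfn.unstructured_to_structured(signal, dt)

# Go back to a plain 2D array when positional indexing is needed.
plain = rfn.structured_to_unstructured(converted)

print(viewed.shape, converted.shape, plain.shape)  # (3,) (3,) (3, 3)
```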


1 Comment

This does create a new structured array. But now I can no longer use signal['left'] and signal[:,0] interchangeably. Is there any way to achieve both?
1
import numpy as np

values = [(1,2,3),(1,2,3),(1,2,3)]
signal = np.array(values, [('left', '<i2'), ('center', '<i2'), ('right', '<i2')])
signal['right']
# array([3, 3, 3], dtype=int16)
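When the data already lives in a 2D ndarray rather than a list of tuples, the rows have to be converted first. A minimal sketch, where `raw` and its values are made up for illustration:

```python
import numpy as np

dt = [('left', '<i2'), ('center', '<i2'), ('right', '<i2')]
raw = np.array([[1, 2, 3], [0, 5, 6], [7, 8, 9]])

# Filter rows as a plain 2D array first (here: drop rows whose first value is 0).
raw = raw[raw[:, 0] != 0]

# Turn each row into a tuple so NumPy reads it as one record.
signal = np.array(list(map(tuple, raw)), dtype=dt)

print(signal['right'].tolist())  # [3, 9]
```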

1 Comment

Since values may not be handy as a list of tuples (e.g. after values = values[values[:,0] != 0]), one may need to use "list(map(tuple, signal))" or another way to convert the ndarray into a list of tuples.
0

You can get the correct results by taking a view

>>> signal.view(dtype=[(n, signal.dtype) for n in col_names])
array([[(1, 2, 3)],
       [(1, 2, 3)],
       [(1, 2, 3)]],
      dtype=[('left', '<i8'), ('right', '<i8'), ('center', '<i8')])

As far as performance goes, it's not something to worry about. Structured arrays are ndarrays; you just get the added benefit of more complex data types. Record arrays, on the other hand, are structured arrays that also allow field names to be looked up as object attributes. That adds some overhead to attribute lookups, but it is generally small compared to the computations on the data.
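A small sketch of both points, using made-up values: field access on a structured array returns a view on the same buffer rather than a copy, and a record array exposes the same fields as attributes.

```python
import numpy as np

dt = [('left', 'int16'), ('right', 'int16'), ('center', 'int16')]
signal = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)], dtype=dt)

# Field access returns a view on the same buffer, not a copy.
left = signal['left']
print(np.shares_memory(left, signal))  # True

# Writing through the view changes the structured array.
left[0] = 10
print(signal['left'].tolist())  # [10, 4, 7]

# A record array exposes the same fields as attributes.
rec = signal.view(np.recarray)
print(rec.right.tolist())  # [2, 5, 8]
```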

2 Comments

Shape changes with view.
Although doing so does create a structured array, np.array_equal(a['left'], a[:,0]) is now False, because a now has only one column.
0

Problem in Your Code

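You're trying to turn a 2D NumPy array into a structured array with field names. But this line:

signal = np.array(signal, dtype = [(n, 'int16') for n in col_names])

treats every scalar in the 2D array as its own record and copies it into all three fields, which is why the result is a 3×3 array of (1, 1, 1), (2, 2, 2), (3, 3, 3) tuples.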

Solution:

To convert a 2D array into a structured array row-wise with named columns, do:

import numpy as np

signal = np.array([[1,2,3],
                   [4,5,6],
                   [7,8,9]])

col_names = ('left', 'right', 'center')
dtype = [(name, 'int16') for name in col_names]

# np.rec is the public alias; the np.core namespace is deprecated in NumPy 2.0
structured_signal = np.rec.fromarrays(signal.T, dtype=dtype)

print(structured_signal['left'])
print(structured_signal[['left', 'center']])

Recommendation
If your data is homogeneous (e.g., all int16) and performance is critical, stick with 2D arrays and manage column names with a separate list or dictionary:

column_names = ['left', 'right', 'center']
col_idx = {name: i for i, name in enumerate(column_names)}

signal[:, col_idx['left']]
signal[:, [col_idx['left'], col_idx['center']]]
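If name-based selection is needed in many places, the dictionary lookup can be wrapped in a small helper while the array itself stays a plain 2D ndarray, so signal[:,0] keeps working. This is only a sketch; `select_channels` is a hypothetical name, not a NumPy function.

```python
import numpy as np

signal = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]], dtype='int16')
column_names = ['left', 'right', 'center']
col_idx = {name: i for i, name in enumerate(column_names)}

def select_channels(arr, names):
    """Return the columns of a 2D array named by a string or a list of strings."""
    if isinstance(names, str):
        return arr[:, col_idx[names]]
    return arr[:, [col_idx[n] for n in names]]

print(select_channels(signal, 'left').tolist())              # [1, 4, 7]
print(select_channels(signal, ['left', 'center']).tolist())  # [[1, 3], [4, 6], [7, 9]]
```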
