Convert numpy structured array to 2d array efficiently

Question

I have a big structured NumPy array like this:

array([(-0.85694593,  -6.3997216, -1.5486323 , 37, 50,   0,  0),
       (-1.1892447 , -11.417209 , -0.21771915, 97, 50,   0,  0),
       (-0.84541476, -11.3712845, -0.8726147 , 75, 50,   0,  0), ...,
       (-0.057407  ,  -6.266104 ,  1.6693828 , 19,  0,  16, 63),
       ( 0.56391037, -11.262503 ,  0.31594068,  0,  0, 150, 63),
       ( 0.9118347 , -11.4296665, -0.3372402 , 96,  0,   0,  0)],
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])

Note that columns 0 to 2 are floats, and columns 3 to 6 are ints.

I want to efficiently convert this array to a 2D array of floats. How can I do this?

@mozway, that's different: this structured array has fields of different types, so applying just arr.view((float, len(arr.dtype.names))) will throw ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged
– RomanPerekhrest, Commented Sep 6, 2023 at 9:11
@mozway, the 2nd link works with a slice of columns and does not produce the unified converted set of columns
– RomanPerekhrest, Commented Sep 6, 2023 at 9:18
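
To make the ValueError from the comment above concrete, here is a small sketch. The `sample` array is just the first three records from the question, and the variable names are made up for illustration. Each record is 16 bytes, while seven float64 values need 56 bytes, so a plain view cannot reinterpret the buffer:

```python
import numpy as np

dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
# Illustrative sample: the first three records from the question.
sample = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                   (-1.1892447, -11.417209, -0.21771915, 97, 50, 0, 0),
                   (-0.84541476, -11.3712845, -0.8726147, 75, 50, 0, 0)], dtype=dt)

# Bytes per record: 3 * 4 (floats) + 4 * 1 (uint8 fields).
record_bytes = sample.dtype.itemsize
# Bytes needed for seven float64 values per row.
target_bytes = np.dtype((float, len(sample.dtype.names))).itemsize

try:
    sample.view((float, len(sample.dtype.names)))
    view_failed = False
except ValueError as exc:
    view_failed = True
    print(exc)
```

A view only reinterprets existing bytes, so the uint8 fields would need an actual conversion (a copy) to become floats in any case.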

juanpa.arrivillaga · Accepted Answer · 2023-09-06 09:22:23Z


Here is one way, which should be memory efficient at least, and not too slow:

result = np.empty((arr.shape[0], len(arr.dtype.fields)), np.float32)
for i, field in enumerate(arr.dtype.fields):
    result[:, i] = arr[field]

I assumed you wanted np.float32 for your resulting array.
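
For reuse, the same loop can be wrapped in a function. This is only a sketch: the name `fields_to_2d` and the two-record `sample` are made up for illustration. It iterates over `dtype.names`, which is an ordered tuple, rather than `dtype.fields`, because `fields` can also contain titles as extra keys when a dtype uses them:

```python
import numpy as np

def fields_to_2d(structured, dtype=np.float32):
    """Copy each field of a 1-D structured array into one column of a 2-D array."""
    names = structured.dtype.names  # ordered tuple of field names
    out = np.empty((structured.shape[0], len(names)), dtype=dtype)
    for col, name in enumerate(names):
        out[:, col] = structured[name]  # values are cast to `dtype` on assignment
    return out

dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
# Illustrative sample: two records from the question.
sample = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                   (-0.057407, -6.266104, 1.6693828, 19, 0, 16, 63)], dtype=dt)

result = fields_to_2d(sample)
print(result.shape, result.dtype)
```

The only full-size allocation is the output array; each `structured[name]` is a strided view, so no intermediate copies of the data are made.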

edited Sep 6, 2023 at 9:22

answered Sep 6, 2023 at 9:03

juanpa.arrivillaga


1 Comment

hpaulj Over a year ago

Yes, most recfunctions use this copy-by-field approach. The subpackage also has a structured_to_unstructured function, numpy.org/devdocs/user/…

hpaulj · Accepted Answer · 2023-09-06 15:57:55Z

numpy.lib.recfunctions has functions for working with recarrays (and, by extension, structured arrays). It is documented on the main structured array page. It has to be imported explicitly:

In [204]: import numpy.lib.recfunctions as rf

In [205]: arr = np.array([(-0.85694593,  -6.3997216, -1.5486323 , 37, 50,   0,  0),
     ...:        (-1.1892447 , -11.417209 , -0.21771915, 97, 50,   0,  0),
     ...:        (-0.84541476, -11.3712845, -0.8726147 , 75, 50,   0,  0), 
     ...:        (-0.057407  ,  -6.266104 ,  1.6693828 , 19,  0,  16, 63),
     ...:        ( 0.56391037, -11.262503 ,  0.31594068,  0,  0, 150, 63),
     ...:        ( 0.9118347 , -11.4296665, -0.3372402 , 96,  0,   0,  0)],
     ...:       dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])

In [206]: arr
Out[206]: 
array([(-0.85694593,  -6.3997216, -1.5486323 , 37, 50,   0,  0),
       (-1.1892447 , -11.417209 , -0.21771915, 97, 50,   0,  0),
        ...
       ( 0.9118347 , -11.4296665, -0.3372402 , 96,  0,   0,  0)],
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])

A relatively recent addition to that library is a pair of converter functions:

In [207]: arr1 = rf.structured_to_unstructured(arr)

In [208]: arr1
Out[208]: 
array([[-8.56945932e-01, -6.39972162e+00, -1.54863226e+00,
         3.70000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       [-1.18924475e+00, -1.14172087e+01, -2.17719153e-01,
         9.70000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       ...
       [ 9.11834717e-01, -1.14296665e+01, -3.37240189e-01,
         9.60000000e+01,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00]], dtype=float32)
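
As a sketch of how the pair works together: `structured_to_unstructured` accepts a `dtype` argument if you want to state the output type explicitly, and `unstructured_to_structured` goes the other way when given the target dtype. The two-record `sample` below is illustrative only:

```python
import numpy as np
import numpy.lib.recfunctions as rf

dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
# Illustrative sample: two records from the question.
sample = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                   (0.56391037, -11.262503, 0.31594068, 0, 0, 150, 63)], dtype=dt)

# Structured -> plain 2-D array, with the output dtype stated explicitly.
flat = rf.structured_to_unstructured(sample, dtype=np.float32)

# Plain 2-D array -> structured, casting each column back to its field type.
restored = rf.unstructured_to_structured(flat, dtype=dt)

print(flat.shape, flat.dtype)
print(restored.dtype.names)
```

The round trip is lossless here because every uint8 value is exactly representable in float32; with wider integer fields that would not necessarily hold.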

The tolist approach in another answer works because, for structured arrays, the result is a list of tuples, which np.array can parse just as easily as a list of lists. The list of tuples, though, is required if you want to make a structured array: the numpy developers have chosen to display and parse structured array records as tuples.

In [209]: arr2 = np.array(arr.tolist())
In [211]: arr.tolist()
Out[211]: 
[(-0.8569459319114685, -6.399721622467041, -1.548632264137268, 37, 50, 0, 0),
 (-1.1892447471618652, -11.417208671569824, -0.2177191525697708, 97, 50, 0, 0),
 ... 
 (0.911834716796875, -11.429666519165039, -0.33724018931388855, 96, 0, 0, 0)]
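
One thing to keep in mind with the tolist route: the values pass through Python floats and ints, so without an explicit dtype np.array produces float64 rather than float32. A short sketch with an illustrative two-record `sample`:

```python
import numpy as np

dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
# Illustrative sample: two records from the question.
sample = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                   (0.9118347, -11.4296665, -0.3372402, 96, 0, 0, 0)], dtype=dt)

records = sample.tolist()           # list of tuples of Python floats and ints
via_list = np.array(records)        # dtype is inferred from the Python values
via_list32 = np.array(records, dtype=np.float32)  # request float32 explicitly

print(type(records[0]), via_list.dtype, via_list32.dtype)
```

Every element becomes a Python object along the way, so for a big array this route is likely to use much more time and memory than the field-by-field copy.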

Many of the recfunctions work by creating a target array, and copying data field by field. Since usually the number of records is much larger than the number of fields, this is relatively efficient. I assume structured_to_unstructured acts this way, though I haven't examined its code.

I haven't timed these alternatives.
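
If you want to time them yourself, something along these lines should do. It is only a sketch: the random test data, the array size `n`, and the function names are all made up for illustration, and no timings are quoted here because they depend on the machine and the numpy version:

```python
import timeit
import numpy as np
import numpy.lib.recfunctions as rf

dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])

# Synthetic data with the same layout as the question (size chosen arbitrarily).
rng = np.random.default_rng(0)
n = 50_000
big = np.zeros(n, dtype=dt)
for name in ('x', 'y', 'z'):
    big[name] = rng.standard_normal(n).astype(np.float32)
for name in ('intensity', 'timestamp', 'm', '_'):
    big[name] = rng.integers(0, 256, n, dtype=np.uint8)

def by_field(a):
    out = np.empty((a.shape[0], len(a.dtype.names)), np.float32)
    for i, name in enumerate(a.dtype.names):
        out[:, i] = a[name]
    return out

def by_recfunctions(a):
    return rf.structured_to_unstructured(a, dtype=np.float32)

def by_tolist(a):
    return np.array(a.tolist(), dtype=np.float32)

candidates = (by_field, by_recfunctions, by_tolist)
# Best of three single runs for each approach.
timings = {f.__name__: min(timeit.repeat(lambda: f(big), number=1, repeat=3))
           for f in candidates}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```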

RomanPerekhrest · Accepted Answer · 2023-09-06 09:24:35Z

One option is to build a new dtype from a 'names'/'formats' dictionary, where 'names' holds the current field names and 'formats' gives the same float format for every field, cast the array to that dtype, and convert the result through tolist:

arr = np.array(arr.astype(np.dtype({'names': arr.dtype.names, 
                                    'formats':['<f4']*len(arr.dtype.names)})).tolist())

array([[-8.56945932e-01, -6.39972162e+00, -1.54863226e+00,
         3.70000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       [-1.18924475e+00, -1.14172087e+01, -2.17719153e-01,
         9.70000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       [-8.45414758e-01, -1.13712845e+01, -8.72614682e-01,
         7.50000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       [-5.74069992e-02, -6.26610422e+00,  1.66938281e+00,
         1.90000000e+01,  0.00000000e+00,  1.60000000e+01,
         6.30000000e+01],
       [ 5.63910365e-01, -1.12625027e+01,  3.15940678e-01,
         0.00000000e+00,  0.00000000e+00,  1.50000000e+02,
         6.30000000e+01],
       [ 9.11834717e-01, -1.14296665e+01, -3.37240189e-01,
         9.60000000e+01,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00]])
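
Note that the output above has no dtype suffix, i.e. it is float64, because tolist goes through Python floats. As a variation (not what the answer above does): once every field has been cast to '<f4', the records are packed and homogeneous, so a view plus reshape should give the same 2D layout without the list round trip. A sketch with an illustrative two-record `sample`:

```python
import numpy as np

dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
# Illustrative sample: two records from the question.
sample = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                   (-0.057407, -6.266104, 1.6693828, 19, 0, 16, 63)], dtype=dt)

# Same dtype construction as in the answer: every field becomes '<f4'.
uniform = np.dtype({'names': sample.dtype.names,
                    'formats': ['<f4'] * len(sample.dtype.names)})
as_f4 = sample.astype(uniform)      # this is where the actual conversion (copy) happens

# The answer's route: through a list of tuples, which yields float64.
via_list = np.array(as_f4.tolist())

# Variation: reinterpret the packed float32 records as a plain 2-D array.
via_view = as_f4.view('<f4').reshape(len(as_f4), -1)

print(via_list.dtype, via_view.dtype, via_view.shape)
```

The view shares memory with `as_f4`, so the only copy made is the one in astype.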
