2

I have a big structured NumPy array like this:

array([(-0.85694593,  -6.3997216, -1.5486323 , 37, 50,   0,  0),
       (-1.1892447 , -11.417209 , -0.21771915, 97, 50,   0,  0),
       (-0.84541476, -11.3712845, -0.8726147 , 75, 50,   0,  0), ...,
       (-0.057407  ,  -6.266104 ,  1.6693828 , 19,  0,  16, 63),
       ( 0.56391037, -11.262503 ,  0.31594068,  0,  0, 150, 63),
       ( 0.9118347 , -11.4296665, -0.3372402 , 96,  0,   0,  0)],
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])

Note that columns 0 to 2 are floats and columns 3 to 6 are ints.

I want to efficiently convert this array to a 2D array of floats. How can I do this?

4
  • Please post what the final array should look like. Commented Sep 6, 2023 at 8:57
  • @mozway, that's different, this structured array has different types and applying just arr.view((float, len(arr.dtype.names))) will throw ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged Commented Sep 6, 2023 at 9:11
  • @RomanPerekhrest there was a second link for mixed types Commented Sep 6, 2023 at 9:12
  • @mozway, the 2nd link works with a slice of columns and does not produce the unified converted set of columns Commented Sep 6, 2023 at 9:18
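
The itemsize mismatch mentioned in the comments can be reproduced with a short sketch. The two-row array below is a stand-in for the one in the question (same dtype, fewer rows), and `view_failed` is a name introduced here only for illustration:

```python
import numpy as np

# Small stand-in for the array in the question (same dtype, fewer rows).
dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
arr = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                (0.56391037, -11.262503, 0.31594068, 0, 0, 150, 63)], dtype=dt)

# One record occupies 3*4 + 4*1 = 16 bytes, but seven float64 values need 56,
# so the buffer cannot simply be reinterpreted as a row of floats.
try:
    arr.view((float, len(arr.dtype.names)))
    view_failed = False
except ValueError as exc:
    view_failed = True
    print(exc)
```

A plain view only reinterprets existing bytes, so any approach that handles the mixed field types has to copy and cast the data.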

3 Answers 3

2

Here is one way, which should be memory efficient at least, and not too slow:

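# Preallocate the result with one float32 column per field.
result = np.empty((arr.shape[0], len(arr.dtype.names)), np.float32)
# dtype.names lists the fields in their declared order.
for i, name in enumerate(arr.dtype.names):
    result[:, i] = arr[name]  # integer fields are cast to float32 on assignment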

I assumed you wanted np.float32 for your resulting array.
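
For a self-contained version, the same loop can be wrapped in a helper and run on a small sample. This is only a sketch: `fields_to_2d` is a hypothetical name, and the two-row array stands in for the one in the question.

```python
import numpy as np

def fields_to_2d(arr, dtype=np.float32):
    """Copy each field of a 1-D structured array into one column of a 2-D array."""
    names = arr.dtype.names
    out = np.empty((arr.shape[0], len(names)), dtype=dtype)
    for i, name in enumerate(names):
        out[:, i] = arr[name]  # cast to the target dtype happens on assignment
    return out

# Small stand-in for the array in the question (same dtype, fewer rows).
dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
arr = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                (0.56391037, -11.262503, 0.31594068, 0, 0, 150, 63)], dtype=dt)

result = fields_to_2d(arr)
print(result.shape, result.dtype)  # (2, 7) float32
print(arr.nbytes, result.nbytes)   # 32 56
```

The only large allocation is the output itself: each 16-byte record becomes a 28-byte row of seven float32 values, with no intermediate Python objects.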


1 Comment

Yes, most recfunctions use this copy-by-field approach. The subpackage also has a structured_to_unstructured function, numpy.org/devdocs/user/…
2

numpy.lib.recfunctions has functions for working with recarrays (and, by extension, structured arrays). It is documented on the main structured array page. It has to be imported explicitly:

In [204]: import numpy.lib.recfunctions as rf

In [205]: arr = np.array([(-0.85694593,  -6.3997216, -1.5486323 , 37, 50,   0,  0),
     ...:        (-1.1892447 , -11.417209 , -0.21771915, 97, 50,   0,  0),
     ...:        (-0.84541476, -11.3712845, -0.8726147 , 75, 50,   0,  0), 
     ...:        (-0.057407  ,  -6.266104 ,  1.6693828 , 19,  0,  16, 63),
     ...:        ( 0.56391037, -11.262503 ,  0.31594068,  0,  0, 150, 63),
     ...:        ( 0.9118347 , -11.4296665, -0.3372402 , 96,  0,   0,  0)],
     ...:       dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])

In [206]: arr
Out[206]: 
array([(-0.85694593,  -6.3997216, -1.5486323 , 37, 50,   0,  0),
       (-1.1892447 , -11.417209 , -0.21771915, 97, 50,   0,  0),
        ...
       ( 0.9118347 , -11.4296665, -0.3372402 , 96,  0,   0,  0)],
      dtype=[('x', '<f4'), ('y', '<f4'), ('z', '<f4'), ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])

A relatively recent addition to that library is a pair of converter functions, structured_to_unstructured and unstructured_to_structured:

In [207]: arr1 = rf.structured_to_unstructured(arr)

In [208]: arr1
Out[208]: 
array([[-8.56945932e-01, -6.39972162e+00, -1.54863226e+00,
         3.70000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       [-1.18924475e+00, -1.14172087e+01, -2.17719153e-01,
         9.70000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       ...
       [ 9.11834717e-01, -1.14296665e+01, -3.37240189e-01,
         9.60000000e+01,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00]], dtype=float32)
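
As a rough sketch of how the pair can be used together: structured_to_unstructured accepts a `dtype` argument if something other than the common type of the fields is wanted, and unstructured_to_structured goes the other way. The two-row array is a stand-in for the one above.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Small stand-in for the array above (same dtype, fewer rows).
dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
arr = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                (0.56391037, -11.262503, 0.31594068, 0, 0, 150, 63)], dtype=dt)

# Default: the common dtype of all fields (float32 for f4 and u1 fields).
flat32 = rf.structured_to_unstructured(arr)
# An explicit target dtype can be requested instead.
flat64 = rf.structured_to_unstructured(arr, dtype=np.float64)

# The inverse conversion rebuilds the structured array from the 2-D one.
back = rf.unstructured_to_structured(flat32, dtype=arr.dtype)

print(flat32.dtype, flat64.dtype, back.dtype.names)
```

The round trip is lossless here because the small integer values in the u1 fields are represented exactly in float32.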

The tolist approach in another answer works because, for structured arrays, the result is a list of tuples, which np.array can parse just as easily as a list of lists. The list of tuples, though, is required if you want to make a structured array. The numpy developers have chosen to display and parse structured array records as tuples.

In [209]: arr2 = np.array(arr.tolist())

In [211]: arr.tolist()
Out[211]: 
[(-0.8569459319114685, -6.399721622467041, -1.548632264137268, 37, 50, 0, 0),
 (-1.1892447471618652, -11.417208671569824, -0.2177191525697708, 97, 50, 0, 0),
 ... 
 (0.911834716796875, -11.429666519165039, -0.33724018931388855, 96, 0, 0, 0)]
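
A small sketch of both directions of the tolist route, on a two-row stand-in for the array above. Note that without an explicit dtype, np.array infers float64 from the Python floats:

```python
import numpy as np

# Small stand-in for the array above (same dtype, fewer rows).
dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
arr = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                (0.56391037, -11.262503, 0.31594068, 0, 0, 150, 63)], dtype=dt)

records = arr.tolist()      # list of tuples holding Python floats and ints
arr2 = np.array(records)    # plain 2-D array; dtype is inferred
print(arr2.shape, arr2.dtype)  # (2, 7) float64

# The same list of tuples, parsed with the structured dtype, gives the records back.
rebuilt = np.array(records, dtype=arr.dtype)
```

Because every value passes through a Python object, this route is likely to cost more time and memory on a large array than a field-by-field copy.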

Many of the recfunctions work by creating a target array, and copying data field by field. Since usually the number of records is much larger than the number of fields, this is relatively efficient. I assume structured_to_unstructured acts this way, though I haven't examined its code.

I haven't timed these alternatives.
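
For anyone who wants to time them, a minimal harness along these lines should do. The array size, the random test data, and the function names are arbitrary choices made for this sketch, and no timings are quoted because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np
import numpy.lib.recfunctions as rf

dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])

# Synthetic data with the same layout as the array in the question.
rng = np.random.default_rng(0)
n = 50_000
big = np.zeros(n, dtype=dt)
for name in ('x', 'y', 'z'):
    big[name] = rng.standard_normal(n).astype(np.float32)
for name in ('intensity', 'timestamp', 'm', '_'):
    big[name] = rng.integers(0, 256, size=n, dtype=np.uint8)

def by_field(a):
    # Preallocate, then copy one field per column.
    out = np.empty((a.shape[0], len(a.dtype.names)), np.float32)
    for i, name in enumerate(a.dtype.names):
        out[:, i] = a[name]
    return out

def by_recfunctions(a):
    return rf.structured_to_unstructured(a)

def by_tolist(a):
    # Goes through a list of Python tuples.
    return np.array(a.tolist(), dtype=np.float32)

for fn in (by_field, by_recfunctions, by_tolist):
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```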


0

A new dtype can be built from a 'names'/'formats' dictionary, where 'names' holds the current field names and 'formats' gives every field the same float format. Casting the array to that dtype and then passing it through tolist yields a plain 2D array of floats:

arr = np.array(arr.astype(np.dtype({'names': arr.dtype.names, 
                                    'formats':['<f4']*len(arr.dtype.names)})).tolist())

array([[-8.56945932e-01, -6.39972162e+00, -1.54863226e+00,
         3.70000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       [-1.18924475e+00, -1.14172087e+01, -2.17719153e-01,
         9.70000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       [-8.45414758e-01, -1.13712845e+01, -8.72614682e-01,
         7.50000000e+01,  5.00000000e+01,  0.00000000e+00,
         0.00000000e+00],
       [-5.74069992e-02, -6.26610422e+00,  1.66938281e+00,
         1.90000000e+01,  0.00000000e+00,  1.60000000e+01,
         6.30000000e+01],
       [ 5.63910365e-01, -1.12625027e+01,  3.15940678e-01,
         0.00000000e+00,  0.00000000e+00,  1.50000000e+02,
         6.30000000e+01],
       [ 9.11834717e-01, -1.14296665e+01, -3.37240189e-01,
         9.60000000e+01,  0.00000000e+00,  0.00000000e+00,
         0.00000000e+00]])
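
Once every field has the same format, the tolist step can probably be skipped: a structured array whose fields are all '<f4' can be viewed as float32 and reshaped. This is a sketch of that variant rather than part of the approach above; `uniform`, `as_f4`, and `flat` are names introduced here, and the two-row array is a stand-in.

```python
import numpy as np

# Small stand-in for the array in the question (same dtype, fewer rows).
dt = np.dtype([('x', '<f4'), ('y', '<f4'), ('z', '<f4'),
               ('intensity', 'u1'), ('timestamp', 'u1'), ('m', 'u1'), ('_', 'u1')])
arr = np.array([(-0.85694593, -6.3997216, -1.5486323, 37, 50, 0, 0),
                (0.56391037, -11.262503, 0.31594068, 0, 0, 150, 63)], dtype=dt)

names = list(arr.dtype.names)
uniform = np.dtype({'names': names, 'formats': ['<f4'] * len(names)})

# astype copies and casts field by position; the result is packed and contiguous.
as_f4 = arr.astype(uniform)

# Every record is now 7 consecutive float32 values, so a view plus reshape is enough.
flat = as_f4.view(np.float32).reshape(len(arr), -1)
print(flat.shape, flat.dtype)  # (2, 7) float32
```

Unlike the tolist version, this keeps the result as float32 and creates no intermediate Python objects; `flat` shares its memory with `as_f4`.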
