Numpy: Joining structured arrays?

Question

Input

I have many numpy structured arrays in a list like this example:

import numpy

a1 = numpy.array([(1, 2), (3, 4), (5, 6)], dtype=[('x', int), ('y', int)])

a2 = numpy.array([(7,10), (8,11), (9,12)], dtype=[('z', int), ('w', float)])

arrays = [a1, a2]

Desired Output

What is the correct way to join them all together to create a unified structured array like the following?

desired_result = numpy.array([(1, 2, 7, 10), (3, 4, 8, 11), (5, 6, 9, 12)],
                             dtype=[('x', int), ('y', int), ('z', int), ('w', float)])

Current Approach

This is what I'm currently using, but it is very slow, so I suspect there must be a more efficient way.

from numpy.lib.recfunctions import append_fields

def join_struct_arrays(arrays):
    for array in arrays:
        try:
            result = append_fields(result, array.dtype.names, [array[name] for name in array.dtype.names], usemask=False)
        except NameError:
            result = array

    return result

joris · Accepted Answer · 2011-03-18 18:13:41Z

48

You can also use the function merge_arrays of numpy.lib.recfunctions:

import numpy.lib.recfunctions as rfn
rfn.merge_arrays(arrays, flatten = True, usemask = False)

Out[52]: 
array([(1, 2, 7, 10.0), (3, 4, 8, 11.0), (5, 6, 9, 12.0)], 
     dtype=[('x', '<i4'), ('y', '<i4'), ('z', '<i4'), ('w', '<f8')])
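As a self-contained check, the call above can be run against the arrays from the question and compared with the desired result. This is only a sketch: the variable names `merged` and `desired` are illustrative, and the printed dtype strings depend on the platform's default integer size.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# Input arrays from the question.
a1 = np.array([(1, 2), (3, 4), (5, 6)], dtype=[('x', int), ('y', int)])
a2 = np.array([(7, 10), (8, 11), (9, 12)], dtype=[('z', int), ('w', float)])

# flatten=True puts all fields at the top level of the result;
# usemask=False returns a plain ndarray rather than a masked array.
merged = rfn.merge_arrays([a1, a2], flatten=True, usemask=False)

desired = np.array([(1, 2, 7, 10), (3, 4, 8, 11), (5, 6, 9, 12)],
                   dtype=[('x', int), ('y', int), ('z', int), ('w', float)])

print(merged.dtype.names)
# Compare field by field so the check does not depend on dtype details.
print(all((merged[name] == desired[name]).all() for name in desired.dtype.names))
```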

answered Mar 18, 2011 at 18:13

joris


1 Comment

Jon-Eric Over a year ago

This is more readable and 1.32 times faster than my original solution. Thanks!

Sven Marnach · Accepted Answer · 2011-03-18 20:56:30Z

20

Here is an implementation that should be faster. It converts everything to arrays of numpy.uint8 and does not use any temporaries.

def join_struct_arrays(arrays):
    sizes = numpy.array([a.itemsize for a in arrays])
    offsets = numpy.r_[0, sizes.cumsum()]
    n = len(arrays[0])
    joint = numpy.empty((n, offsets[-1]), dtype=numpy.uint8)
    for a, size, offset in zip(arrays, sizes, offsets):
        joint[:,offset:offset+size] = a.view(numpy.uint8).reshape(n,size)
    dtype = sum((a.dtype.descr for a in arrays), [])
    return joint.ravel().view(dtype)

Edit: Simplified the code and avoided the unnecessary as_strided().
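To make the byte-level idea concrete, here is a minimal sketch (not part of the original answer) of what viewing a structured array as numpy.uint8 looks like. It assumes a contiguous array, and the explicit '<i8' field types are an assumption chosen so the record size is the same on every platform.

```python
import numpy

# Two records of two 8-byte integers each, so 16 bytes per record.
a = numpy.array([(1, 2), (3, 4)], dtype=[('x', '<i8'), ('y', '<i8')])

# Reinterpret the same memory as raw bytes: one row per record,
# one column per byte of the record. No data is copied here.
raw = a.view(numpy.uint8).reshape(len(a), a.dtype.itemsize)
print(raw.shape)
print(numpy.shares_memory(raw, a))

# Viewing the flat bytes with the original dtype recovers the records.
back = raw.ravel().view(a.dtype)
print(back)
```

The join function above relies on the same round trip: it copies each input's byte columns side by side into one buffer and then views that buffer with the combined dtype.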

edited Mar 18, 2011 at 20:56

answered Mar 18, 2011 at 17:56

Sven Marnach


7 Comments

Jon-Eric Over a year ago

This is 166 times faster than my original solution. I would have never come up with that on my own. Thanks!

Sven Marnach Over a year ago

@Jon-Eric: I simplified the code a bit (and threw out as_strided()). I hope this did not affect the performance. Also be sure to have a look at joris' second answer.

Hans Over a year ago

@Jon-Eric: Did you say it was 166 times faster, not 1.66 times faster? Just want to confirm.

Jon-Eric Over a year ago

@Hans Yes, 166 times faster. (When I originally measured.)

Sven Marnach Over a year ago

@Hans This is a rather low-level approach which basically just copies the old arrays as blocks of memory, completely ignoring their structure. It's clear why this can be done rather quickly, but to understand why the original solution is so slow you need to profile the code. It could be for many reasons, including the one you mentioned.


joris · Accepted Answer · 2011-03-18 21:13:38Z

8

And yet another way, a little more readable and, I think, also a lot faster:

import numpy as np

def join_struct_arrays(arrays):
    newdtype = []
    for a in arrays:
        descr = []
        for field in a.dtype.names:
            (typ, _) = a.dtype.fields[field]
            descr.append((field, typ))
        newdtype.extend(tuple(descr))
    newrecarray = np.zeros(len(arrays[0]), dtype = newdtype)
    for a in arrays:
        for name in a.dtype.names:
            newrecarray[name] = a[name]
    return newrecarray

EDIT: with the suggestions of Sven it becomes (a little bit slower, but actually pretty readable):

def join_struct_arrays2(arrays):
    newdtype = sum((a.dtype.descr for a in arrays), [])
    newrecarray = np.empty(len(arrays[0]), dtype = newdtype)
    for a in arrays:
        for name in a.dtype.names:
            newrecarray[name] = a[name]
    return newrecarray

edited Mar 18, 2011 at 21:13

answered Mar 18, 2011 at 19:03

joris


4 Comments

Sven Marnach Over a year ago

Nice, +1! Two suggestions: 1. Use numpy.empty() instead of numpy.zeros() -- it's not necessary to initialise the data. 2. Substitute the first seven lines by the last but one line of my code.

joris Over a year ago

Thanks! That really simplified the code. But on the other hand, I tested it with %timeit in IPython, and by substituting these 7 lines by your last but one line, it was two times slower. And I also compared it with your solution, and it appeared around 5 times slower than mine. But I guess that when the number of elements in the list of arrays increases, your solution will become better?

Sven Marnach Over a year ago

To get meaningful timings, you'd need to use big arrays. And I would expect your solution to be at least on par with mine as far as performance is concerned. Note that using empty() instead of zeros() should speed things up a bit.
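A timing comparison on larger arrays could be sketched as follows. The names `join_by_fields` and `join_by_bytes` are hypothetical labels for the two approaches in this thread, the array size and repeat count are arbitrary assumptions, and the byte-copy variant assumes contiguous inputs with packed dtypes. No timings are quoted here because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def combined_fields(arrays):
    # (name, dtype) pairs for every field of every input, in order.
    return [(name, a.dtype.fields[name][0]) for a in arrays for name in a.dtype.names]

def join_by_fields(arrays):
    # Field-by-field copy into a freshly allocated structured array.
    out = np.empty(len(arrays[0]), dtype=combined_fields(arrays))
    for a in arrays:
        for name in a.dtype.names:
            out[name] = a[name]
    return out

def join_by_bytes(arrays):
    # Raw byte copy; assumes contiguous inputs with packed dtypes.
    sizes = [a.dtype.itemsize for a in arrays]
    offsets = np.r_[0, np.cumsum(sizes)]
    n = len(arrays[0])
    joint = np.empty((n, offsets[-1]), dtype=np.uint8)
    for a, size, offset in zip(arrays, sizes, offsets):
        joint[:, offset:offset + size] = a.view(np.uint8).reshape(n, size)
    return joint.ravel().view(combined_fields(arrays))

# Larger inputs than the three-row example, so per-call overhead matters less.
n = 200_000
big1 = np.zeros(n, dtype=[('x', 'i8'), ('y', 'i8')])
big1['x'] = np.arange(n)
big2 = np.zeros(n, dtype=[('z', 'i8'), ('w', 'f8')])
big2['w'] = np.arange(n) * 0.5
arrays = [big1, big2]

repeats = 5
for fn in (join_by_fields, join_by_bytes):
    seconds = timeit.timeit(lambda: fn(arrays), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats:.4f} s per call")
```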

moi Over a year ago

I think it's wrong to use dtype.descr. According to the documentation: "Warning: This attribute exists specifically for PEP3118 compliance, and is not a datatype description compatible with np.dtype."
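One case where the difference shows up is a dtype with padding between fields. The sketch below is an illustration under that assumption: the dtype `padded` is hypothetical and built with align=True, and the field list is derived from dtype.fields instead of dtype.descr.

```python
import numpy as np

# A dtype with alignment padding between its one-byte and eight-byte fields.
padded = np.dtype([('a', 'i1'), ('b', 'i8')], align=True)

# descr reports the padding as an extra unnamed entry next to the real fields.
print(padded.descr)

# Building the list from dtype.fields keeps only the named fields.
field_list = [(name, padded.fields[name][0]) for name in padded.names]
print(field_list)
```

For packed dtypes such as the ones in the question, both approaches list the same fields, so the answers above still produce the expected result there.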

lfjbb · Accepted Answer · 2021-06-19 12:20:58Z

1

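import numpy as np

def join_struct_arrays(*arrs):
    # Collect (name, dtype) pairs from every input array, in order.
    dtype = [(name, d[0]) for arr in arrs for name, d in arr.dtype.fields.items()]
    # Allocate without initialising; every field is overwritten below.
    r = np.empty(arrs[0].shape, dtype=dtype)
    for a in arrs:
        for name in a.dtype.names:
            r[name] = a[name]
    return r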

answered Jun 19, 2021 at 12:20

lfjbb


1 Comment

lfjbb Over a year ago

maybe the for loop can be improved, but it’s the fastest at this moment
