Given two numpy arrays, how to split one into an array of lists based on the second

Question

I have two numpy arrays: one containing arbitrary values, and one containing positive integers. The sum of the integers is equal to the length of the first array. Sample:

values = np.array(["a", "b", "c", "d", "e", "f", "g", "h"])
lengths = np.array([1, 3, 2, 2])
len(values) == sum(lengths) # True

I would like to split the first array according to the lengths of the second array, and end up with something like:

output = np.array([["a"], ["b", "c", "d"], ["e", "f"], ["g", "h"]], dtype=object)

It's easy to iterate over the array with a Python loop, but that is slow when both arrays are very large (hundreds of millions of elements). Is there a way to do this using native numpy operations, which presumably should be much faster?

The 'native' split and array_split also have to loop, taking multiple slices. And they produce a list of arrays, which is a bit different from your object dtype array of lists. This isn't a 'native' operation like reshape.
– hpaulj, Commented Feb 8, 2023 at 16:00
Yes, @hpaulj: the following solution appears to be much faster: cl = np.cumsum(lengths); [values[x[0]:x[1]] for x in zip([0] + cl[:-1].tolist(), cl)]
– PaulS, Commented Feb 8, 2023 at 17:02
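
The slicing approach from the comment above can be written out as a small script. This is only a sketch using the sample data from the question; the names starts, ends and chunks are illustrative and not from the original comment, and whether it is actually faster than np.split depends on the array sizes, so it is worth timing on real data.

```python
import numpy as np

values = np.array(["a", "b", "c", "d", "e", "f", "g", "h"])
lengths = np.array([1, 3, 2, 2])

# End offset of each chunk is the running total of the lengths;
# the start offset is the end offset minus the chunk's own length.
ends = np.cumsum(lengths)
starts = ends - lengths

# Each slice is a view into values, so no element data is copied here.
# Converting the offsets to Python ints first avoids per-item numpy scalar overhead.
chunks = [values[s:e] for s, e in zip(starts.tolist(), ends.tolist())]
```

The result is a plain Python list of numpy arrays (views), not an object array of lists.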

luca · Accepted Answer · 2023-02-08 15:33:54Z

3

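You can use the split function from numpy, passing the cumulative sums of lengths as the split indices. The trailing [:-1] drops the empty array that split returns after the last index: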

output = np.split(values, np.cumsum(lengths))[:-1]
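
For reference, here is a self-contained sketch of the same idea on the sample data from the question. It uses a small variant, slicing the cumulative sum instead of the result, and then packs the pieces into an object array of lists as the question asked for; the names chunks and output_lists are illustrative assumptions, and the packing step is still a Python loop.

```python
import numpy as np

values = np.array(["a", "b", "c", "d", "e", "f", "g", "h"])
lengths = np.array([1, 3, 2, 2])

# Split points are the running totals of the lengths. Dropping the last
# total (equal to len(values)) avoids producing a trailing empty chunk.
chunks = np.split(values, np.cumsum(lengths)[:-1])

# np.split returns a list of arrays. To get a 1-D object array of lists,
# preallocate the object array and fill it element by element.
output_lists = np.empty(len(chunks), dtype=object)
for i, chunk in enumerate(chunks):
    output_lists[i] = chunk.tolist()
```

Filling the preallocated array one element at a time keeps numpy from trying to build a 2-D array when all chunks happen to have the same length.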


Ah, using split with cumsum is obvious in retrospect. Thanks!
– Ted
