I have two numpy arrays: one containing arbitrary values, and one containing positive integers (each at least 1). The sum of the integers is equal to the length of the first array. Sample:

import numpy as np

values = np.array(["a", "b", "c", "d", "e", "f", "g", "h"])
lengths = np.array([1, 3, 2, 2])
len(values) == sum(lengths) # True

I would like to split the first array according to the lengths of the second array, and end up with something like:

output = np.array([["a"], ["b", "c", "d"], ["e", "f"], ["g", "h"]], dtype=object)

It's easy to iterate over the array with a Python loop, but it's also slow when both arrays are very large (hundreds of millions of elements). Is there a way to do this operation using native numpy operations, which presumably should be much faster?

2 Comments
  • The 'native' split and array_split also have to loop, taking multiple slices. And they produce a list of arrays, which is a bit different from your object dtype array of lists. This isn't a 'native' operation like reshape. Commented Feb 8, 2023 at 16:00
  • Yes, @hpaulj: the following solution appears to be much faster: cl = np.cumsum(lengths); [values[x[0]:x[1]] for x in zip([0] + cl[:-1].tolist(), cl)]. Commented Feb 8, 2023 at 17:02
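
The slicing approach from the second comment can be written out as a small runnable sketch. The names starts, ends and chunks are illustrative and not from the original comment, and nothing here is a claim about timing; it only shows that the start and end offsets both come from the cumulative sum:

```python
import numpy as np

values = np.array(["a", "b", "c", "d", "e", "f", "g", "h"])
lengths = np.array([1, 3, 2, 2])

# End offset of each chunk is the running total of the lengths.
ends = np.cumsum(lengths)
# Start offset of each chunk is its end offset minus its own length.
starts = ends - lengths

# Basic slicing returns views into `values`, so no element data is copied.
chunks = [values[s:e] for s, e in zip(starts.tolist(), ends.tolist())]
```

The result is a plain Python list of arrays, one per entry in lengths, which is the same shape of result that np.split produces.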

1 Answer

You can use numpy's split function, passing the cumulative sums of the lengths as the split points:

output = np.split(values, np.cumsum(lengths))[:-1]
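
To make the trailing [:-1] less mysterious, here is a minimal sketch using the sample data from the question. The names offsets, chunks, chunks_alt and packed are illustrative only. It also shows one possible way to pack the result into a 1-D object array, since np.split returns a list of arrays rather than an object-dtype array:

```python
import numpy as np

values = np.array(["a", "b", "c", "d", "e", "f", "g", "h"])
lengths = np.array([1, 3, 2, 2])

# Cumulative sums are the end offsets of each chunk: [1, 4, 6, 8].
offsets = np.cumsum(lengths)

# The last offset equals len(values), so splitting there produces an
# empty trailing array; [:-1] drops it.
chunks = np.split(values, offsets)[:-1]

# Equivalent alternative: drop the last offset instead of the last chunk.
chunks_alt = np.split(values, offsets[:-1])

# Optional: pack the chunks into a 1-D object array, filling element by
# element so numpy never tries to stack equal-length chunks into 2-D.
packed = np.empty(len(chunks), dtype=object)
for i, chunk in enumerate(chunks):
    packed[i] = chunk
```

Each element of packed is a numpy array rather than a Python list, so this is close to, but not identical to, the output shown in the question.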

1 Comment

Ah, using split with cumsum is obvious in retrospect. Thanks!
