I have a large numpy array (typically a few thousand numbers) that consists of several sorted sequences,
for example:

arr = [12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11]

I would like to split it into subarrays, each one holding a single sequence:

[12, 13, 14], [22, 23, 24, 25, 26], [9, 10, 11]

What is the fastest way to do that?

  • the sub array [12, 13, 14, 22, 23, 24, 25, 26] is sorted, why do you split it? Commented Jul 27, 2021 at 12:41

2 Answers

I would do it the following way:

import numpy as np
arr = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
splits = np.flatnonzero(np.diff(arr)!=1)
sub_arrs = np.split(arr, splits+1)
print(sub_arrs)

Output:

[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]

Explanation: I compute the differences between adjacent elements with numpy.diff (np.diff(arr)), then compare them with 1 to get a boolean array that is True wherever the difference is not 1 and False everywhere else (np.diff(arr)!=1). np.flatnonzero then gives the indices of the Trues in that array (True is treated as 1 and False as 0). Finally, numpy.split cuts arr at those indices offset by 1. The offset is needed because element i of np.diff(arr) is arr[i+1] - arr[i], so a break found at index i means the next subarray starts at index i+1 (note also that numpy.diff returns an array that is shorter by 1 than its input).

Side note: I would call this finding runs of consecutive values rather than merely sorted sub-arrays, since you could split your arr into [[12, 13, 14, 22, 23, 24, 25, 26], [9, 10, 11]] and still fulfill the requirement that every sub-array is sorted.
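
If the goal really were "merely sorted" runs rather than consecutive ones, a minimal sketch of that variant could look like the following. It assumes "sorted" means non-decreasing, so a run ends only where the next element is smaller; the names breaks and sorted_runs are just illustrative.

```python
import numpy as np

arr = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])

# Assumption: a "sorted" run is non-decreasing, so it ends only where
# the next element is smaller than the current one (negative difference).
breaks = np.flatnonzero(np.diff(arr) < 0) + 1

# Cut arr at the first index of each new run.
sorted_runs = np.split(arr, breaks)
print(sorted_runs)
```

With the example array this yields two sub-arrays instead of three, which is the ambiguity the comment under the question points at.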

First of all, the problem could be really complex, but based on your example I assume that the values in subarrays are increasing by 1.

Here is a one-liner solution with plain numpy: np.array_split(a, np.where(np.diff(a) != 1)[0]+1)

Explanation: You can calculate the difference between consecutive values with np.diff.

>>> import numpy as np
>>> a = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
>>> a
array([12, 13, 14, 22, 23, 24, 25, 26,  9, 10, 11])
>>> np.diff(a)
array([  1,   1,   8,   1,   1,   1,   1, -17,   1,   1])

Then, get the indices of the differences that do not equal 1. Each such index is the position in a of the last element of a subarray.

>>> np.where(np.diff(a) != 1)
(array([2, 7]),)

Finally, we add 1 to the boundaries to be able to use np.array_split() correctly to generate the subarrays.

>>> np.where(np.diff(a) != 1)[0]+1
array([3, 8])
>>> np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]
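
The steps above can be wrapped in a small helper. This is only a sketch: the function name split_consecutive and its step parameter are hypothetical additions, not part of numpy or of the answer, and the step is assumed to be constant across all runs.

```python
import numpy as np

def split_consecutive(a, step=1):
    """Split a 1-D array into runs whose adjacent elements differ by `step`.

    Hypothetical helper; `step` is an assumed generalisation of the
    "increasing by 1" case described above.
    """
    a = np.asarray(a)
    # Index i in np.diff(a) compares a[i] and a[i+1], so a mismatch at i
    # means the next run starts at i + 1.
    boundaries = np.flatnonzero(np.diff(a) != step) + 1
    return np.split(a, boundaries)

runs = split_consecutive([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
single = split_consecutive([5])  # a one-element input gives one run
print(runs)
print(single)
```

np.split returns views into the original array where possible, so the split itself does not copy the data.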
