Split a numpy array with several sorted sequences

Question

I have a large numpy array (typically a few thousand numbers) that consists of several sorted sequences,
for example:

arr = [12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11]

I would like to split it into subarrays, each holding one of the sequences:

[12, 13, 14], [22, 23, 24, 25, 26], [9, 10, 11]

What is the fastest way to do that?

The subarray [12, 13, 14, 22, 23, 24, 25, 26] is sorted, why do you split it?
– yann ziselman, Commented Jul 27, 2021 at 12:41

Daweo · Accepted Answer · 2021-07-27 12:43:52Z

I would do it the following way:

import numpy as np
arr = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
splits = np.flatnonzero(np.diff(arr)!=1)
sub_arrs = np.split(arr, splits+1)
print(sub_arrs)

output

[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]

Explanation: I compute the differences between adjacent elements with numpy.diff (np.diff(arr)), then compare them against 1 to get a boolean array that is True wherever the difference is not 1 and False elsewhere (np.diff(arr)!=1). np.flatnonzero then gives the indices of the True entries (True is treated as 1 and False as 0 in Python). Finally, numpy.split cuts arr at those indices offset by 1. The offset is needed because entry i of the numpy.diff output is arr[i+1] - arr[i], so a break found at index i means the next subarray starts at i+1 (note also that numpy.diff returns an array one element shorter than its input).
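As a minimal sketch, the same steps can be wrapped in a small helper. The name split_consecutive is hypothetical (it is not part of numpy), and the sketch assumes a 1-D integer input where "consecutive" means each value is the previous one plus 1:

```python
import numpy as np

def split_consecutive(arr):
    """Split a 1-D integer array into runs where each value is the previous one plus 1.

    Hypothetical helper for illustration; not a numpy function.
    """
    arr = np.asarray(arr)
    # Entry i of the diff compares arr[i] and arr[i + 1], so a break at i
    # means the next run starts at index i + 1.
    breaks = np.flatnonzero(np.diff(arr) != 1) + 1
    return np.split(arr, breaks)

runs = split_consecutive([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
single = split_consecutive([5])  # no adjacent pairs, so no breaks
print(runs)
print(single)
```

A one-element input has an empty diff, so it comes back as a single run without needing a special case.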

Side note: I would call this finding runs of consecutive values rather than merely sorted subarrays, since you could split your arr into [[12, 13, 14, 22, 23, 24, 25, 26], [9, 10, 11]] and still fulfill the requirement that every subarray is sorted.
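For comparison, here is a sketch of the other reading, where a subarray only has to be sorted. It assumes that a run ends wherever the next value is smaller than the current one, so the only change is the condition applied to the differences:

```python
import numpy as np

arr = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
# A new sorted run starts wherever the next value drops below the current one.
drops = np.flatnonzero(np.diff(arr) < 0) + 1
sorted_runs = np.split(arr, drops)
print(sorted_runs)
```

With the example data this yields two subarrays instead of three, because the jump from 14 to 22 does not break the sort order.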

czeni · Accepted Answer · 2021-07-27 12:49:14Z

First of all, the problem could be really complex, but based on your example I assume that the values in subarrays are increasing by 1.

Here is a one-liner solution with plain numpy: np.array_split(a, np.where(np.diff(a) != 1)[0]+1)

Explanation: You can calculate the difference between consecutive values with np.diff.

>>> import numpy as np
>>> a = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
>>> a
array([12, 13, 14, 22, 23, 24, 25, 26,  9, 10, 11])
>>> np.diff(a)
array([  1,   1,   8,   1,   1,   1,   1, -17,   1,   1])

Then, get the indices of the values that represent the last element of each subarray, that is, the positions where the difference does not equal 1.

>>> np.where(np.diff(a) != 1)
(array([2, 7]),)

Finally, we add 1 to the boundaries to be able to use np.array_split() correctly to generate the subarrays.

>>> np.where(np.diff(a) != 1)[0]+1
array([3, 8])
>>> np.array_split(a, np.where(np.diff(a) != 1)[0]+1)
[array([12, 13, 14]), array([22, 23, 24, 25, 26]), array([ 9, 10, 11])]
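If only the boundaries of each run are needed rather than the subarrays themselves, the same break indices can be turned into start and stop positions. This is a sketch under that assumption; the names starts, stops, and run_lengths are illustrative only:

```python
import numpy as np

a = np.array([12, 13, 14, 22, 23, 24, 25, 26, 9, 10, 11])
# Indices where a new run begins (same computation as above).
breaks = np.flatnonzero(np.diff(a) != 1) + 1
# Each run covers a[starts[k]:stops[k]].
starts = np.concatenate(([0], breaks))
stops = np.concatenate((breaks, [len(a)]))
run_lengths = stops - starts
first_values = a[starts]
last_values = a[stops - 1]
print(starts, stops, run_lengths)
```

For the example array the runs start at indices 0, 3, and 8 and have lengths 3, 5, and 3.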
