Python: Split NumPy array based on values in the array

Question

I have one big array:

[(1.0, 3.0, 1, 427338.4297000002, 4848489.4332)
 (1.0, 3.0, 2, 427344.7937000003, 4848482.0692)
 (1.0, 3.0, 3, 427346.4297000002, 4848472.7469) ...,
 (1.0, 1.0, 7084, 427345.2709999997, 4848796.592)
 (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351)
 (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)]

I want to split this array into multiple arrays based on the 2nd value in each row (3.0, 3.0, 3.0 ... 1.0, 1.0, 1.0).

Every time the 2nd value changes, I want a new array, so basically each new array has the same 2nd value. I've looked this up on Stack Overflow and know of the command

np.split(array, number)

but I'm not trying to split the array into a certain number of arrays, but rather by a value. How would I be able to split the array in the way specified above? Any help would be appreciated!

There is also Pandas's groupby: stackoverflow.com/questions/33622888/…
– Ciro Santilli OurBigBook.com, Commented Sep 7, 2016 at 7:40

Ashwini Chaudhary · Accepted Answer · 2015-08-08 10:36:13Z

28

You can find the indices where the values change by using numpy.where and numpy.diff on the second column:

>>> import numpy as np
>>> arr = np.array([(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
...  (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
...  (1.0, 3.0, 3, 427346.4297000002, 4848472.7469),
...  (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
...  (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351),
...  (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)])
>>> np.split(arr, np.where(np.diff(arr[:,1]))[0]+1)
[array([[  1.00000000e+00,   3.00000000e+00,   1.00000000e+00,
          4.27338430e+05,   4.84848943e+06],
       [  1.00000000e+00,   3.00000000e+00,   2.00000000e+00,
          4.27344794e+05,   4.84848207e+06],
       [  1.00000000e+00,   3.00000000e+00,   3.00000000e+00,
          4.27346430e+05,   4.84847275e+06]]),
 array([[  1.00000000e+00,   1.00000000e+00,   7.08400000e+03,
          4.27345271e+05,   4.84879659e+06],
       [  1.00000000e+00,   1.00000000e+00,   7.08500000e+03,
          4.27352928e+05,   4.84879094e+06],
       [  1.00000000e+00,   1.00000000e+00,   7.08600000e+03,
          4.27359161e+05,   4.84878743e+06]])]

Explanation:

First, we fetch the items in the second column:

>>> arr[:,1]
array([ 3.,  3.,  3.,  1.,  1.,  1.])

Now to find out where the items actually change we can use numpy.diff:

>>> np.diff(arr[:,1])
array([ 0.,  0., -2.,  0.,  0.])

Any non-zero entry means that the next item differs from the current one. We can use numpy.where to find the indices of the non-zero entries and then add 1, because the changed item sits one position after the index found in the numpy.diff output:

>>> np.where(np.diff(arr[:,1]))[0]+1
array([3])
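
Putting the steps above together, here is a minimal sketch of a reusable helper. The name split_on_change and its col parameter are made up for illustration and are not part of NumPy. It compares neighbouring items directly instead of using numpy.diff, which gives the same boundaries for numeric columns and also works for columns that cannot be subtracted.

```python
import numpy as np


def split_on_change(arr, col):
    """Split a 2-D array into consecutive blocks where arr[:, col] is constant.

    Hypothetical helper for illustration; not a NumPy function.
    """
    keys = arr[:, col]
    # Positions where an item differs from its predecessor. The +1 shifts
    # each position to the first row of the new block.
    boundaries = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    return np.split(arr, boundaries)


arr = np.array([(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
                (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
                (1.0, 3.0, 3, 427346.4297000002, 4848472.7469),
                (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
                (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351),
                (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)])

parts = split_on_change(arr, 1)
for part in parts:
    # Each block shares a single value in the chosen column.
    print(part[0, 1], part.shape)
```

Note that this splits on every change between neighbouring rows, so a value that reappears later in the array starts a new block rather than joining the earlier one.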

edited Aug 8, 2015 at 10:36

answered Aug 6, 2015 at 18:25

Ashwini Chaudhary


8 Comments

whent1991 Over a year ago

I get an IndexError: Too many indices for array. Do you know what the issue is? Thanks for your help!

Ashwini Chaudhary Over a year ago

@whent1991 Can you post the actual array?

whent1991 Over a year ago

The one I posted is the actual array, but it is a huge array so there's "..." in the middle of the array.

whent1991 Over a year ago

Hi, this solution worked great, but could you explain what the (np.diff(arr[:,1])))[0]+1 does? The syntax confuses me a bit as I am quite new to Python. Thanks for your help!

Ashwini Chaudhary Over a year ago

@whent1991 Added some explanation.
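
The comments do not say what caused the IndexError. One possible cause, assuming the question's array is a structured (record) array as its tuple-style printout may suggest, is that such an array is 1-D, so arr[:,1] has one index too many. The sketch below shows that case. The field names "a", "b", "id", "x" and "y" are invented for illustration, since the real dtype is not shown in the question.

```python
import numpy as np

# Hypothetical field names and types; the question does not show the dtype.
dtype = [("a", "f8"), ("b", "f8"), ("id", "i8"), ("x", "f8"), ("y", "f8")]
rec = np.array([(1.0, 3.0, 1, 427338.4297000002, 4848489.4332),
                (1.0, 3.0, 2, 427344.7937000003, 4848482.0692),
                (1.0, 3.0, 3, 427346.4297000002, 4848472.7469),
                (1.0, 1.0, 7084, 427345.2709999997, 4848796.592),
                (1.0, 1.0, 7085, 427352.9277999997, 4848790.9351),
                (1.0, 1.0, 7086, 427359.16060000006, 4848787.4332)],
               dtype=dtype)

# A structured array of records is 1-D, so a two-index lookup fails.
try:
    rec[:, 1]
    raised = False
except IndexError:
    raised = True

# Select the second field by name instead of by column position.
keys = rec["b"]
boundaries = np.flatnonzero(np.diff(keys)) + 1
rec_parts = np.split(rec, boundaries)
```

If the array is a plain 2-D float array, as in the answer above, this workaround is not needed.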
