Split a NumPy array into subarrays according to the values (not sorted, but grouped) of another array

Question

Suppose I have two NumPy arrays

x = [[1, 2, 8],
     [2, 9, 1],
     [3, 8, 9],
     [4, 3, 5],
     [5, 2, 3],
     [6, 4, 7],
     [7, 2, 3],
     [8, 2, 2],
     [9, 5, 3],
     [10, 2, 3],
     [11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]

Note: the values in x are not sorted in any way; I chose this example to better illustrate the problem. These are just two example arrays: x and y can hold arbitrarily many different numbers, and y can contain arbitrarily many different labels, but x always has exactly as many rows as y has entries.

I want to efficiently split the array x into sub-arrays according to the values in y.

My desired outputs would be

z_0 = [[1, 2, 8],
       [2, 9, 1],
       [4, 3, 5],
       [10, 2, 3],
       [11, 2, 4]]
z_1 = [[3, 8, 9],
       [5, 2, 3],
       [6, 4, 7]]
z_2 = [[7, 2, 3],
       [8, 2, 2],
       [9, 5, 3]]

Assuming that y starts with zero and is not sorted but grouped, what is the most efficient way to do this?

Note: This question is the unsorted version of this question: Split a NumPy array into subarrays according to the values (sorted in ascending order) of another array

Can you say in words how the desired output relates to the sequence of numbers in y? – wwii, Mar 19, 2021 at 13:18
Imagine that x is a point cloud and y is the label of each point in x according to a clustering algorithm. z would be all the clustered sub-point-clouds of the original point cloud x. – danielhe, Mar 19, 2021 at 13:21

Nick · Accepted Answer · 2021-03-19 20:51:17Z

3

One way to solve this is to build up a list of filter indexes for each y value and then simply select those elements of x. For example:

import numpy as np
x = np.array(x)  # x must be an array (not a list of lists) for fancy indexing
z_0 = x[[i for i, v in enumerate(y) if v == 0]]
z_1 = x[[i for i, v in enumerate(y) if v == 1]]
z_2 = x[[i for i, v in enumerate(y) if v == 2]]

Output

array([[ 1,  2,  8],
       [ 2,  9,  1],
       [ 4,  3,  5],
       [10,  2,  3],
       [11,  2,  4]])
array([[3, 8, 9],
       [5, 2, 3],
       [6, 4, 7]])
array([[7, 2, 3],
       [8, 2, 2],
       [9, 5, 3]])

To be more generic and support any set of values in y, you could use a list comprehension to produce a list of arrays, e.g.:

z = [x[[i for i, v in enumerate(y) if v == m]] for m in sorted(set(y))]  # sorted() fixes the label order

Output:

[array([[ 1,  2,  8],
       [ 2,  9,  1],
       [ 4,  3,  5],
       [10,  2,  3],
       [11,  2,  4]]),
 array([[3, 8, 9],
       [5, 2, 3],
       [6, 4, 7]]),
 array([[7, 2, 3],
       [8, 2, 2],
       [9, 5, 3]])]
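The comprehension above scans y once for every distinct label. As a minimal sketch (the helper name `split_by_label` is hypothetical, not part of the answer), the same index lists can be collected for all labels in a single pass over y with a dictionary, and x is then indexed once per label:

```python
from collections import defaultdict

import numpy as np


def split_by_label(x, y):
    """Hypothetical helper: group the rows of x by the label at the same position in y."""
    x = np.asarray(x)
    indices = defaultdict(list)  # label -> list of row positions
    for i, label in enumerate(y):  # one pass over y, whatever the number of labels
        indices[label].append(i)
    # Sort by label so the result order does not depend on first appearance in y
    return {label: x[idx] for label, idx in sorted(indices.items())}


x = [[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
     [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]
z = split_by_label(x, y)  # z[0], z[1], z[2] hold the rows for each label
```

Returning a dict keyed by label also copes with labels that are not consecutive integers.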

If y is also an np.array of the same length as x, you can simplify this with boolean indexing:

z = [x[y == m] for m in sorted(set(y))]

Output is the same as above.
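Boolean indexing still builds one full-length mask per label, so the work grows with (number of rows) × (number of labels). A different technique, sketched here as an alternative that was not benchmarked for this answer, is to sort once and cut the sorted array at the group boundaries. It assumes x and y can be converted to arrays of equal length; the helper name `split_sorted` is hypothetical:

```python
import numpy as np


def split_sorted(x, y):
    """Hypothetical helper: split rows of x into groups by label with one stable sort."""
    x = np.asarray(x)
    y = np.asarray(y)
    # A stable sort keeps rows that share a label in their original relative order
    order = np.argsort(y, kind="stable")
    # np.unique returns the labels in ascending order, matching the sorted y
    labels, counts = np.unique(y, return_counts=True)
    # Row positions in x[order] where one label group ends and the next begins
    boundaries = np.cumsum(counts)[:-1]
    return labels, np.split(x[order], boundaries)


x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
              [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])
labels, parts = split_sorted(x, y)  # parts[k] holds the rows whose label is labels[k]
```

The sort costs O(n log n) once, independent of how many labels there are, which may pay off when y has many distinct values.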

edited Mar 19, 2021 at 20:51

answered Mar 19, 2021 at 13:21

Nick



4 Comments

danielhe Over a year ago

I should have added that y can contain arbitrarily many different values; they don't have to be limited to two.

Nick Over a year ago

So what would be your expected output if there were 20 different values in y? 20 different variables? Or a list with 20 entries?

danielhe Over a year ago

I also should have added that there are as many 3-dimensional points in x as there are values in y.

Nick Over a year ago

@danielhe see my edit, that may be more useful

Daniel F · Accepted Answer · 2021-03-19 13:26:50Z

1

Just use a list comprehension and boolean indexing:

import numpy as np
x = np.array(x)
y = np.array(y)

z = [x[y == i] for i in range(y.max() + 1)]

z
Out[]: 
[array([[ 1,  2,  8],
        [ 2,  9,  1],
        [ 4,  3,  5],
        [10,  2,  3],
        [11,  2,  4]]),
 array([[3, 8, 9],
        [5, 2, 3],
        [6, 4, 7]]),
 array([[7, 2, 3],
        [8, 2, 2],
        [9, 5, 3]])]
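`range(y.max() + 1)` assumes the labels are exactly 0, 1, ..., max with no gaps. A small illustration, using made-up data where label 1 never occurs, shows that the range-based version then yields an empty array for the missing label, while iterating over `np.unique(y)` visits only labels that are present:

```python
import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5]])
y = np.array([0, 2, 2, 0])  # illustrative labels: 1 is skipped

# range(y.max() + 1) also visits label 1, producing an empty (0, 3) array
z_range = [x[y == i] for i in range(y.max() + 1)]

# np.unique(y) returns only the labels that actually occur, in ascending order
z_unique = [x[y == i] for i in np.unique(y)]
```

Whether an empty group is a bug or desired depends on the use case; for clustering output, keeping it can preserve the label-to-position mapping.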

answered Mar 19, 2021 at 13:26

Daniel F



wwii · Accepted Answer · 2021-03-19 15:09:51Z

0

Slight variation.

from operator import itemgetter
label = itemgetter(1)

Associate the implied information (each element's index) with its label, as (index, label) pairs:

y1 = list(enumerate(y))

Sort on the label

y1.sort(key=label)

Group by label and construct the results

import itertools
d = {}
for key,group in itertools.groupby(y1,label):
    d[f'z{key}'] = [x[i] for i,k in group]
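The steps above can be assembled into one self-contained snippet. This sketch is a small variation on the answer: it stores NumPy arrays instead of lists of rows by indexing x once per group. The sort matters because `itertools.groupby` only merges *consecutive* items with equal keys, and Python's sort is stable, so rows keep their original order within each label:

```python
import itertools
from operator import itemgetter

import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
              [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]])
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]

label = itemgetter(1)  # picks the label out of an (index, label) pair

# Pair each position with its label, then sort so equal labels become adjacent
pairs = sorted(enumerate(y), key=label)

d = {}
for key, group in itertools.groupby(pairs, key=label):
    # Gather this label's row positions and select those rows from x in one step
    d[f'z{key}'] = x[[i for i, _ in group]]
```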

Pandas solution:

>>> import pandas as pd
>>> df = pd.DataFrame({'points': list(x), 'cat': y})
>>> z = df.groupby('cat').agg(list)
>>> z
                                                points
cat
0    [[1, 2, 8], [2, 9, 1], [4, 3, 5], [10, 2, 3], ...
1                    [[3, 8, 9], [5, 2, 3], [6, 4, 7]]
2                    [[7, 2, 3], [8, 2, 2], [9, 5, 3]]

edited Mar 19, 2021 at 15:09

answered Mar 19, 2021 at 13:52

wwii

