3

Suppose I have two NumPy arrays

x = [[1, 2, 8],
     [2, 9, 1],
     [3, 8, 9],
     [4, 3, 5],
     [5, 2, 3],
     [6, 4, 7],
     [7, 2, 3],
     [8, 2, 2],
     [9, 5, 3],
     [10, 2, 3],
     [11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0] 

Note: The values in x are not sorted in any way; I chose these values only to make the example easier to follow. These are just examples of x and y: x can contain arbitrarily many rows and y can contain arbitrarily many different labels, but x always has as many rows as y has values.

I want to efficiently split the array x into sub-arrays according to the values in y.

My desired outputs would be

z_0 = [[1, 2, 8],
       [2, 9, 1],
       [4, 3, 5],
       [10, 2, 3],
       [11, 2, 4]]
z_1 = [[3, 8, 9],
       [5, 2, 3],
       [6, 4, 7]]
z_2 = [[7, 2, 3],
       [8, 2, 2],
       [9, 5, 3]]

Assuming that y starts with zero and is not sorted but grouped, what is the most efficient way to do this?

Note: This question is the unsorted version of this question: Split a NumPy array into subarrays according to the values (sorted in ascending order) of another array

  • Can you say in words how the desired output relates to the sequence of numbers in y? Commented Mar 19, 2021 at 13:18
  • Imagine that x is a point-cloud and y is the label of each point in x according to a clustering algorithm. z would be all the clustered sub-point-clouds of the original point cloud x Commented Mar 19, 2021 at 13:21

3 Answers

3

One way to solve this is to build up a list of filter indexes for each y value and then simply select those elements of x. For example:

z_0 = x[[i for i, v in enumerate(y) if v == 0]]
z_1 = x[[i for i, v in enumerate(y) if v == 1]]
z_2 = x[[i for i, v in enumerate(y) if v == 2]]

Output

array([[ 1,  2,  8],
       [ 2,  9,  1],
       [ 4,  3,  5],
       [10,  2,  3],
       [11,  2,  4]])
array([[3, 8, 9],
       [5, 2, 3],
       [6, 4, 7]])
array([[7, 2, 3],
       [8, 2, 2],
       [9, 5, 3]])

If you want to be more generic and support different sets of numbers in y, you could use a comprehension to produce a list of arrays e.g.

z = [x[[i for i, v in enumerate(y) if v == m]] for m in set(y)]

Output:

[array([[ 1,  2,  8],
       [ 2,  9,  1],
       [ 4,  3,  5],
       [10,  2,  3],
       [11,  2,  4]]),
 array([[3, 8, 9],
       [5, 2, 3],
       [6, 4, 7]]),
 array([[7, 2, 3],
       [8, 2, 2],
       [9, 5, 3]])]

If y is also a NumPy array of the same length as x, you can simplify this with boolean indexing:

z = [x[y==m] for m in set(y)]

Output is the same as above.
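
Because a Python set does not guarantee iteration order, the groups above may not come out sorted by label. The following is a minimal sketch of one way to get a deterministic order, assuming x and y are NumPy arrays as in the question; the names labels and groups are illustrative and not part of the original answer.

```python
import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3],
              [6, 4, 7], [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3],
              [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])

# np.unique returns the distinct labels in ascending order
labels = np.unique(y)

# Map each label to the rows of x that carry it (boolean indexing)
groups = {int(label): x[y == label] for label in labels}
```

Keying the result by label keeps the association explicit, so groups[1] is the sub-array for label 1 even if some labels are missing from y.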

4 Comments

I should have added that 'y' can contain arbitrarily many different values. They don't have to be limited to two
So what would be your expected output if there were 20 different values in y? 20 different variables? Or a list with 20 entries?
I also should have added that there are as many 3-dimensional values in x as values in y
@danielhe see my edit, that may be more useful
1

Just use a list comprehension and boolean indexing:

import numpy as np

x = np.array(x)
y = np.array(y)

z = [x[y == i] for i in range(y.max() + 1)]

z
Out[]: 
[array([[ 1,  2,  8],
        [ 2,  9,  1],
        [ 4,  3,  5],
        [10,  2,  3],
        [11,  2,  4]]),
 array([[3, 8, 9],
        [5, 2, 3],
        [6, 4, 7]]),
 array([[7, 2, 3],
        [8, 2, 2],
        [9, 5, 3]])]
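
The comprehension above scans y once per label. With many distinct labels, a possible alternative (a different technique from the one in this answer) is to sort once and cut the sorted rows at the points where the label changes. This is only a sketch, assuming y is a one-dimensional integer array; order and boundaries are illustrative names.

```python
import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3],
              [6, 4, 7], [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3],
              [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])

# A stable sort keeps rows with the same label in their original order
order = np.argsort(y, kind="stable")
sorted_y = y[order]

# Positions in the sorted array where the label changes
boundaries = np.flatnonzero(np.diff(sorted_y)) + 1

# Cut the reordered rows at those positions: one sub-array per label present
z = np.split(x[order], boundaries)
```

Unlike range(y.max() + 1), this yields one entry per label that actually occurs in y, so a missing label does not produce an empty array.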

0

Slight variation.

from operator import itemgetter
label = itemgetter(1)

Pair each label with its position in y, giving (index, label) tuples

y1 = list(enumerate(y))

Sort on the label

y1.sort(key=label)

Group by label and construct the results

import itertools
d = {}
for key,group in itertools.groupby(y1,label):
    d[f'z{key}'] = [x[i] for i,k in group]
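
For convenience, the steps above can be gathered into one runnable sketch. The function name split_by_label is hypothetical, and x and y are assumed to be the plain lists from the question.

```python
from itertools import groupby
from operator import itemgetter

x = [[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
     [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]


def split_by_label(x, y):
    """Group the rows of x by the matching label in y (hypothetical helper)."""
    label = itemgetter(1)
    # (index, label) pairs sorted by label; sorted() is stable, so rows
    # with the same label keep their original relative order
    pairs = sorted(enumerate(y), key=label)
    # groupby only merges adjacent equal keys, hence the sort above
    return {f"z{key}": [x[i] for i, _ in group]
            for key, group in groupby(pairs, key=label)}


d = split_by_label(x, y)
```

The sort is needed because itertools.groupby starts a new group every time the key changes, so unsorted labels would be split into several partial groups.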

Pandas solution:

>>> import pandas as pd
>>> df = pd.DataFrame({'points': list(x), 'cat': y})
>>> z = df.groupby('cat').agg(list)
>>> z       
                                                points
cat
0    [[1, 2, 8], [2, 9, 1], [4, 3, 5], [10, 2, 3], ...
1                    [[3, 8, 9], [5, 2, 3], [6, 4, 7]]
2                    [[7, 2, 3], [8, 2, 2], [9, 5, 3]]
