0

I have a numpy array like ids = np.array([0,0,0,1,1,2,2,2,2,4,5,5,5]) and some other numpy arrays (say a and b) of the same length. I want to carry out some independent operations on slices of these arrays, where each slice covers the indexes that share the same (contiguous) id. I.e. I want to define a set of slices like

slice_0 = 0:3
slice_1 = 3:5
slice_2 = 5:9
...

so that I can call a function f(a[slice_n],b[slice_n]) for each n in parallel. How do I construct the slices in numpy? If it helps, in R I would do it with something like tapply.

10
  • Don't think creating slices would be a good idea when trying to use vectorization with NumPy, because essentially you would be running the method(s) sequentially. If you could specify the operation that you would like to use, there might be a better solution. Commented Feb 23, 2017 at 13:51
  • Is this a syntax question? There are essentially two ways to create a slice for later use: slice objects (or tuples of slice objects for multidimensional slicing, with Ellipsis for '...'), or np.s_. Commented Feb 23, 2017 at 14:07
  • @Divakar it's the function f(x) that takes time. I imagine the slicing would be quick. Commented Feb 23, 2017 at 14:21
  • @PaulPanzer it's a question of how to create slices that cover contiguous runs of the same number in a numpy int array Commented Feb 23, 2017 at 14:22
  • @user2667066 And that would happen because those apply functions aren't working in parallel on the sliced data, at least the NumPy based apply funcs. Commented Feb 23, 2017 at 14:26

4 Answers

1

To get your split points:

spl = np.r_[0, np.flatnonzero(np.diff(ids)) + 1, ids.size]

Then build a list of slices:

slices = [slice(i, j) for i, j in zip(spl[:-1], spl[1:])]

Or split your other arrays directly:

a_spl = np.split(a, spl[1:-1])
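Putting those steps together, here is a minimal sketch using the ids from the question. The arrays a and b and the function f are hypothetical stand-ins (the question does not say what they are), and np.flatnonzero is used to find where the id changes.

```python
import numpy as np

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])
a = np.arange(ids.size)        # hypothetical data, same length as ids
b = np.arange(ids.size) * 10   # hypothetical data, same length as ids

# Indices where the id changes, shifted by one so they mark the start of each run.
spl = np.r_[0, np.flatnonzero(np.diff(ids)) + 1, ids.size]

# One slice per contiguous run of equal ids (plain Python ints via tolist()).
slices = [slice(i, j) for i, j in zip(spl[:-1].tolist(), spl[1:].tolist())]

def f(x, y):
    # Hypothetical stand-in for the expensive per-group function.
    return float(np.dot(x, y))

results = [f(a[s], b[s]) for s in slices]

# np.split gives the same groups without building slice objects.
a_spl = np.split(a, spl[1:-1])
```

Because the slices are independent of each other, the list comprehension could be swapped for whatever parallel map you prefer.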

EDIT: since ids is sorted and in order, you can either use np.unique (as in the other answer) or use boolean masks (if you have the memory):

slices = list(np.unique(ids)[:,None] == ids[None,:])
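To illustrate the boolean version, a small sketch with the ids from the question and a hypothetical array a. Each row of the comparison is a mask selecting one id, so the full mask array has shape (number of unique ids, len(ids)), which is where the memory cost comes from.

```python
import numpy as np

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])
a = np.arange(ids.size)  # hypothetical data, same length as ids

# Broadcasting compares every unique id against every element of ids.
masks = np.unique(ids)[:, None] == ids[None, :]

# Indexing with one row of the mask selects the elements of that group.
groups = [a[m] for m in masks]
```

Unlike the slice approach, masks also work when equal ids are not contiguous, but they copy the data instead of returning views.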


1

I'm not sure I understand your question; perhaps you intended

slice_0 = 0:3
slice_1 = 3:5
slice_2 = 5:9
slice_3 = 9:10
slice_4 = 10:13

If this is the case, you can use NumPy's unique:

_, idx, count = numpy.unique(ids, return_index=True, return_counts=True)

The lower limit of the slices is idx, the upper limit is idx + count.
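As a rough sketch of turning idx and count into slice objects, using the ids from the question (this assumes each id occupies a single contiguous run, which holds when ids is sorted):

```python
import numpy as np

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])

# idx: first position of each unique id; count: how many times it occurs.
_, idx, count = np.unique(ids, return_index=True, return_counts=True)

# Lower bound is idx, upper bound is idx + count.
slices = [slice(lo, hi) for lo, hi in zip(idx.tolist(), (idx + count).tolist())]
```

If the same id could appear in two separate runs, idx + count would no longer bound a single run, and a diff-based approach would be the safer choice.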

4 Comments

Yes, typo, sorry. Will correct in the original question. I meant 0:3, 3:5, etc.
Given that the ids are already sorted and in order, is there something more efficient than numpy.unique? What's used in numpy for e.g. run length encoding?
You didn't specify that they are sorted and in order :) In that case unique is actually probably better than the diff solutions below
If your point is to parallelize f, which is the slow part of the computation, I doubt that using unique on a sorted list will lead to a noticeable loss of efficiency.
0

One way to do that:

In [12]: arrays=np.vstack((a,b))

In [13]: arrays
Out[13]: 
array([[4, 1, 4, 2, 5, 7, 1, 5, 9],
       [8, 1, 1, 1, 9, 3, 0, 3, 1]])

In [14]: subarrays=np.split(arrays,[3,5],axis=1)

In [15]: subarrays
Out[15]: 
[array([[4, 1, 4],
        [8, 1, 1]]), 
 array([[2, 5],
        [1, 9]]), 
 array([[7, 1, 5, 9],
        [3, 0, 3, 1]])]

In [16]: [np.multiply(x,y) for (x,y) in subarrays]
Out[16]: [array([32,  1,  4]), array([ 2, 45]), array([21,  0, 15,  9])]


0

If you want to chop up an array into chunks along an axis the simplest way is np.split:

>>> a = np.arange(10)
>>> split_points = (2,5,7)
>>> np.split(a, split_points)
[array([0, 1]), array([2, 3, 4]), array([5, 6]), array([7, 8, 9])]

If you want even splitting you can use np.arange for split_points.
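For instance, a small sketch of even splitting with np.arange, where the chunk length of 3 is just an arbitrary value picked for illustration:

```python
import numpy as np

a = np.arange(10)
chunk = 3  # hypothetical chunk length

# Split points at every multiple of chunk, excluding 0 and anything past the end.
split_points = np.arange(chunk, a.size, chunk)
chunks = np.split(a, split_points)
```

The last chunk is simply shorter when the length is not a multiple of chunk.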

To create split points from an id array use split_points = np.where(np.diff(ids))[0] + 1

If your id array is sorted and you also have the ids without repeats (ids_wor below), then split_points = np.searchsorted(ids, ids_wor)[1:] might be faster.
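A short sketch comparing the two ways of getting split points, using the ids from the question. Here ids_wor is written out by hand as the ids without repeats, and a is a hypothetical array of the same length.

```python
import numpy as np

ids = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 4, 5, 5, 5])
ids_wor = np.array([0, 1, 2, 4, 5])  # the ids without repeats, assumed known
a = np.arange(ids.size)              # hypothetical data, same length as ids

# Works for any array whose equal ids are contiguous.
split_diff = np.where(np.diff(ids))[0] + 1

# Requires sorted ids: binary search for the first occurrence of each id,
# then drop the leading 0 since np.split does not need it.
split_sorted = np.searchsorted(ids, ids_wor)[1:]

chunks = np.split(a, split_sorted)
```

Both give the same split points on this input; the searchsorted version does a binary search per unique id instead of scanning the whole array.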

