1

I have a large numpy array. Is there a way to subtract each element with the elements below it, and store the result in a new list/array, without using a loop.

A simple example of what I mean:

a = numpy.array([4,3,2,1])

result = [4-3, 4-2, 4-1, 3-2, 3-1, 2-1] = [1, 2, 3, 1, 2 ,1]

Note that the 'real' array I am working with doesn't contain numbers in sequence. This is just to make the example simple.

I know the result should have (n-1)! elements, where n is the size of the array.

Is there a way to do this without using a loop, but by repeating the array in a 'smart' way?

Thanks!

2
  • Why (n-1)! elements? It looks to me like it should be n choose 2. Commented Mar 24, 2017 at 20:29
  • Yes, you are absolutely right! it is nC2 Commented Mar 24, 2017 at 20:33

4 Answers 4

3
temp = a[:, None] - a
result = temp[np.triu_indices(len(a), k=1)]

Perform all pairwise subtractions to produce temp, including subtracting elements from themselves and subtracting earlier elements from later elements, then use triu_indices to select the results we want. (a[:, None] adds an extra length-1 axis to a.)

Note that almost all of the runtime is spent constructing result from temp (because triu_indices is slow and using indices to select the upper triangle of an array is slow). If you can use temp directly, you can save a lot of time:

In [13]: a = numpy.arange(2000)

In [14]: %%timeit
   ....: temp = a[:, None] - a
   ....: 
100 loops, best of 3: 6.99 ms per loop

In [15]: %%timeit
   ....: temp = a[:, None] - a
   ....: result = temp[numpy.triu_indices(len(a), k=1)]
   ....: 
10 loops, best of 3: 51.7 ms per loop
Sign up to request clarification or add additional context in comments.

2 Comments

Elegant ! Can you give a reference for this a[:,None] indexing you are using.
@SomeGuy: Added.
2

Here's a masking based approach for the extraction after broadcasted subtractions and for the mask creation we are again making use of broadcasting (double broadcasting powered so to speak) -

r = np.arange(a.size)
out = (a[:, None] - a)[r[:,None] < r]

Runtime test

Vectorized approaches -

# @user2357112's solution
def pairwise_diff_triu_indices_based(a):
    return (a[:, None] - a)[np.triu_indices(len(a), k=1)]

# Proposed in this post
def pairwise_diff_masking_based(a):
    r = np.arange(a.size)
    return (a[:, None] - a)[r[:,None] < r]

Timings -

In [109]: a = np.arange(2000)

In [110]: %timeit pairwise_diff_triu_indices_based(a)
10 loops, best of 3: 36.1 ms per loop

In [111]: %timeit pairwise_diff_masking_based(a)
100 loops, best of 3: 11.8 ms per loop

Closer look at involved performance parameters

Let's dig deep a bit through the timings on this setup to study how much mask based approach helps. Now, for comparison there are two parts - Mask creation vs. indices creation and Mask based boolean indexing vs. integer based indexing.

How much mask creation helps?

In [37]: r = np.arange(a.size)

In [38]: %timeit np.arange(a.size)
1000000 loops, best of 3: 1.88 µs per loop

In [39]: %timeit r[:,None] < r
100 loops, best of 3: 3 ms per loop

In [40]: %timeit np.triu_indices(len(a), k=1)
100 loops, best of 3: 14.7 ms per loop

About 5x improvement on mask creation over index setup.

How much boolean indexing helps against integer based indexing?

In [41]: mask = r[:,None] < r

In [42]: idx = np.triu_indices(len(a), k=1)

In [43]: subs = a[:, None] - a

In [44]: %timeit subs[mask]
100 loops, best of 3: 4.15 ms per loop

In [45]: %timeit subs[idx]
100 loops, best of 3: 10.9 ms per loop

About 2.5x improvement here.

2 Comments

That's a much faster way to extract the upper triangle.
@user2357112 Yup! Added a bit of closer look on it.
1
a = [4, 3, 2, 1]
differences = ((x - y) for i, x in enumerate(a) for y in a[i+1:])
for diff in differences:
  # do something with difference.
  pass

4 Comments

yes, but list comprehension uses loops! My array is too large!
In NumPy, "without using a loop" includes list comprehensions and other constructs that perform iteration in Python. The goal is to get all the iteration into C loops over unboxed data.
Is there a way to do this by repeating the array in a smart way so that in the end all you need to do is subtract 1 array from another?
Well, I changed it to a generator comprehension which ought to handle enormous lists. You can iterate over that and pull results one at a time. This is definitely how I'd write it, but I'm no numpy expert.
0

Check out itertools.combinations:

from itertools import combinations

l = [4, 3, 2, 1]

result = []
for n1, n2 in combinations(l, 2):
    result.append(n1 - n2)

print result

Results in:

[1, 2, 3, 1, 2, 1]

combinations returns a generator, so this is good for very large lists :)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.