3

I have 2 numpy arrays a and b as below:

a = np.random.randint(0,10,(3,2))
Out[124]: 
array([[0, 2],
       [6, 8],
       [0, 4]])
b = np.random.randint(0,10,(2,2))
Out[125]: 
array([[5, 9],
       [2, 4]])

I want to subtract each row in b from each row in a, and the desired output has shape (3, 2, 2):

array([[[-5, -7],
        [-2, -2]],

       [[ 1, -1],
        [ 4,  4]],

       [[-5, -5],
        [-2,  0]]])

I can do this using:

print(np.c_[(a - b[0]),(a - b[1])].reshape(3,2,2))

But I need a fully vectorized solution or a built-in NumPy function to do this.

1
  • By a fully vectorized solution ("factorized" earlier was a typo) I mean that I don't want to reference array b by index, as in b[i], because the number of rows in that array can change. I want a solution that always outputs an array of shape (3, len(b), 2). Commented Apr 14, 2017 at 13:20

4 Answers

5

Just use np.newaxis (which is just an alias for None) to add a singleton dimension to a, and let broadcasting do the rest:

In [45]: a[:, np.newaxis] - b
Out[45]: 
array([[[-5, -7],
        [-2, -2]],

       [[ 1, -1],
        [ 4,  4]],

       [[-5, -5],
        [-2,  0]]])
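
To make the broadcasting step less abstract, here is a minimal sketch that tracks the shapes involved, using the values of a and b from the question. The names a3, diff and loop are illustrative only and are not part of the original answer.

```python
import numpy as np

a = np.array([[0, 2], [6, 8], [0, 4]])   # shape (3, 2)
b = np.array([[5, 9], [2, 4]])           # shape (2, 2)

# Insert a length-1 axis in the middle: (3, 2) becomes (3, 1, 2).
a3 = a[:, np.newaxis]

# Broadcasting stretches the length-1 axis to len(b),
# so (3, 1, 2) against (2, 2) gives (3, 2, 2).
diff = a3 - b

# diff[i, j] equals a[i] - b[j], the same as an explicit double loop.
loop = np.array([[row_a - row_b for row_b in b] for row_a in a])

print(a3.shape, diff.shape)
print(np.array_equal(diff, loop))
```

Because the middle axis is stretched to whatever len(b) is, the expression never indexes b by position and the output shape follows (len(a), len(b), 2).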

3 Comments

This is about 40% faster than my best solution. Could you explain in more detail what is happening here? It's a bit abstract.
This is very memory inefficient for large arrays. I want to subtract a 5000 x 3078 array from a 500 x 3078 array, and this would take 500 * 3078 * 5000 * 8 / 1e9 = 61.56 gigabytes.
This really is a neat answer. We can even subtract every element of one vector from every element of another, resulting in a matrix: a = np.array([1,2,3,4,5,6]), b = np.array([1,2,3]), a[:, np.newaxis]-b. The result has shape (6,3).
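
On the memory concern raised in the comments, one possible workaround is to process a in row blocks so that only one slice of the full result exists at a time. The sketch below is an assumption about how that could look, not something proposed in the thread: pairwise_diff_chunks and chunk_rows are hypothetical names, and the small arrays stand in for the large ones.

```python
import numpy as np

def pairwise_diff_chunks(a, b, chunk_rows=100):
    """Yield (start, block), where block == a[start:start + chunk_rows, None] - b."""
    for start in range(0, a.shape[0], chunk_rows):
        yield start, a[start:start + chunk_rows, np.newaxis] - b

a = np.arange(12).reshape(6, 2)
b = np.array([[5, 9], [2, 4]])

# np.broadcast describes the full result without allocating it,
# which gives a size estimate before committing any memory.
plan = np.broadcast(a[:, np.newaxis], b)
full_bytes = plan.size * a.dtype.itemsize
print(plan.shape, full_bytes)

# Each block has shape (at most chunk_rows, len(b), 2).
blocks = [block for _, block in pairwise_diff_chunks(a, b, chunk_rows=4)]
```

This only helps if each block can be reduced or written out before the next one is computed; collecting every block in a list, as done here for checking, uses as much memory as the one-shot version.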
1

I'm not sure what a fully factorized solution means, but maybe this will help:

np.append(a, a, axis=1).reshape(3, 2, 2) - b
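
The line above relies on b having exactly two rows. As a hedged sketch of how the same idea might extend to any number of rows, np.tile can repeat a once per row of b; the variable names n, m, tiled and diff are illustrative, and the three-row b is made up for the example.

```python
import numpy as np

a = np.array([[0, 2], [6, 8], [0, 4]])
b = np.array([[5, 9], [2, 4], [1, 1]])   # three rows instead of two

n, m = len(a), len(b)

# Repeat each row of a m times along the columns, then split the copies
# into their own axis: (3, 2) -> (3, 2 * m) -> (3, m, 2).
tiled = np.tile(a, (1, m)).reshape(n, m, -1)

# b with shape (m, 2) broadcasts against the last two axes.
diff = tiled - b
print(diff.shape)
```

Unlike a[:, np.newaxis] - b, this materializes the repeated copy of a before subtracting, so it is shown for comparison rather than as a recommendation.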

1 Comment

Thanks for the answer but please see my comments in the question.
1

You can shave a little time off using np.subtract(), and a good bit more using np.concatenate()

import numpy as np
import time

# Baseline: np.c_ with the subtraction operator
start = time.time()
for i in range(100000):
    a = np.random.randint(0,10,(3,2))
    b = np.random.randint(0,10,(2,2))
    c = np.c_[(a - b[0]),(a - b[1])].reshape(3,2,2)

print(time.time() - start)

# np.c_ with np.subtract()
start = time.time()
for i in range(100000):
    a = np.random.randint(0,10,(3,2))
    b = np.random.randint(0,10,(2,2))
    c = np.c_[np.subtract(a,b[0]),np.subtract(a,b[1])].reshape(3,2,2)

print(time.time() - start)

# np.concatenate() with np.subtract()
start = time.time()
for i in range(100000):
    a = np.random.randint(0,10,(3,2))
    b = np.random.randint(0,10,(2,2))
    c = np.concatenate([np.subtract(a,b[0]),np.subtract(a,b[1])],axis=1).reshape(3,2,2)

print(time.time() - start)

>>>

3.14023900032
3.00368094444
1.16146492958

reference:

confused about numpy.c_ document and sample code

np.c_ is another way of doing array concatenation
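
The loops above also time the random array creation. A minimal sketch of an alternative measurement, assuming fixed inputs and the standard-library timeit module, might look like the following; the function names are illustrative, and the broadcasting variant is included only for comparison.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, (3, 2))
b = rng.integers(0, 10, (2, 2))

def with_c_():
    return np.c_[a - b[0], a - b[1]].reshape(3, 2, 2)

def with_concatenate():
    return np.concatenate([a - b[0], a - b[1]], axis=1).reshape(3, 2, 2)

def with_broadcasting():
    return a[:, np.newaxis] - b

# Only the expression itself is timed; the arrays are built once above.
for fn in (with_c_, with_concatenate, with_broadcasting):
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{fn.__name__}: {seconds:.4f} s")
```

Timings depend on the machine and the NumPy version, so no figures are quoted here.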

2 Comments

Thanks for the answer but please see my comments in the question.
Ah, I see, that adds a twist. I wasn't sure what you meant by factorized in the original post. I will ponder it and see if I can conjure something. Nonetheless, reduced CPU load is always a plus.
1

Reading from the doc on broadcasting, it says:

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

they are equal, or
one of them is 1
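
The quoted rule can be sketched in a few lines of plain Python. The helper broadcast_shape below is hypothetical and written only to illustrate the rule (missing leading dimensions are treated as 1); it is not a NumPy function.

```python
from itertools import zip_longest
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the quoted rule, comparing dimensions from the trailing end."""
    result = []
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for da, db in pairs:
        if da == db or db == 1:
            result.append(da)
        elif da == 1:
            result.append(db)
        else:
            raise ValueError(f"incompatible dimensions: {da} and {db}")
    return tuple(reversed(result))

print(broadcast_shape((3, 1, 2), (2, 2)))

# NumPy's own answer for the same pair of shapes, for comparison.
print(np.broadcast(np.empty((3, 1, 2)), np.empty((2, 2))).shape)
```

With the original shapes (3, 2) and (2, 2), the helper raises an error, because 3 and 2 are neither equal nor 1. That is why an extra axis has to be added before subtracting.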

Back to your case: you want the result to have shape (3, 2, 2), so following these rules you have to rearrange your dimensions. Here is the code to do it:

In [1]: a_ = np.expand_dims(a, axis=0)

In [2]: b_ = np.expand_dims(b, axis=1)

In [3]: c = a_ - b_

In [4]: c
Out[4]: 
array([[[-5, -7],
        [ 1, -1],
        [-5, -5]],

       [[-2, -2],
        [ 4,  4],
        [-2,  0]]])

In [5]: result = c.swapaxes(1, 0)

In [6]: result
Out[6]: 
array([[[-5, -7],
        [-2, -2]],

       [[ 1, -1],
        [ 4,  4]],

       [[-5, -5],
        [-2,  0]]])

In [7]: result.shape
Out[7]: (3, 2, 2)
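
The same steps should carry over to arrays with more columns, as long as a and b have the same number of columns. The sketch below checks this on made-up random inputs with five columns and compares the result with the shorter a[:, np.newaxis] - b form; the array sizes and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, (3, 5))   # 3 rows, 5 columns
b = rng.integers(0, 10, (4, 5))   # 4 rows, 5 columns

# (1, 3, 5) minus (4, 1, 5) broadcasts to (4, 3, 5).
c = np.expand_dims(a, axis=0) - np.expand_dims(b, axis=1)

# Swap the first two axes so the rows of a come first: (3, 4, 5).
result = c.swapaxes(1, 0)

# Adding the new axis to a instead gives the same array without the swap.
direct = a[:, np.newaxis] - b

print(result.shape)
print(np.array_equal(result, direct))
```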

1 Comment

Does this answer still work if say, A and B had more than two columns?
