
I am struggling with a slow numpy operation, using python 3.

I have the following operation:

np.sum(np.log(X.T * b + a).T, 1)

where

(30000,1000) = X.shape
(1000,1) = b.shape
(1000,1) = a.shape

My problem is that this operation is pretty slow (around 1.5 seconds), and since it sits inside a loop that runs around 100 times, it dominates the running time of my code.

I am wondering if there is a faster implementation of this function.

Maybe useful fact: X is extremely sparse (only 0.08% of the entries are nonzero), but it is stored as a dense NumPy array.

  • What structure do you use to store X? Have you had a look at scipy.sparse.csc_matrix? Commented Oct 14, 2017 at 14:33
  • It is a numpy array for now. No, I have not looked at it. Commented Oct 14, 2017 at 14:34
  • Took the liberty of adding the piece of info that X is a NumPy array into the post, which was crucial here. Also, people might confuse a NumPy sparse matrix with a SciPy sparse matrix, so I edited the title too. That "sparsey" was the best I could conjure up without causing confusion with the SciPy sparse matrix. Commented Oct 14, 2017 at 17:42

2 Answers


We can optimize the logarithm operation, which seems to be the bottleneck. Being one of the transcendental functions, it can be sped up with the numexpr module, while the sum-reduction is left to NumPy, which handles that part much better. That gives us a hybrid solution, like so -

import numpy as np
import numexpr as ne

def numexpr_app(X, a, b):
    XT = X.T
    return ne.evaluate('log(XT * b + a)').sum(0)

Looking closely at the broadcasting operations in XT * b + a, we see that there are two stages of broadcasting, which we can optimize further. The intention is to reduce them to a single stage, and that is possible here with some division (valid when b is positive): factoring out b turns log(XT * b + a) into log(b) + log(a/b + XT). This gives us a slightly modified version, shown below -

def numexpr_app2(X, a, b):
    ab = (a/b)
    XT = X.T
    return np.log(b).sum() + ne.evaluate('log(ab + XT)').sum(0)
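As a quick sanity check of that factoring, here is a small self-contained sketch (with scaled-down shapes of my choosing) confirming that the one-stage form matches the original expression elementwise, under the stated assumption that b is strictly positive:

```python
import numpy as np

# Check the identity used by numexpr_app2:
#   log(x*b + a) = log(b) + log(a/b + x),  elementwise, for b > 0
rng = np.random.default_rng(0)
m, n = 50, 20                      # small stand-ins for 30000, 1000
X = rng.random((m, n))
a = rng.random((n, 1))
b = rng.random((n, 1))             # strictly positive, as required

ref = np.sum(np.log(X.T * b + a).T, 1)           # original expression
ab = a / b
alt = np.log(b).sum() + np.log(ab + X.T).sum(0)  # one-stage broadcast form
assert np.allclose(ref, alt)
```

The single broadcast `ab + XT` replaces the multiply-then-add pair, which is what the division buys us.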

Runtime test and verification

Original approach -

def numpy_app(X, a, b):
    return np.sum(np.log(X.T * b + a).T, 1)

Timings -

In [111]: # Setup inputs
     ...: import scipy.sparse
     ...: density = 0.08/100 # 0.08 % sparse
     ...: m,n = 30000, 1000
     ...: X = scipy.sparse.rand(m,n,density=density,format="csr").toarray()
     ...: a = np.random.rand(n,1)
     ...: b = np.random.rand(n,1)
     ...: 

In [112]: out0 = numpy_app(X, a, b)
     ...: out1 = numexpr_app(X, a, b)
     ...: out2 = numexpr_app2(X, a, b)
     ...: print(np.allclose(out0, out1))
     ...: print(np.allclose(out0, out2))
     ...: 
True
True

In [114]: %timeit numpy_app(X, a, b)
1 loop, best of 3: 691 ms per loop

In [115]: %timeit numexpr_app(X, a, b)
10 loops, best of 3: 153 ms per loop

In [116]: %timeit numexpr_app2(X, a, b)
10 loops, best of 3: 149 ms per loop

Just to back up the observation stated at the start that the log part is the bottleneck of the original NumPy approach, here's its timing -

In [44]: %timeit np.log(X.T * b + a)
1 loop, best of 3: 682 ms per loop

On which the numexpr improvement is significant -

In [120]: XT = X.T

In [121]: %timeit ne.evaluate('log(XT * b + a)')
10 loops, best of 3: 142 ms per loop

5 Comments

What if b has zero entries?
@P.Camilleri Ah, you mean b having zeros? Yeah, in that case, numexpr_app would be the way to go.
@Divakar This is super interesting, thank you. I can only use numexpr_app, as b has negative values. But is it possible that I get slightly different values as output, comparing numpy_app and numexpr_app?
@UlderiqueDemoitre How slight is that slightly? What's the abs. max difference/error value between those approaches?
@Divakar my mistake, I was missing a part in the code. Great, you are a life saver.

It's a bit unclear why you would do np.sum(your_array.T, axis=1) instead of np.sum(your_array, axis=0).
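The two reductions are indeed identical: transposing and summing over axis 1 just sums each column of the original array, and the transpose is only a view. A tiny check, on a toy array of my choosing:

```python
import numpy as np

# Summing the transpose over axis 1 equals summing the original over axis 0:
# (Y.T).sum(1)[i] = sum_j Y.T[i, j] = sum_j Y[j, i] = Y.sum(0)[i]
rng = np.random.default_rng(0)
Y = rng.random((4, 3))
assert np.allclose(np.sum(Y.T, 1), np.sum(Y, 0))
```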

You can use a scipy sparse matrix (use column-compressed format for X, so that X.T is row-compressed, since you multiply by b, which has the shape of one row of X.T):

X_sparse = scipy.sparse.csc_matrix(X)

and replace X.T * b by:

X_sparse.T.multiply(b) 

However, if a is not sparse, it will not help you as much as it could.
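To make the substitution concrete, here is a small self-contained sketch (shapes scaled down, values of my choosing) of the full expression with the sparse product. Note that `.multiply` returns a sparse result, but adding the dense a densifies the intermediate again, so only the multiply itself gets cheaper:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
X = rng.random((300, 100))
X[X < 0.999] = 0                     # make X very sparse (~0.1% nonzero)
a = rng.random((100, 1)) + 1.0       # kept positive so the log is defined
b = rng.random((100, 1)) + 1.0
X_sparse = sparse.csc_matrix(X)      # X_sparse.T is then row-compressed

dense_result = np.sum(np.log(X.T * b + a).T, 1)

# Sparse elementwise product with broadcasting; adding the dense `a`
# forces densification before the log and the sum-reduction.
prod = X_sparse.T.multiply(b)
sparse_result = np.log(prod.toarray() + a).sum(0)
assert np.allclose(dense_result, sparse_result)
```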

These are the speed ups I obtain for this operation:

In [16]: %timeit X_sparse.T.multiply(b)
The slowest run took 10.80 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 374 µs per loop

In [17]: %timeit X.T * b
10 loops, best of 3: 44.5 ms per loop

with:

import numpy as np
from scipy import sparse

X = np.random.randn(30000, 1000)
a = np.random.randn(1000, 1)
b = np.random.randn(1000, 1)
X[X < 3] = 0
print(np.sum(X != 0))
X_sparse = sparse.csc_matrix(X)

1 Comment

Creating a sparse matrix from a dense one takes time. And adding a turns the result back to dense, which can be expensive.
