
I am struggling with a slow numpy operation, using python 3.

I have the following operation:

np.sum(np.log(X.T * b + a).T, 1)

where

(30000,1000) = X.shape
(1000,1) = b.shape
(1000,1) = a.shape

My problem is that this operation is pretty slow (around 1.5 seconds), and since it sits inside a loop that runs around 100 times, it dominates the running time of my code.

I am wondering if there is a faster implementation of this function.

Maybe useful fact: X is extremely sparse (only 0.08% of the entries are nonzero), but it is stored as a dense NumPy array.

  • What structure do you use to store X? Have you had a look at scipy.sparse.csc_matrix? Commented Oct 14, 2017 at 14:33
  • It is a numpy array for now. No, I have not looked at it. Commented Oct 14, 2017 at 14:34
  • Took the liberty of adding the piece of info that X is a NumPy array into the post, which was crucial here. Also, people might confuse a NumPy sparse matrix with a SciPy sparse matrix, so I edited the title too. That "sparsey" was the best I could conjure up without causing confusion with the SciPy sparse matrix. Commented Oct 14, 2017 at 17:42

2 Answers


We can optimize the logarithm operation, which seems to be the bottleneck. Being one of the transcendental functions, it can be sped up with the numexpr module, while the sum-reduction is left to NumPy, which handles that part much better. That gives us a hybrid solution, like so -

import numpy as np
import numexpr as ne

def numexpr_app(X, a, b):
    XT = X.T
    return ne.evaluate('log(XT * b + a)').sum(0)

Looking closely at the broadcasting operations in XT * b + a, we see that there are two stages of broadcasting, which we can optimize further. The intention is to reduce them to a single stage, and that is possible here with some division (valid when b is positive): factoring out b turns log(XT * b + a) into log(b) + log(a/b + XT). This gives us a slightly modified version, shown below -

def numexpr_app2(X, a, b):
    ab = (a/b)
    XT = X.T
    return np.log(b).sum() + ne.evaluate('log(ab + XT)').sum(0)
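As a quick sanity check of that factoring, here is a small self-contained sketch (with scaled-down shapes of my choosing) confirming that the one-stage form matches the original expression elementwise, under the stated assumption that b is strictly positive:

```python
import numpy as np

# Check the identity used by numexpr_app2:
#   log(x*b + a) = log(b) + log(a/b + x),  elementwise, for b > 0
rng = np.random.default_rng(0)
m, n = 50, 20                      # small stand-ins for 30000, 1000
X = rng.random((m, n))
a = rng.random((n, 1))
b = rng.random((n, 1))             # strictly positive, as required

ref = np.sum(np.log(X.T * b + a).T, 1)           # original expression
ab = a / b
alt = np.log(b).sum() + np.log(ab + X.T).sum(0)  # one-stage broadcast form
assert np.allclose(ref, alt)
```

The single broadcast `ab + XT` replaces the multiply-then-add pair, which is what the division buys us.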

Runtime test and verification

Original approach -

def numpy_app(X, a, b):
    return np.sum(np.log(X.T * b + a).T, 1)

Timings -

In [111]: # Setup inputs
     ...: import scipy.sparse
     ...: density = 0.08/100 # 0.08 % sparse
     ...: m,n = 30000, 1000
     ...: X = scipy.sparse.rand(m,n,density=density,format="csr").toarray()
     ...: a = np.random.rand(n,1)
     ...: b = np.random.rand(n,1)
     ...: 

In [112]: out0 = numpy_app(X, a, b)
     ...: out1 = numexpr_app(X, a, b)
     ...: out2 = numexpr_app2(X, a, b)
     ...: print(np.allclose(out0, out1))
     ...: print(np.allclose(out0, out2))
     ...: 
True
True

In [114]: %timeit numpy_app(X, a, b)
1 loop, best of 3: 691 ms per loop

In [115]: %timeit numexpr_app(X, a, b)
10 loops, best of 3: 153 ms per loop

In [116]: %timeit numexpr_app2(X, a, b)
10 loops, best of 3: 149 ms per loop

Just to back up the observation stated at the start that the log part is the bottleneck of the original NumPy approach, here's its timing -

In [44]: %timeit np.log(X.T * b + a)
1 loop, best of 3: 682 ms per loop

On which the numexpr improvement is significant -

In [120]: XT = X.T

In [121]: %timeit ne.evaluate('log(XT * b + a)')
10 loops, best of 3: 142 ms per loop

5 Comments

What if b has zero entries?
@P.Camilleri Ah, you mean b having zeros? Yeah, in that case, numexpr_app would be the way to go.
@Divakar This is super interesting, thank you. I can only use numexpr_app, as b has negative values. But is it possible that I get slightly different values as output, comparing numpy_app and numexpr_app?
@UlderiqueDemoitre How slight is that slightly? What's the abs. max difference/error value between those approaches?
@Divakar my mistake, I was missing a part in the code. Great, you are a life saver.

It's a bit unclear why you would do np.sum(your_array.T, axis=1) instead of np.sum(your_array, axis=0).
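The two reductions are indeed identical: transposing and summing over axis 1 just sums each column of the original array, and the transpose is only a view. A tiny check, on a toy array of my choosing:

```python
import numpy as np

# Summing the transpose over axis 1 equals summing the original over axis 0:
# (Y.T).sum(1)[i] = sum_j Y.T[i, j] = sum_j Y[j, i] = Y.sum(0)[i]
rng = np.random.default_rng(0)
Y = rng.random((4, 3))
assert np.allclose(np.sum(Y.T, 1), np.sum(Y, 0))
```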

You can use a scipy sparse matrix (use column-compressed format for X, so that X.T is row-compressed, since you multiply by b, which has the shape of one row of X.T):

X_sparse = scipy.sparse.csc_matrix(X)

and replace X.T * b by:

X_sparse.T.multiply(b) 

However, if a is not sparse, it will not help you as much as it could.
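To make the substitution concrete, here is a small self-contained sketch (shapes scaled down, values of my choosing) of the full expression with the sparse product. Note that `.multiply` returns a sparse result, but adding the dense a densifies the intermediate again, so only the multiply itself gets cheaper:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
X = rng.random((300, 100))
X[X < 0.999] = 0                     # make X very sparse (~0.1% nonzero)
a = rng.random((100, 1)) + 1.0       # kept positive so the log is defined
b = rng.random((100, 1)) + 1.0
X_sparse = sparse.csc_matrix(X)      # X_sparse.T is then row-compressed

dense_result = np.sum(np.log(X.T * b + a).T, 1)

# Sparse elementwise product with broadcasting; adding the dense `a`
# forces densification before the log and the sum-reduction.
prod = X_sparse.T.multiply(b)
sparse_result = np.log(prod.toarray() + a).sum(0)
assert np.allclose(dense_result, sparse_result)
```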

These are the speed ups I obtain for this operation:

In [16]: %timeit X_sparse.T.multiply(b)
The slowest run took 10.80 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 374 µs per loop

In [17]: %timeit X.T * b
10 loops, best of 3: 44.5 ms per loop

with:

import numpy as np
from scipy import sparse

X = np.random.randn(30000, 1000)
a = np.random.randn(1000, 1)
b = np.random.randn(1000, 1)
X[X < 3] = 0
print(np.sum(X != 0))
X_sparse = sparse.csc_matrix(X)

1 Comment

Creating a sparse matrix from a dense one takes time. And adding a turns the result back to dense, which can be expensive.
