
I am learning Cython. I wrote some NumPy code:

from numpy import *

def set_onsite(n):
    a=linspace(0,n,n+1)
    onsite=zeros([n+1,n+1],float)
    for i in range(0,n+1):
        onsite[i,i]=a[i]*a[i]
    return onsite

and I used %timeit to measure its runtime:

%timeit ex5.set_onsite(10000)
10 loops, best of 3: 24.2 ms per loop

So I wrote a Cython version like this:

import numpy as np
cimport numpy as np
cimport cython
import cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.nonecheck(False)

def set_onsite(np.int_t n):
    cdef np.ndarray[double,ndim=1,mode='c'] a=np.linspace(0,n,n+1)
    cdef np.ndarray[double,ndim=2,mode='c'] onsite=np.empty([n+1,n+1])
    cdef np.int_t i

    for i in range(0,n+1):
        onsite[i,i]=a[i]*a[i]
    return onsite

and this time the result is:

%timeit ex5.set_onsite(10000)
100 loops, best of 3: 18.1 ms per loop

The results are almost the same. I am not satisfied with this result, so I was wondering: is there any way to make the code faster than 18.1 ms per loop?

  • You are still making calls to NumPy functions: linspace, empty, range, and indexing. To get real speed you need to use nditer to iterate, including when generating a and when initializing the onsite array. Commented Nov 30, 2014 at 6:57
  • Your Cython code doesn't produce the same thing, does it? You initialize onsite with empty, so the off-diagonal values can be anything. Commented Nov 30, 2014 at 22:23
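The second comment's point can be checked directly: np.empty allocates without initializing, so filling only the diagonal leaves arbitrary values elsewhere. A minimal sketch (not from the thread) of matching the np.zeros-based version while still using np.empty:

```python
import numpy as np

n = 3
a = np.linspace(0, n, n + 1)

# np.empty allocates without initializing: off-diagonal entries are arbitrary.
onsite = np.empty((n + 1, n + 1))

# Zero everything first, then fill the diagonal, so the result matches
# the np.zeros-based version in the question.
onsite[:] = 0.0
onsite[np.arange(n + 1), np.arange(n + 1)] = a * a

print(onsite)
```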

1 Answer


Cython is probably not the right tool to speed up your computations here.

The first thing you should keep in mind is that vectorizing can give you a huge speed boost. The idea is to convert explicit for-loops to operations on whole vectors.

This function does the same, but avoids the for loop:

import numpy as np

def set_onsite_vec(n):
    a = np.linspace(0, n, n + 1)
    diag = a * a
    return np.diag(diag)

%timeit set_onsite(100)     -> 10000 loops, best of 3: 39 µs per loop
%timeit set_onsite_vec(100) -> 10000 loops, best of 3: 16.3 µs per loop

This gives you quite a boost, but you will notice that for larger arrays this gain will decrease.

The problem is that you are creating a huge matrix (O(n^2) entries) - this matrix has to be allocated and initialized, which is the bottleneck.
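One way to see where the time goes (a rough sketch; absolute timings depend on the machine, and a smaller n is used here than the question's 10000 to keep memory modest) is to time the O(n^2) allocation separately from the O(n) diagonal computation:

```python
import timeit
import numpy as np

n = 2000

# Allocating and zero-initializing the full (n+1) x (n+1) matrix: O(n^2) work.
t_alloc = timeit.timeit(lambda: np.zeros((n + 1, n + 1)), number=100)

# Computing just the diagonal values: O(n) work.
a = np.linspace(0, n, n + 1)
t_diag = timeit.timeit(lambda: a * a, number=100)

print(t_alloc, t_diag)
```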

Note that the matrix has non-zero elements only on the diagonal. You can use that information and create a sparse matrix instead.

import numpy as np
from scipy.sparse import diags

def set_onsite_sparse(n):
    a = np.linspace(0,n,n+1)
    diag = a*a
    return diags([diag], [0])

%timeit set_onsite_sparse(10000) -> 1000 loops, best of 3: 119 µs per loop

Not creating the dense matrix gives you a huge speed boost.

Of course, you will only benefit from this improvement if you perform some other sparse manipulations before converting back to a dense matrix.
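For example (a sketch, assuming a simple matrix-vector product as the follow-up operation), you can multiply the sparse matrix by a vector without ever materializing the dense form, and call .toarray() only at the end if a dense result is needed:

```python
import numpy as np
from scipy.sparse import diags

n = 5
a = np.linspace(0, n, n + 1)
m = diags([a * a], [0])      # sparse diagonal matrix: O(n) storage

v = np.ones(n + 1)
print(m.dot(v))              # sparse matvec; the dense matrix is never built

dense = m.toarray()          # convert back to dense only when required
print(dense.shape)
```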

In short: You probably cannot improve significantly using Cython, yet using sparse linear algebra could give you a huge performance boost. But this depends on how you want to use this function in your program.


2 Comments

Thank you for pointing it out. I think I need to learn something about vectorizing my code.
'vectorizing' in this context means learning which numpy operations operate (with compiled code) on whole vectors (e.g. a), and how they can be combined to produce 2d and larger arrays.
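As a toy sketch of what that comment means (not from the thread): whole-vector operations replace explicit loops, and broadcasting combines 1-d vectors into 2-d arrays:

```python
import numpy as np

a = np.arange(4)

# Element-wise operation on the whole vector (no Python loop):
sq = a * a                      # array([0, 1, 4, 9])

# Broadcasting combines 1-d vectors into a 2-d array (outer product):
outer = a[:, None] * a[None, :]

# Placing a vector on the diagonal of a 2-d array:
d = np.diag(sq)

print(outer.shape, d.shape)     # (4, 4) (4, 4)
```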
