17

I am doing some performance test on a variant of the prime numbers generator from http://docs.cython.org/src/tutorial/numpy.html. The below performance measures are with kmax=1000

Pure Python implementation, running in CPython: 0.15s

Pure Python implementation, running in Cython: 0.07s

def primes(kmax):
    p = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p.append(n)
            k = k + 1
        n = n + 1
    return p

Pure Python+Numpy implementation, running in CPython: 1.25s

import numpy

def primes(kmax):
    p = numpy.empty(kmax, dtype=int)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p

Cython implementation using int*: 0.003s

from libc.stdlib cimport malloc, free

def primes(int kmax):
    cdef int n, k, i
    cdef int *p = <int *>malloc(kmax * sizeof(int))
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    free(p)
    return result

The above performs great but looks horrible, as it holds two copies of the data... so I tried reimplementing it:

Cython + Numpy: 1.01s

import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p

Questions:

  1. why is the numpy array so incredibly slower than a python list, when running on CPython?
  2. what did I do wrong in the Cython+Numpy implementation? cython is obviously NOT treating the numpy array as an int[] as it should.
  3. how do I cast a numpy array to a int*? The below doesn't work

    cdef numpy.nparray a = numpy.zeros(100, dtype=int)
    cdef int * p = <int *>a.data
    
1
  • 1) Your first numpy does not really use numpy in any vectorised way, just stores the information in an array, but is all done by loops. Commented Mar 19, 2014 at 18:43

3 Answers 3

10
cdef DTYPE_t [:] p_view = p

Using this instead of p in the calculations. reduced the runtime from 580 ms down to 2.8 ms for me. About the exact same runtime as the implementation using *int. And that's about the max you can expect from this.

DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    cdef DTYPE_t [:] p_view = p
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p_view[i] != 0:
            i = i + 1
        if i == k:
            p_view[k] = n
            k = k + 1
        n = n + 1
    return p
Sign up to request clarification or add additional context in comments.

6 Comments

I found a more compact solution that doesn't use a view: cdef numpy.ndarray[int] p = numpy.empty(kmax, dtype=numpy.int32)
Also, adding @cython.wraparound(False) gives me a 25% time reduction
@crusaderky The benefit of using a view, is that you could use it in a code block using nogil. Since it is no longer a native python object, the GIL can be released while modifying it.
Actually cython lets me compile just fine with nogil a block of code that uses a numpy array - as long as I correctly cdef'ed it.
@crusaderky Are you able to read and modify the array inside the nogil block also?
|
5

why is the numpy array so incredibly slower than a python list, when running on CPython?

Because you didn't fully type it. Use

cdef np.ndarray[dtype=np.int, ndim=1] p = np.empty(kmax, dtype=DTYPE)

how do I cast a numpy array to a int*?

By using np.intc as the dtype, not np.int (which is a C long). That's

cdef np.ndarray[dtype=int, ndim=1] p = np.empty(kmax, dtype=np.intc)

(But really, use a memoryview, they're much cleaner and the Cython folks want to get rid of the NumPy array syntax in the long run.)

1 Comment

My question 1 is about CPython, not Cython. Why is numpy slower than a pure python list?
1

Best syntax I found so far:

import numpy
cimport numpy
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def primes(int kmax):
    cdef int n, k, i
    cdef numpy.ndarray[int] p = numpy.empty(kmax, dtype=numpy.int32)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p

Note where I used numpy.int32 instead of int. Anything on the left side of a cdef is a C type (thus int = int32 and float = float32), while anything on the RIGHT side of it (or outside of a cdef) is a python type (int = int64 and float = float64)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.