cpython vs cython vs numpy array performance

Question

I am doing some performance test on a variant of the prime numbers generator from http://docs.cython.org/src/tutorial/numpy.html. The below performance measures are with kmax=1000

Pure Python implementation, running in CPython: 0.15s

Pure Python implementation, running in Cython: 0.07s

def primes(kmax):
    p = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p.append(n)
            k = k + 1
        n = n + 1
    return p

Pure Python+Numpy implementation, running in CPython: 1.25s

import numpy

def primes(kmax):
    p = numpy.empty(kmax, dtype=int)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p

Cython implementation using int*: 0.003s

from libc.stdlib cimport malloc, free

def primes(int kmax):
    cdef int n, k, i
    cdef int *p = <int *>malloc(kmax * sizeof(int))
    result = []
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    free(p)
    return result

The above performs great but looks horrible, as it holds two copies of the data... so I tried reimplementing it:

Cython + Numpy: 1.01s

import numpy as np
cimport numpy as np
cimport cython

DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p

Questions:

why is the numpy array so incredibly slower than a python list, when running on CPython?
what did I do wrong in the Cython+Numpy implementation? cython is obviously NOT treating the numpy array as an int[] as it should.

how do I cast a numpy array to a int*? The below doesn't work

cdef numpy.nparray a = numpy.zeros(100, dtype=int)
cdef int * p = <int *>a.data

1) Your first numpy does not really use numpy in any vectorised way, just stores the information in an array, but is all done by loops. — askewchan
– askewchan, Commented Mar 19, 2014 at 18:43

M4rtini · Accepted Answer · 2014-03-19 18:42:52Z

10

cdef DTYPE_t [:] p_view = p

Using this instead of p in the calculations. reduced the runtime from 580 ms down to 2.8 ms for me. About the exact same runtime as the implementation using *int. And that's about the max you can expect from this.

DTYPE = np.int
ctypedef np.int_t DTYPE_t

@cython.boundscheck(False)
def primes(DTYPE_t kmax):
    cdef DTYPE_t n, k, i
    cdef np.ndarray p = np.empty(kmax, dtype=DTYPE)
    cdef DTYPE_t [:] p_view = p
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p_view[i] != 0:
            i = i + 1
        if i == k:
            p_view[k] = n
            k = k + 1
        n = n + 1
    return p

answered Mar 19, 2014 at 18:42

M4rtini

13.6k4 gold badges37 silver badges42 bronze badges

Sign up to request clarification or add additional context in comments.

6 Comments

crusaderky Over a year ago

I found a more compact solution that doesn't use a view: cdef numpy.ndarray[int] p = numpy.empty(kmax, dtype=numpy.int32)

crusaderky Over a year ago

Also, adding @cython.wraparound(False) gives me a 25% time reduction

M4rtini Over a year ago

@crusaderky The benefit of using a view, is that you could use it in a code block using nogil. Since it is no longer a native python object, the GIL can be released while modifying it.

crusaderky Over a year ago

Actually cython lets me compile just fine with nogil a block of code that uses a numpy array - as long as I correctly cdef'ed it.

M4rtini Over a year ago

@crusaderky Are you able to read and modify the array inside the nogil block also?

|

Fred Foo · Accepted Answer · 2014-03-19 18:47:19Z

5

why is the numpy array so incredibly slower than a python list, when running on CPython?

Because you didn't fully type it. Use

cdef np.ndarray[dtype=np.int, ndim=1] p = np.empty(kmax, dtype=DTYPE)

how do I cast a numpy array to a int*?

By using np.intc as the dtype, not np.int (which is a C long). That's

cdef np.ndarray[dtype=int, ndim=1] p = np.empty(kmax, dtype=np.intc)

(But really, use a memoryview, they're much cleaner and the Cython folks want to get rid of the NumPy array syntax in the long run.)

answered Mar 19, 2014 at 18:47

Fred Foo

365k80 gold badges765 silver badges852 bronze badges

1 Comment

crusaderky Over a year ago

My question 1 is about CPython, not Cython. Why is numpy slower than a pure python list?

crusaderky · Accepted Answer · 2014-03-20 09:54:33Z

1

Best syntax I found so far:

import numpy
cimport numpy
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def primes(int kmax):
    cdef int n, k, i
    cdef numpy.ndarray[int] p = numpy.empty(kmax, dtype=numpy.int32)
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
        n = n + 1
    return p

Note where I used numpy.int32 instead of int. Anything on the left side of a cdef is a C type (thus int = int32 and float = float32), while anything on the RIGHT side of it (or outside of a cdef) is a python type (int = int64 and float = float64)

answered Mar 20, 2014 at 9:54

crusaderky

2,6903 gold badges22 silver badges28 bronze badges

Collectives™ on Stack Overflow

cpython vs cython vs numpy array performance

3 Answers 3

6 Comments

1 Comment

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

6 Comments

1 Comment

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related