
I recently ran some performance-optimization experiments in Python. One part was a benchmark of a Monte-Carlo Pi calculation, using SWIG to compile a C library for import into Python. The other approach used Numba. Now I am wondering why the native C solution is slower than Numba, even though LLVM is the compiler in both cases. Am I doing something wrong?

Runtimes on my laptop:

native C module: 7.09 s
Python+Numba:    2.75 s

Native C code

#include "swigtest.h"
#include <time.h>
#include <stdlib.h>
#include <stdio.h>

float monte_carlo_pi(long nsamples)
{
    int acc = 0;
    long i;
    float x, y;
    float res;
    float iRMX = 1.0f / (float) RAND_MAX;

    srand(time(NULL));

    for (i = 0; i < nsamples; i++)
    {
        x = (float) rand() * iRMX;
        y = (float) rand() * iRMX;

        if ((x * x + y * y) < 1.0f) { acc += 1; }
    }

    res = 4.0f * (float) acc / (float) nsamples;

    printf("cres = %.5f\n", res);

    return res;
}

swigtest.i

%module swigtest

%{
#define SWIG_FILE_WITH_INIT
#include "swigtest.h"
%}

float monte_carlo_pi(long nsamples);

Compiler call

clang.exe swigtest.c swigtest_wrap.c -Ofast -o _swigtest.pyd -I C:\python37\include -shared -L c:\python37\libs -g0 -mtune=intel -msse4.2 -mmmx

testswig.py

from swigtest import monte_carlo_pi
import time
import os

start = time.time()
   
pi = monte_carlo_pi(250000000)

print("pi: %.5f" % pi)
print("tm:",time.time()-start)

Python version with Numba

from numba import jit
import random
import time

start = time.time()

@jit(nopython=True, cache=True, fastmath=True)
def monte_carlo_pi(nsamples: int) -> float:
    acc:int = 0
    for i in range(nsamples):
        x:float = random.random()
        y:float = random.random()
        if (x * x + y * y) < 1.0: acc += 1
        
    return 4.0 * acc / nsamples
    
pi = monte_carlo_pi(250000000)

print("pi:",pi)
print("tm:",time.time()-start)
  • The question is where the time is spent. I would not be surprised if it is in random number generation. Using different random number generators will lead to different timings. Commented Feb 16, 2021 at 8:41
  • I also thought that rand might be the problem. On the other hand, it is a function that has been available for decades, so it should be optimized as much as possible. Commented Feb 16, 2021 at 8:45
  • Did you run a profiler? Is rand optimized for speed or „randomness“? Commented Feb 16, 2021 at 8:52
  • No. I will check this, but nevertheless rand can be expected to consume the time. Does Numba work with another random generator? Because the rest of the C code is obviously more or less optimal ... Commented Feb 16, 2021 at 9:27
  • Apart from the random number generator, it would also make sense to compare with the same compiler settings (-march=native, -O3, -Ofast) and the same datatypes (double and int64). Have a look at monte_carlo_pi.inspect_types(); there you can see which datatypes are used in the Numba implementation. Commented Feb 16, 2021 at 9:33

1 Answer

Summary up to now:

The rand() function seems to consume most of the time. Using a deterministic approach like this

...
ns = (long) sqrt((double) nsamples) + 1;
dx = 1. / sqrt((double) nsamples);
dy = dx;
...
for (i = 0; i < ns; i++)
    for (k = 0; k < ns; k++)
    {
        x = i * dx;
        y = k * dy;

        if ((x * x + y * y) < 1.0) { accLoc += 1; }
    }
...

instead of rand() results in an execution time of only 0.04 s! Obviously Numba uses a different, much more efficient random function.


1 Comment

Meanwhile I have replaced rand() with the solution proposed here: lemire.me/blog/2019/03/19/… and got an execution time of 0.59 s!
