
I recently ran some performance-optimization experiments in Python. One part was a benchmark of a Monte-Carlo Pi calculation, using SWIG to compile a C library for import into Python. The other approach used Numba. Now I am wondering why the native C solution is slower than Numba, even though LLVM is the compiler in both cases. Am I doing something wrong?

Runtimes on my laptop:

native C module: 7.09 s
Python+Numba:    2.75 s

Native C code

#include "swigtest.h"
#include <time.h>
#include <stdlib.h>
#include <stdio.h>

float monte_carlo_pi(long nsamples)
{
    int acc = 0;
    long i;
    float x, y;
    float res;
    float iRMX = 1.0f / (float) RAND_MAX;

    srand(time(NULL));

    for (i = 0; i < nsamples; i++)
    {
        x = (float) rand() * iRMX;
        y = (float) rand() * iRMX;

        if ((x * x + y * y) < 1.0f) { acc += 1; }
    }

    res = 4.0f * (float) acc / (float) nsamples;

    printf("cres = %.5f\n", res);

    return res;
}

swigtest.i

%module swigtest

%{
#define SWIG_FILE_WITH_INIT
#include "swigtest.h"
%}

float monte_carlo_pi(long nsamples);

Compiler call

clang.exe swigtest.c swigtest_wrap.c -Ofast -o _swigtest.pyd -I C:\python37\include -shared -L c:\python37\libs -g0 -mtune=intel -msse4.2 -mmmx

testswig.py

from swigtest import monte_carlo_pi
import time
import os

start = time.time()
   
pi = monte_carlo_pi(250000000)

print("pi: %.5f" % pi)
print("tm:",time.time()-start)

Python version with Numba

from numba import jit
import random
import time

start = time.time()

@jit(nopython=True, cache=True, fastmath=True)
def monte_carlo_pi(nsamples: int) -> float:
    acc:int = 0
    for i in range(nsamples):
        x:float = random.random()
        y:float = random.random()
        if (x * x + y * y) < 1.0: acc += 1
        
    return 4.0 * acc / nsamples
    
pi = monte_carlo_pi(250000000)

print("pi:",pi)
print("tm:",time.time()-start)
  • The question is where the time is spent. I would not be surprised if it is in random number generation. Using different random number generators will lead to different timings. Commented Feb 16, 2021 at 8:41
  • I also thought that rand might be the problem. On the other hand, it is a function that has been available for decades, so it should be optimized as much as possible. Commented Feb 16, 2021 at 8:45
  • Did you run a profiler? Is rand optimized for speed or „randomness“? Commented Feb 16, 2021 at 8:52
  • No. I will check this, but nevertheless rand can be expected to consume the time. Does Numba work with another random generator? Because the rest of the C code is obviously more or less optimal ... Commented Feb 16, 2021 at 9:27
  • Apart from the random number generator, it would also make sense to compare with the same compiler settings (-march=native, -O3, -Ofast) and the same datatypes (double and int64). Have a look at monte_carlo_pi.inspect_types(); there you can see which datatypes are used in the Numba implementation. Commented Feb 16, 2021 at 9:33

1 Answer

Summary up to now:

The rand() function seems to consume most of the time. Using a deterministic approach like this

...
ns = (long) sqrt((double) nsamples) + 1;
dx = 1. / sqrt((double) nsamples);
dy = dx;
...
for (i = 0; i < ns; i++)
    for (k = 0; k < ns; k++)
    {
        x = i * dx;
        y = k * dy;

        if ((x * x + y * y) < 1.0) { accLoc += 1; }
    }
...

instead of rand() results in an execution time of only 0.04 s! Obviously Numba uses a different, much more efficient random function.


1 Comment

Meanwhile I have replaced rand() with the solution proposed here: lemire.me/blog/2019/03/19/… and got an execution time of 0.59 s!
