
I have spent the last two days trying to optimize a Python script I wrote. Using several profiling tools (cProfile, line_profiler, etc.) I narrowed the issue down to the function below.

df is a NumPy array with 3 columns and over 1,000,000 rows (dtype is float). Using line_profiler, I found that the function spends most of its time whenever it needs to access the NumPy array.

full_length += head + df[rnd_truck, 2]

and

full_weight += df[rnd_truck,1]

take most of the time, followed by

full_length = df[rnd_truck,2]

full_weight = df[rnd_truck,1]

lines.

As far as I can see, the bottleneck is the time it takes the function to grab a single number from the NumPy array.

When I run the function as MonteCarlo(df, 15., 1000.), calling it 1,000,000 times takes 37 seconds on an i7 3.40 GHz 64-bit Windows machine with 8 GB RAM. In my application, I need to run it 1,000,000,000 times to ensure convergence, which brings the execution time to more than an hour. I tried using operator.add for the summation lines, but it did not help at all. It looks like I have to figure out a faster way to access this NumPy array.

Any ideas would be welcome!

def MonteCarlo(df, head, span):
    # Pick initial truck
    rnd_truck = np.random.randint(0, len(df))
    full_length = df[rnd_truck, 2]
    full_weight = df[rnd_truck, 1]

    # Loop using other random trucks until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]

    # Return average weight per foot on the bridge
    return full_weight / span

Below is a portion of the df NumPy array I am using:

In [31]: df
Out[31]: 
array([[  12. ,  220.4,  108.4],
       [  11. ,  220.4,  106.2],
       [  11. ,  220.3,  113.6],
       ..., 
       [   4. ,   13.9,   36.8],
       [   3. ,   13.7,   33.9],
       [   3. ,   13.7,   10.7]])
  • What do the inputs represent, and what are their dimensions? Commented Aug 8, 2013 at 19:47
  • Since it's only 3 columns, I wonder if you'd be better off with 3 separate arrays. Commented Aug 8, 2013 at 19:48
  • If I'm reading this right, you don't have a single vectorized operation in this function. You could save a lot of time by randomly picking a large number of trucks at once and having NumPy vectorize the addition of lengths and weights for you. Commented Aug 8, 2013 at 19:55
  • Can you show a small check of df, or a way to generate this data. Commented Aug 8, 2013 at 20:00
  • @user2357112 you are right, I do not use any vectorized operation in this function. I use it somewhere else, out of this function, and it does its job well in terms of speed. Do you think using a list for such an operation would be faster? I can convert the array columns to lists then feed them to the function. Commented Aug 8, 2013 at 20:05
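Following up on the batching idea in the comments, below is a rough sketch (the name monte_carlo_batch and the batch size are my own, not from the thread) of what a vectorized variant could look like: draw a chunk of random trucks per iteration and let np.cumsum accumulate the lengths, so the interpreted loop runs once per chunk instead of once per array access.

```python
import numpy as np

def monte_carlo_batch(df, head, span, batch=64):
    # Pick initial truck, exactly as in the scalar version
    first = np.random.randint(0, len(df))
    full_length = df[first, 2]
    full_weight = df[first, 1]

    while True:
        # Draw `batch` trucks at once; cumsum gives the running bridge
        # length after each successive truck in the chunk.
        idx = np.random.randint(0, len(df), size=batch)
        lengths = full_length + np.cumsum(head + df[idx, 2])
        over = lengths > span
        if over.any():
            k = over.argmax()  # first truck that overfills the bridge
            full_weight += df[idx[:k], 1].sum()
            # Return average weight per foot on the bridge
            return full_weight / span
        # Whole chunk fits: add all its weights and keep going
        full_weight += df[idx, 1].sum()
        full_length = lengths[-1]
```

This keeps the original semantics (the truck that pushes the length past span contributes no weight) while replacing per-truck scalar indexing with a handful of array operations per chunk.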

4 Answers


As noted by other people, this isn't vectorized at all, so the slowness is really the slowness of the Python interpreter. Cython can help you a lot here with minimal changes:

>>> %timeit MonteCarlo(df, 5, 1000)
10000 loops, best of 3: 48 us per loop

>>> %timeit MonteCarlo_cy(df, 5, 1000)
100000 loops, best of 3: 3.67 us per loop

where MonteCarlo_cy is just (in the IPython notebook, after %load_ext cythonmagic):

%%cython
import numpy as np
cimport numpy as np

def MonteCarlo_cy(double[:, ::1] df, double head, double span):
    # Pick initial truck
    cdef long n = df.shape[0]
    cdef long rnd_truck = np.random.randint(0, n)
    cdef double full_weight = df[rnd_truck, 1]
    cdef double full_length = df[rnd_truck, 2]

    # Loop using other random truck until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, n)
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]

    # Return average weight per feet on the bridge
    return full_weight / span

1 Comment

To speed this up further (and allow simple parallelization with OpenMP), you might want to switch out the calls to np.random.randint with some C-level random number generator. Here are some suggestions.

Using Cython to compile the function gives a very substantial runtime improvement.

In a separate file called "funcs.pyx" I have the following code:

cimport cython
import numpy as np
cimport numpy as np


def MonteCarlo(np.ndarray[np.float_t, ndim=2] df, float head, float span):
    # Pick initial truck
    cdef int rnd_truck = np.random.randint(0, len(df))
    cdef float full_length = df[rnd_truck, 2]
    cdef float full_weight = df[rnd_truck, 1]
    # Loop using other random trucks until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]
    # Return average weight per foot on the bridge
    return full_weight / span

Everything is the same except for the type declarations in front of the variables.

Here's the file I used to test it out:

import numpy as np
import pyximport
pyximport.install(reload_support=True, setup_args={'include_dirs':[np.get_include()]})
import funcs

def MonteCarlo(df, head, span):
    # Pick initial truck
    rnd_truck = np.random.randint(0, len(df))
    full_length = df[rnd_truck, 2]
    full_weight = df[rnd_truck, 1]
    # Loop using other random trucks until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]
    # Return average weight per foot on the bridge
    return full_weight / span

df = np.random.rand(1000000,3)
reload(funcs)
%timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
%timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]

I only ran it 10000 times, but even so, there's a huge improvement.

16:42:30: In [31]: %timeit [funcs.MonteCarlo(df, 15, 1000) for i in range(10000)]
10 loops, best of 3: 131 ms per loop

16:42:37: In [32]: %timeit [MonteCarlo(df, 15, 1000) for i in range(10000)]
1 loops, best of 3: 1.75 s per loop

Comments


Something to point out is that Monte Carlo simulation is embarrassingly parallel: no matter which solution you pick, you should be doing something to parallelize it. Using @Dougal's answer:

from multiprocessing import Pool

def RunVMC(n):
    return MonteCarlo_cy(df,head,span)


pool=Pool(processes=4)

%timeit [MonteCarlo_cy(df,15,1000) for x in range(1000000)]
1 loops, best of 3: 3.89 s per loop

#Pool @ 4
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 0.973 s per loop

#Pool @ 8
%timeit out=pool.map(RunVMC,xrange(1000000))
1 loops, best of 3: 568 ms per loop

5 Comments

Yes, this should definitely be run in parallel. Note that this approach makes one copy per worker of the (fairly large) df array, though. Using Cython, killing the small bit of GIL-needing code for the RNG as I mentioned in a comment on my answer, and then using prange is more memory-efficient. (Also, it's worth noting that your RunVMC function seems to have no purpose over just calling MonteCarlo_cy....)
I updated the function to what it should have been, using df, head, and span as global variables. The best way I have seen to pass additional arguments with map is using itertools, which has always seemed like more of a hassle than it's worth.
@Ophion this is the only part I could not work out. On my home machine (running OS X) the multiprocessing code works without any problem and gives me a 100% increase in speed. On my work machine (running Win 7) I get an ImportError: No module named montecarlo_v5b in the kernel window (not in the IPython GUI), which repeats infinitely as soon as I enter the pool=Pool(processes=4) statement. Is this a Windows quirk? montecarlo_v5b is the name of my py file.
@marillion Im sorry I have no experience with windows, I know there are some quirks compared to linux- a google search should probably return the answer.
@Ophion as far as I understand, I have to use an if __name__ == '__main__': statement somewhere in the code, but I am not exactly sure where. Well, I have already spent too much time optimizing this function, and there's still lots to do, so I had better learn how to use the multiprocessing module on Windows in another project :) But as I said, it works on OS X (or probably any OS that supports forking) like a charm!
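Regarding the Windows question in the last comment, here is a minimal sketch of the module layout multiprocessing expects there. MonteCarlo_cy is replaced by a trivial stand-in (run_one, HEAD, and SPAN are illustrative names, not from the thread); the point is only where the __main__ guard goes.

```python
from multiprocessing import Pool

import numpy as np

# Module-level data and worker: child processes re-import this module
# on Windows (no fork), so everything up here runs in every worker.
df = np.random.rand(1000, 3)  # stand-in for the OP's truck array
HEAD, SPAN = 15.0, 1000.0

def run_one(_):
    # Stand-in for MonteCarlo_cy(df, HEAD, SPAN); any picklable
    # module-level function works here.
    i = np.random.randint(0, len(df))
    return df[i, 1] / SPAN

if __name__ == '__main__':
    # The Pool must be created only under the guard: on Windows each
    # worker re-imports this module, and an unguarded Pool() would
    # spawn workers recursively.
    pool = Pool(processes=4)
    out = pool.map(run_one, range(10000))
    pool.close()
    pool.join()
```

With this layout the script runs the same on OS X (fork) and Windows (spawn); only the guarded block is skipped in the re-imported child copies.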

You can try switching to another Python variant. Jython is a bit faster than Python, and in some cases PyPy is a lot faster. Give them both a try.

4 Comments

I don't think either of those support NumPy. (I'm not sure whether the OP is actually taking advantage of NumPy, though.)
PyPy supports a subset of Numpy, appropriately called NumPyPy.
Oh, they started work on that? I thought they were still just gathering funding to hire someone. It doesn't look ready enough that I'd want to use it, but having NumPy support in PyPy will be great once it's done.
I was planning to try PyPy, however, I do not have admin privileges to install it on the computer I am using. I am stuck with the stock python interpreter :)
