
Given two large arrays of 3D points (I'll call the first "source" and the second "destination"), I needed a function that returns, for each element of "source", the index of its closest element in "destination", with this limitation: I can only use numpy... So no scipy, pandas, numexpr, cython...

To do this I wrote a function based on the "brute force" answer to this question. I iterate over the elements of source, find the closest element in destination, and return its index. Due to performance concerns, and again because I can only use numpy, I tried multithreading to speed it up. Here are both the threaded and unthreaded functions and how they compare in speed on an 8-core machine.

import timeit
import numpy as np
from numpy.core.umath_tests import inner1d
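# NOTE: inner1d comes from a private numpy module that was removed in later
# releases; np.einsum('ij,ij->i', a, b) computes the same row-wise dot product.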
from multiprocessing.pool import ThreadPool

def threaded(sources, destinations):
    # Define worker function
    def worker(point):
        dlt = (destinations-point) # delta between destinations and given point
        d = inner1d(dlt,dlt) # get distances
        return np.argmin(d) # return closest index

    # Multithread!
    p = ThreadPool()
    return p.map(worker, sources)


def unthreaded(sources, destinations):
    results = []
    for point in sources:
        dlt = (destinations - point) # delta between destinations and given point
        d = inner1d(dlt, dlt) # get distances
        results.append(np.argmin(d)) # append closest index

    return results


# Setup the data
n_destinations = 10000 # 10k random destinations
n_sources = 10000      # 10k random sources
destinations = np.random.rand(n_destinations, 3) * 100
sources = np.random.rand(n_sources, 3) * 100

# Compare!
print('threaded:   %s' % timeit.Timer(lambda: threaded(sources, destinations)).repeat(1, 1)[0])
print('unthreaded: %s' % timeit.Timer(lambda: unthreaded(sources, destinations)).repeat(1, 1)[0])

Results:

threaded:   0.894030461056
unthreaded: 1.97295164054

Multithreading seems beneficial, but I was hoping for more than a 2× speedup, given that the real-life datasets I deal with are much larger.

All recommendations to improve performance (within the limitations described above) will be greatly appreciated!

  • Before trying to add horsepower by means of multithreading, I would start with a better algorithm... Currently your brute-force approach is O(N*M), while, after preparing a k-d tree of the target points, you could make it O(N*log(M)). Commented Aug 5, 2015 at 23:37
  • Why are you committed to using only numpy? Commented Aug 6, 2015 at 0:32
  • @ali_m I have to use a custom build of Python called mayapy, Autodesk's implementation of Python inside their 3D software (Maya). It's compiled against VC2010 64-bit, so pre-built binaries such as Christoph Gohlke's and others aren't compatible. I was able to build numpy from scratch but failed to build scipy, numexpr, etc. Commented Aug 6, 2015 at 4:45
  • @MatteoItalia I did look for pure-Python KD tree implementations; notably I found this one and this one. Both were vastly slower (like 10x) than my unthreaded numpy test. So I'm afraid that without access to scipy (as stated above), KD trees are a dead end. Commented Aug 6, 2015 at 5:54
  • Just for reference, on my puny dual-core laptop cKDTree takes about 17 ms to both construct the tree and query it using your example data, compared with 1.53 s for threaded and 1.39 s for unthreaded. I am absolutely convinced that k-d trees are worth pursuing; a minimal sketch follows below. I'm very surprised that you saw such slow query times for pykdtree, which seems to be implemented in C/Cython. Commented Aug 6, 2015 at 20:14
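
For reference, a minimal sketch of the cKDTree approach mentioned in the comments (it assumes scipy is available, which the question's constraints rule out, so it serves only as a baseline for comparison):

import numpy as np
from scipy.spatial import cKDTree

destinations = np.random.rand(10000, 3) * 100
sources = np.random.rand(10000, 3) * 100

# Build a k-d tree over the destinations once (O(M log M)),
# then query the nearest destination for every source (O(N log M) total).
tree = cKDTree(destinations)
distances, indices = tree.query(sources)  # indices into destinations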

1 Answer


OK, I've been reading the Maya documentation on Python, and I came to these conclusions/guesses:

  • They're probably running CPython inside (there are several references to CPython's documentation and not to any other implementation).
  • They're not fond of threads (lots of non-thread-safe methods).

Given the above, I'd say it's better to avoid threads. Because of the GIL, CPU-bound threading is a common problem in CPython, and there are several ways around it:

  • Build a C/C++ extension and do the threading in C/C++, where the GIL doesn't apply. Personally, I'd give SIP a try first, and move on if it doesn't work out.
  • Use multiprocessing. Even if your custom Python distribution doesn't include it, you can get a working version since it's all pure Python code. multiprocessing is not affected by the GIL since it spawns separate processes; a minimal sketch follows after this list.
  • If neither of the above works out for you, try another parallelization tool (after some serious praying).
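
A minimal sketch of the multiprocessing route (the brute-force search mirrors the question's code, with np.einsum standing in for inner1d; splitting the sources into chunks is an assumption to keep per-task overhead low):

import numpy as np
from multiprocessing import Pool

def closest_indices(chunk, destinations):
    # Brute-force nearest neighbour for one chunk of source points.
    results = []
    for point in chunk:
        dlt = destinations - point
        d = np.einsum('ij,ij->i', dlt, dlt)  # squared distances
        results.append(np.argmin(d))
    return results

if __name__ == '__main__':
    destinations = np.random.rand(10000, 3) * 100
    sources = np.random.rand(10000, 3) * 100
    # A few large chunks keep the pickling overhead small relative
    # to the work each process does.
    chunks = np.array_split(sources, 8)
    with Pool() as pool:
        parts = pool.starmap(closest_indices,
                             [(chunk, destinations) for chunk in chunks])
    indices = [i for part in parts for i in part]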

On a side note, if you're using outside modules, be mindful of matching Maya's Python version and compiler. That may be the reason you couldn't build scipy. Of course, scipy has a huge codebase, and Windows is not the most forgiving platform to build it on.


1 Comment

Yeah, it's very annoying. Because it seems there's no way to improve performance while staying purely in numpy, I may end up pickling the data out and having an external "standard" Python instance (with scipy and all its arsenal) crunch it and spit out the result; a rough sketch of that hand-off follows below. Not what I want, but at this point I need to move on.
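
For what it's worth, a rough sketch of that hand-off (the file names, paths, and external interpreter location are all placeholders):

import subprocess
import numpy as np

# Inside mayapy: dump the arrays, run a full Python install that has scipy,
# then read the result back.
np.save('sources.npy', sources)
np.save('destinations.npy', destinations)
subprocess.check_call(['C:/Python27/python.exe', 'crunch.py'])
indices = np.load('indices.npy')

# crunch.py (hypothetical, runs under the external Python with scipy):
#   import numpy as np
#   from scipy.spatial import cKDTree
#   src = np.load('sources.npy')
#   dst = np.load('destinations.npy')
#   np.save('indices.npy', cKDTree(dst).query(src)[1])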
