
I'm doing a kernel density estimation of a dataset (a collection of points).

Fitting the estimator is fine; the problem is that getting the density value for each point is very slow:

from sklearn.neighbors import KernelDensity
# this speed is ok
kde = KernelDensity(bandwidth=2.0,atol=0.0005,rtol=0.01).fit(sample) 
# this is very slow
kde_result = kde.score_samples(sample) 

The sample consists of 300,000 (x, y) points.

I'm wondering whether it's possible to run this in parallel so it would be quicker.

For example, maybe I can divide the sample into smaller sets and run score_samples on each set at the same time? Specifically:

  1. I'm not familiar with parallel computing at all, so I'm wondering: is it applicable in my case?
  2. If it can really speed up the process, what should I do? I'm just running the script in an IPython notebook and have no prior experience with this; is there a good, simple example for my case?

I'm reading http://ipython.org/ipython-doc/dev/parallel/parallel_intro.html now.

UPDATE:

import cProfile
cProfile.run('kde.score_samples(sample)')

        64 function calls in 8.653 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    8.653    8.653 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 _methods.py:31(_sum)
        2    0.000    0.000    0.000    0.000 base.py:870(isspmatrix)
        1    0.000    0.000    8.653    8.653 kde.py:133(score_samples)
        4    0.000    0.000    0.000    0.000 numeric.py:464(asanyarray)
        2    0.000    0.000    0.000    0.000 shape_base.py:60(atleast_2d)
        2    0.000    0.000    0.000    0.000 validation.py:105(_num_samples)
        2    0.000    0.000    0.000    0.000 validation.py:126(_shape_repr)
        6    0.000    0.000    0.000    0.000 validation.py:153(<genexpr>)
        2    0.000    0.000    0.000    0.000 validation.py:268(check_array)
        2    0.000    0.000    0.000    0.000 validation.py:43(_assert_all_finite)
        6    0.000    0.000    0.000    0.000 {hasattr}
        4    0.000    0.000    0.000    0.000 {isinstance}
       12    0.000    0.000    0.000    0.000 {len}
        2    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
        1    8.652    8.652    8.652    8.652 {method 'kernel_density' of 'sklearn.neighbors.kd_tree.BinaryTree' objects}
        2    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        2    0.000    0.000    0.000    0.000 {method 'sum' of 'numpy.ndarray' objects}
        6    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
  • Have you tried using a different kernel? With so many points, the choice of kernel should have only a marginal effect, but 'linear' and 'tophat' may be faster to compute. Commented Sep 17, 2015 at 8:08
  • @Rob, just tried 'linear'; it's still very slow on kde_result = kde.score_samples(sample). Commented Sep 17, 2015 at 8:12
  • You stated that you want parallelism to make it quicker. This is not necessarily true. First, profile your code to find the hotspots. Once you're sure what should be optimized, see what you can do to optimize it. We both have a gut feeling about what is taking long, but profiling will give you much better insight (something explicit you can look up, and maybe someone else has already tried an optimization). Commented Sep 17, 2015 at 11:37
  • @FelipeLema this is really what I should be looking for, thanks a lot! Commented Sep 18, 2015 at 0:36
  • Unfortunately the hard work is being done in Cython, in kernel_density. I'm not aware of a parallel implementation of it, so it looks like you're going to have to start from scratch. I found this post that might help you get started, and edited the question so a dev from scikit-learn can give you better insight. Commented Sep 21, 2015 at 12:18

1 Answer


Here is a simple example of parallelization using the built-in multiprocessing module:

import numpy as np
import multiprocessing
from sklearn.neighbors import KernelDensity

def parallel_score_samples(kde, samples, process_count=int(0.875 * multiprocessing.cpu_count())):
    # Split the samples into chunks, score each chunk in a separate worker process,
    # and stitch the per-chunk results back together in the original order.
    with multiprocessing.Pool(process_count) as p:
        return np.concatenate(p.map(kde.score_samples, np.array_split(samples, process_count)))

kde = KernelDensity(bandwidth=2.0, atol=0.0005, rtol=0.01).fit(sample)
kde_result = parallel_score_samples(kde, sample)

As you can see from the code above, multiprocessing.Pool maps kde.score_samples over a pool of worker processes, each scoring a subset of your samples; np.concatenate then reassembles the per-chunk results in order.
The speedup will be significant if your processor has enough cores.
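
If you'd rather not manage a Pool yourself, the same chunk-and-score idea can be written with joblib, which scikit-learn itself uses internally. This is only a minimal sketch of that alternative; the helper name joblib_score_samples and the chunk count of 8 are arbitrary choices for illustration, and kde and sample are assumed to come from the snippet above:

import numpy as np
from joblib import Parallel, delayed

def joblib_score_samples(kde, samples, n_jobs=-1, n_chunks=8):
    # Hypothetical helper: score each chunk in a separate worker,
    # then reassemble the results in the original row order.
    chunks = np.array_split(samples, n_chunks)
    log_densities = Parallel(n_jobs=n_jobs)(
        delayed(kde.score_samples)(chunk) for chunk in chunks)
    return np.concatenate(log_densities)

kde_result = joblib_score_samples(kde, sample)

Both variants rely on np.array_split plus np.concatenate preserving row order, so kde_result lines up with sample row for row.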
