
What is the best way to remove the minimal number of elements from a sorted NumPy array so that the minimal distance among the remaining elements is always greater than a certain threshold?

For example, if the threshold is 1, the sequence [0.1, 0.5, 1.1, 2.5, 3.] becomes [0.1, 1.1, 2.5]: 0.5 is removed because it is too close to 0.1, but 1.1 is kept because it is far enough from 0.1.

My current code:

import numpy as np

MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 1.1, 2.5, 3.])

# Collapse any element that is too close to its predecessor onto that
# predecessor; np.unique below then drops the duplicates
for i in range(len(a)-1):
    if a[i+1] - a[i] < MIN_DISTANCE:
        a[i+1] = a[i]

a = np.unique(a)
a
array([0.1, 1.1, 2.5])

Is there a more efficient way to do so?

Note that my question is similar to Remove values from numpy array closer to each other but not exactly the same.

Comments:

  • Did you try numba (numba.pydata.org)? Commented Aug 2, 2019 at 14:05
  • What is the application of this? Additionally, this is not distance, you are talking about difference, and it also appears you need the elements sorted. Commented Aug 2, 2019 at 19:50
  • Thanks @Divakar, Numba does improve execution time. Commented Aug 4, 2019 at 9:35
  • My question still remains: is there a more Pythonic way to write this? Commented Aug 4, 2019 at 9:39
  • I meant 2D/3D convolution supported by SciPy. Commented Aug 4, 2019 at 9:41

1 Answer


You could use numpy.ufunc.accumulate to iterate through adjacent pairs of the array instead of using an explicit for loop.


Code (with an extended array to cross-check some additional cases; it works with your original array as well):

import numpy as np


MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 0.6, 0.7, 1.1, 2.5, 3., 4., 6., 6.1])
print("original: \n" + str(a))


def my_py_function(arr1, arr2):
    # Carry the previous kept value forward when the gap is too small
    if arr2 - arr1 < MIN_DISTANCE:
        arr2 = arr1
    return arr2


my_np_function = np.frompyfunc(my_py_function, 2, 1)

# A ufunc built with frompyfunc needs dtype=object for accumulate;
# the np.object alias is removed in modern NumPy, so use the builtin
# object, and assign the converted result rather than writing in place
a = my_np_function.accumulate(a, dtype=object).astype(float)


print("complete: \n" + str(a))
a = np.unique(a)
print("unique: \n" + str(a))

Result:

original:
[0.1 0.5 0.6 0.7 1.1 2.5 3.  4.  6.  6.1]
complete:
[0.1 0.1 0.1 0.1 1.1 2.5 2.5 4.  6.  6. ]
unique:
[0.1 1.1 2.5 4.  6. ]

Concerning execution time, timeit shows a crossover at an array length of about 20:

  • your code is faster for short arrays such as your length-5 example,
  • whereas for array lengths well above 20 the accumulate option speeds up considerably (roughly 35% less time at array length 300).
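A rough way to reproduce such a comparison with timeit (the function names and the random test array below are illustrative, not from the original post; absolute timings will vary by machine):

```python
import timeit

import numpy as np

MIN_DISTANCE = 1

def loop_version(a):
    # the question's original loop, run on a copy so the input is untouched
    a = a.copy()
    for i in range(len(a) - 1):
        if a[i + 1] - a[i] < MIN_DISTANCE:
            a[i + 1] = a[i]
    return np.unique(a)

def accumulate_version(a):
    # the answer's approach: carry the last kept value forward with accumulate
    def merge(prev, cur):
        return prev if cur - prev < MIN_DISTANCE else cur
    f = np.frompyfunc(merge, 2, 1)
    return np.unique(f.accumulate(a, dtype=object).astype(float))

a = np.sort(np.random.default_rng(0).uniform(0, 100, 300))
print("loop:      ", timeit.timeit(lambda: loop_version(a), number=100))
print("accumulate:", timeit.timeit(lambda: accumulate_version(a), number=100))
```

Both functions should return identical results; only their run times differ with array length.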

2 Comments

Very interesting solution, but is it possible to optimize this with Numba? Otherwise, for very large vectors or a large number of function calls, my Numba-optimized code is still more performant.
@Giampietro Seu I haven't worked with Numba yet, so I can't tell how it would handle that code.
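For reference, the original loop is plain scalar arithmetic, which is exactly what Numba's @njit compiles well. A minimal sketch (thin_sorted is a hypothetical helper name, and the import is guarded so the code also runs without Numba installed):

```python
import numpy as np

try:
    from numba import njit  # JIT-compile the loop when Numba is available
except ImportError:
    def njit(func):  # fallback: run as plain Python so the sketch still works
        return func

@njit
def thin_sorted(arr, min_distance):
    # Greedy pass over a sorted array: keep an element only if it is at
    # least min_distance away from the last element that was kept.
    keep = np.empty(arr.shape[0], dtype=np.bool_)
    keep[0] = True
    last = arr[0]
    for i in range(1, arr.shape[0]):
        if arr[i] - last >= min_distance:
            keep[i] = True
            last = arr[i]
        else:
            keep[i] = False
    return arr[keep]

a = np.array([0.1, 0.5, 1.1, 2.5, 3.])
print(thin_sorted(a, 1.0))  # [0.1 1.1 2.5]
```

This avoids the mutate-then-np.unique step entirely, since the mask already encodes which elements survive.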
