
What is the best way to remove the minimal number of elements from a sorted NumPy array so that the minimal distance among the remaining elements is always greater than a certain threshold?

For example, if the threshold is 1, the sequence [0.1, 0.5, 1.1, 2.5, 3.] becomes [0.1, 1.1, 2.5]: 0.5 is removed because it is too close to 0.1, but 1.1 is kept because it is far enough from 0.1.

My current code:

import numpy as np

MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 1.1, 2.5, 3.])

# Collapse any element that is too close to its predecessor onto that
# predecessor; np.unique below then drops the duplicates
for i in range(len(a)-1):
    if a[i+1] - a[i] < MIN_DISTANCE:
        a[i+1] = a[i]

a = np.unique(a)
a
array([0.1, 1.1, 2.5])

Is there a more efficient way to do so?

Note that my question is similar to Remove values from numpy array closer to each other but not exactly the same.

Comments:

  • Did you try numba (numba.pydata.org)? Commented Aug 2, 2019 at 14:05
  • What is the application of this? Additionally, this is not distance, you are talking about difference, and it also appears you need the elements sorted. Commented Aug 2, 2019 at 19:50
  • Thanks @Divakar, Numba does improve execution time. Commented Aug 4, 2019 at 9:35
  • My question still remains: is there a more Pythonic way to write this? Commented Aug 4, 2019 at 9:39
  • I meant 2D/3D convolution supported by SciPy. Commented Aug 4, 2019 at 9:41

1 Answer


You could use numpy.ufunc.accumulate to iterate through adjacent pairs of the array instead of using an explicit for loop.


Code (with an extended array to cross-check some additional cases; it works with your original array as well):

import numpy as np


MIN_DISTANCE = 1
a = np.array([0.1, 0.5, 0.6, 0.7, 1.1, 2.5, 3., 4., 6., 6.1])
print("original: \n" + str(a))


def my_py_function(arr1, arr2):
    # Carry the previous kept value forward when the gap is too small
    if arr2 - arr1 < MIN_DISTANCE:
        arr2 = arr1
    return arr2


my_np_function = np.frompyfunc(my_py_function, 2, 1)

# A ufunc built with frompyfunc needs dtype=object for accumulate;
# the np.object alias is removed in modern NumPy, so use the builtin
# object, and assign the converted result rather than writing in place
a = my_np_function.accumulate(a, dtype=object).astype(float)


print("complete: \n" + str(a))
a = np.unique(a)
print("unique: \n" + str(a))

Result:

original:
[0.1 0.5 0.6 0.7 1.1 2.5 3.  4.  6.  6.1]
complete:
[0.1 0.1 0.1 0.1 1.1 2.5 2.5 4.  6.  6. ]
unique:
[0.1 1.1 2.5 4.  6. ]

Concerning execution time, timeit shows a crossover at an array length of about 20:

  • your code is faster for short arrays such as your length-5 example,
  • whereas for array lengths well above 20 the accumulate option speeds up considerably (roughly 35% less time at array length 300).
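A rough way to reproduce such a comparison with timeit (the function names and the random test array below are illustrative, not from the original post; absolute timings will vary by machine):

```python
import timeit

import numpy as np

MIN_DISTANCE = 1

def loop_version(a):
    # the question's original loop, run on a copy so the input is untouched
    a = a.copy()
    for i in range(len(a) - 1):
        if a[i + 1] - a[i] < MIN_DISTANCE:
            a[i + 1] = a[i]
    return np.unique(a)

def accumulate_version(a):
    # the answer's approach: carry the last kept value forward with accumulate
    def merge(prev, cur):
        return prev if cur - prev < MIN_DISTANCE else cur
    f = np.frompyfunc(merge, 2, 1)
    return np.unique(f.accumulate(a, dtype=object).astype(float))

a = np.sort(np.random.default_rng(0).uniform(0, 100, 300))
print("loop:      ", timeit.timeit(lambda: loop_version(a), number=100))
print("accumulate:", timeit.timeit(lambda: accumulate_version(a), number=100))
```

Both functions should return identical results; only their run times differ with array length.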

2 Comments

Very interesting solution, but is it possible to optimize this with Numba? Otherwise, for very large vectors or a large number of function calls, my Numba-optimized code is still more performant.
@Giampietro Seu I haven't worked with Numba yet, so I can't tell how it would handle that code.
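For reference, the original loop is plain scalar arithmetic, which is exactly what Numba's @njit compiles well. A minimal sketch (thin_sorted is a hypothetical helper name, and the import is guarded so the code also runs without Numba installed):

```python
import numpy as np

try:
    from numba import njit  # JIT-compile the loop when Numba is available
except ImportError:
    def njit(func):  # fallback: run as plain Python so the sketch still works
        return func

@njit
def thin_sorted(arr, min_distance):
    # Greedy pass over a sorted array: keep an element only if it is at
    # least min_distance away from the last element that was kept.
    keep = np.empty(arr.shape[0], dtype=np.bool_)
    keep[0] = True
    last = arr[0]
    for i in range(1, arr.shape[0]):
        if arr[i] - last >= min_distance:
            keep[i] = True
            last = arr[i]
        else:
            keep[i] = False
    return arr[keep]

a = np.array([0.1, 0.5, 1.1, 2.5, 3.])
print(thin_sorted(a, 1.0))  # [0.1 1.1 2.5]
```

This avoids the mutate-then-np.unique step entirely, since the mask already encodes which elements survive.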
