I have a list of numpy arrays and want to remove duplicate rows from each array while keeping the existing order of my sorted data. This is my list with duplicates:

import numpy as np

dup_arr = [np.array([[0., 10., 10.],
                     [0., 2., 30.],
                     [0., 3., 5.],
                     [0., 3., 5.],
                     [0., 3., 40.]]),
           np.array([[0., -1., -4.],
                     [0., -2., -3.],
                     [0., -3., -5.],
                     [0., -3., -6.],
                     [0., -3., -6.]])]

I tried to do it using the following code:

clean_arr=[]
for i in dup_arr:
    new_array = [tuple(row) for row in i]
    uniques = np.unique(new_array, axis=0)
    clean_arr.append(uniques)

The problem with this method is that it changes the order of my data, and I do not want to sort the rows again because that is a difficult task for my real data. I want the following result:

clean_arr = [np.array([[0., 10., 10.],
                       [0., 2., 30.],
                       [0., 3., 5.],
                       [0., 3., 40.]]),
             np.array([[0., -1., -4.],
                       [0., -2., -3.],
                       [0., -3., -5.],
                       [0., -3., -6.]])]

But the code reorders the rows. I also tried the following for loops, but they were not successful either: the inner loop stops before the last row of each array, so I cannot iterate to the end of my data.

clean_arr = []
for arrays in dup_arr:
    for rows in range(len(arrays) - 1):
        if np.all(arrays[rows] == arrays[rows + 1]):
            continue
        else:
            dat = arrays[rows]
            clean_arr.append(dat)

Thanks in advance for any help.

2 Answers

3

You can simply use np.unique with axis=0 to remove duplicate rows, but it returns the rows in sorted order. If you want to keep the order of the original sequence, try this:

[i[np.sort(np.unique(i, axis=0, return_index=True)[1])] for i in dup_arr]
[array([[ 0., 10., 10.],
        [ 0.,  2., 30.],
        [ 0.,  3.,  5.],
        [ 0.,  3., 40.]]),
 array([[ 0., -1., -4.],
        [ 0., -2., -3.],
        [ 0., -3., -5.],
        [ 0., -3., -6.]])]
  1. np.unique(i, axis=0, return_index=True)[1] returns the index of the first occurrence of each unique row.
  2. np.sort() puts these indexes back in ascending order, i.e. the order in which the rows appear in the original array.
  3. The list comprehension [... for i in dup_arr] applies the two steps above to each array in dup_arr.
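
To make the intermediate values visible, here is a minimal sketch of the first two steps on the first array from the question. The variable names (a, first_idx, keep, result) are only illustrative and are not part of the one-liner above.

```python
import numpy as np

a = np.array([[0., 10., 10.],
              [0., 2., 30.],
              [0., 3., 5.],
              [0., 3., 5.],
              [0., 3., 40.]])

# Step 1: np.unique returns the unique rows in sorted order, together with
# the index of the first occurrence of each unique row in `a`.
unique_rows, first_idx = np.unique(a, axis=0, return_index=True)
print(first_idx)   # [1 2 4 0]

# Step 2: sorting those indexes restores the original row order.
keep = np.sort(first_idx)
print(keep)        # [0 1 2 4]

# Indexing with the sorted indexes drops only the repeated row (index 3).
result = a[keep]
print(result)
```

Row 3 is the only index missing from keep, so the remaining rows stay in their original relative positions.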

NOTE: You will NOT be able to fully vectorize this operation (say, by calling np.stack on the results), since a different number of duplicates may be removed from each matrix. The resulting arrays would then have unequal lengths along axis 0, so they cannot be combined into a single regular numpy array.
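
A small sketch of why stacking fails, using made-up toy arrays rather than the data from the question. The helper name unique_rows_in_order is hypothetical and only wraps the two steps described above.

```python
import numpy as np

def unique_rows_in_order(a):
    # Hypothetical helper: first-occurrence indexes, sorted back to original order.
    first_idx = np.unique(a, axis=0, return_index=True)[1]
    return a[np.sort(first_idx)]

# Toy data (assumption for illustration): one array has a duplicate, the other has none.
toy = [np.array([[1., 2.], [1., 2.], [3., 4.]]),
       np.array([[5., 6.], [7., 8.], [9., 0.]])]

cleaned = [unique_rows_in_order(a) for a in toy]
shapes = [c.shape for c in cleaned]
print(shapes)   # [(2, 2), (3, 2)]

# np.stack requires all inputs to have the same shape, so it raises here.
try:
    np.stack(cleaned)
    stacked_ok = True
except ValueError as exc:
    stacked_ok = False
    print("np.stack failed:", exc)
```

Keeping the results as a plain Python list of arrays, as in the list comprehension, avoids the problem.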


The same steps written as a function:

def f(a):
    indexes = np.unique(a, axis=0, return_index=True)[1]
    return a[np.sort(indexes)]

[f(i) for i in dup_arr]

5 Comments

Dear @Akshay Sehgal, thanks for your time. I do appreciate your solution, but the problem is that it changes the order of my initial array, and I do not want to change it. I just want to remove duplicate rows, and the other rows should stay in their relative positions. I mean they should not be shifted up or down.
Updated my answer. Do check and let me know if it solves it.
Changed this to a one-liner after removing the second list comprehension. You only need one here.
Dear @Akshay Sehgal, thanks again for being so helpful. Maybe your post would be more informative if you also kept the function version. I think you removed it in your last edit.
Sure, adding that for more detail.
1

In my humble opinion, every time you want to remove duplicates from an array or list in Python, you should consider using a set.

Also, try to avoid using multiple nested loops since errors easily occur and they're hard to find. I suggest you give the following code a try:

removed_duplicates=[]
    
for subarr in dup_arr:
    removed_duplicates.append(np.array([list(item) for item in set(tuple(row) for row in subarr)]))

Basically, what's happening is that you convert each row of the array to a tuple, put the tuples into a set (which removes all duplicates), and then convert each remaining tuple back to a list. Since your original data is a list of np.arrays, you convert the list of rows back to an np.array before you append it to the new list.

Would this work?
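
Since a set does not keep insertion order, one possible order-preserving variant of the same idea is to use dict keys instead, which do keep insertion order in Python 3.7+. This is only a sketch: the function name dedupe_rows_keep_order is made up for illustration, and it assumes the rows contain no NaN values (NaN never compares equal to itself, so such rows would not be treated as duplicates).

```python
import numpy as np

def dedupe_rows_keep_order(arr):
    # dict.fromkeys keeps the first occurrence of each key and preserves
    # insertion order, so rows stay in their original relative positions.
    unique_rows = list(dict.fromkeys(tuple(row) for row in arr))
    return np.array(unique_rows)

dup_arr = [np.array([[0., 10., 10.],
                     [0., 2., 30.],
                     [0., 3., 5.],
                     [0., 3., 5.],
                     [0., 3., 40.]]),
           np.array([[0., -1., -4.],
                     [0., -2., -3.],
                     [0., -3., -5.],
                     [0., -3., -6.],
                     [0., -3., -6.]])]

removed_duplicates = [dedupe_rows_keep_order(subarr) for subarr in dup_arr]
for cleaned in removed_duplicates:
    print(cleaned)
```

This still loops over rows in Python, so for large arrays it will likely be slower than an approach based on np.unique.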

1 Comment

Dear @Erik Hallin, Thanks for your time. I do appreciate your solution. The problem is that my sorted array is changed by your solution and I want to keep the sorting of my initial array. I just want to delete the duplicates and do not touch the sorting.
