I'm saving images in a multidimensional NumPy array a of shape (100,128,128,1). I'd like to check whether there are duplicate images in a. Other than an implementation with for loops, what would be the Pythonic way to do it?

  • You gotta explain more about the input data. Commented Jun 14, 2020 at 15:52
  • Maybe from itertools import permutations is a solution? Commented Jun 14, 2020 at 15:52
  • @Divakar Images are in (128,128,1) size and there are 100 images in the array a. I'd like to see whether there are duplicate images in entire a. Commented Jun 14, 2020 at 15:56
  • And if there are duplicate images in array a, what's the expected output? Commented Jun 14, 2020 at 15:57
  • Think you should look into np.unique( ..axis, return_index=True). Check out return_inverse argument too. Commented Jun 14, 2020 at 16:00

1 Answer

As stated in the comments, we need np.unique, but some extra steps are needed to gather the duplicate indices. Here's a complete implementation that gathers the indices of duplicate images into one array each, while non-duplicate images end up alone in an array each -

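import numpy as np

def gather_duplicate_indices(a):
    # tags[i]: unique-image ID of a[i]; count[k]: number of images with ID k
    _, tags, count = np.unique(a, axis=0, return_inverse=True, return_counts=True)
    # stable sort groups equal IDs together, keeping indices ascending in each group
    sidx = tags.ravel().argsort(kind="stable")
    return np.split(sidx, count.cumsum())[:-1]  # the last split is always empty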
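
To see why the sorting and splitting steps are needed, here is a minimal walk-through of the same idea on a small made-up 2-D array (the array and the intermediate names `uniq`, `boundaries` and `groups` are illustrative assumptions, not part of the original post). The `kind="stable"` argument and the `ravel()` call are also my additions, meant to keep the result predictable across NumPy versions.

```python
import numpy as np

# Five tiny "images" (rows); rows 0 and 2 match, as do rows 1 and 3.
a = np.array([[1, 2], [0, 5], [1, 2], [0, 5], [3, 3]])

uniq, tags, count = np.unique(a, axis=0, return_inverse=True, return_counts=True)
tags = tags.ravel()  # make sure the inverse is 1-D

# uniq  -> sorted unique rows: [0, 5], [1, 2], [3, 3]
# tags  -> [1, 0, 1, 0, 2]  (row of uniq that each input row maps to)
# count -> [2, 2, 1]        (how many input rows share each unique row)

# Sorting the tags places indices of identical rows next to each other.
sidx = tags.argsort(kind="stable")   # -> [1, 3, 0, 2, 4]

# The cumulative counts mark where each group ends in sidx.
boundaries = count.cumsum()          # -> [2, 4, 5]

# Splitting at the last boundary yields a trailing empty array, so drop it.
groups = np.split(sidx, boundaries)[:-1]
print(groups)  # [array([1, 3]), array([0, 2]), array([4])]
```

Each group lists the positions in `a` that hold the same content, in the order of the sorted unique rows rather than in order of first appearance.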

Sample run -

In [43]: np.random.seed(0)
    ...: a = np.random.randint(0,5,(10,2,2,1))
    ...: a[5] = a[2]
    ...: a[7] = a[2]
    ...: a[6] = a[1]
    ...: a[9] = a[4]
# So the duplicate groups are (2, 5, 7), (1, 6), (4, 9), while the rest are singles.

In [44]: gather_duplicate_indices(a)
Out[44]: 
[array([4, 9]),
 array([8]),
 array([3]),
 array([1, 6]),
 array([2, 5, 7]),
 array([0])]
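
If only a yes/no answer is needed, or only the groups that actually contain duplicates, something along these lines might do. This is a sketch under assumptions: `has_duplicates` and `duplicate_groups_only` are hypothetical helper names, not from the original answer, and the test array is made up.

```python
import numpy as np

def has_duplicates(a):
    """Return True if any two entries along axis 0 are exactly equal."""
    # Fewer unique entries than entries means at least one repeat.
    return len(np.unique(a, axis=0)) < len(a)

def duplicate_groups_only(a):
    """Return index groups (one array per repeated image), skipping singles."""
    _, tags, count = np.unique(a, axis=0, return_inverse=True, return_counts=True)
    sidx = np.argsort(tags.ravel(), kind="stable")
    # Split before each group boundary; omitting the last boundary avoids
    # producing a trailing empty array.
    groups = np.split(sidx, np.cumsum(count)[:-1])
    return [g for g in groups if len(g) > 1]

# Made-up data: six distinct 4x4x1 "images", then image 4 overwritten by image 1.
a = np.arange(6 * 4 * 4).reshape(6, 4, 4, 1)
before = has_duplicates(a)
a[4] = a[1]
after = has_duplicates(a)
dups = duplicate_groups_only(a)
print(before, after, dups)  # False True [array([1, 4])]
```

Note that this compares images element by element, so only exact copies are detected; images that differ by even one pixel value count as distinct.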