I'm saving images in a multi-dimensional NumPy array a of shape (100,128,128,1). I'd like to check whether there are duplicate images in a. Other than an implementation with for loops, what would be the Pythonic way to do it?
1 Answer
As stated in the comments, we need np.unique, but some extra steps are needed to gather the duplicate indices. Here's a complete implementation that gathers the indices of duplicate images into one array per group, while non-duplicate images end up alone in an array each -
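import numpy as np

def gather_duplicate_indices(a):
    # tags[i] is the ID of the unique image matching a[i]; count holds each group's size
    _, tags, count = np.unique(a, axis=0, return_inverse=True, return_counts=True)
    sidx = tags.argsort()  # indices of identical images end up next to each other
    # cut at the group boundaries; the last piece is always empty, so drop it
    return np.split(sidx, count.cumsum())[:-1]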
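To see what each step produces, here is a minimal walkthrough on a tiny stand-in array (the 5x1 input is made up for illustration). It adds two things that are not in the function above: `ravel()`, as a guard in case the NumPy version in use returns a non-1-D inverse when `axis` is given, and `kind="stable"`, so that indices stay in ascending order within each group.

```python
import numpy as np

# Hypothetical toy input: rows 0 and 2 are equal, rows 1 and 3 are equal.
a = np.array([[1], [0], [1], [0], [2]])

_, tags, count = np.unique(a, axis=0, return_inverse=True, return_counts=True)
tags = tags.ravel()                 # no-op if the inverse is already 1-D
sidx = tags.argsort(kind="stable")  # stable sort keeps ties in original order
groups = np.split(sidx, count.cumsum())[:-1]

print(tags)    # [1 0 1 0 2]  -> unique-row ID for each input row
print(count)   # [2 2 1]      -> size of each group
print(sidx)    # [1 3 0 2 4]  -> row indices ordered by ID
print(groups)  # [array([1, 3]), array([0, 2]), array([4])]
```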
Sample run -
In [43]: np.random.seed(0)
    ...: a = np.random.randint(0,5,(10,2,2,1))
    ...: a[5] = a[2]
    ...: a[7] = a[2]
    ...: a[6] = a[1]
    ...: a[9] = a[4]
# so the duplicate groups are (2, 5, 7), (1, 6), (4, 9), while the rest are singles.
In [44]: gather_duplicate_indices(a)
Out[44]:
[array([4, 9]),
 array([8]),
 array([3]),
 array([1, 6]),
 array([2, 5, 7]),
 array([0])]
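If only the yes/no answer or only the groups that really are duplicates are needed, the result can be filtered down. The following is a minimal sketch built on the same np.unique idea; the helper names `has_duplicates` and `duplicate_groups` are hypothetical and not part of the answer above, and the `ravel()` call is a precaution for NumPy versions that may return a non-1-D inverse when `axis` is given.

```python
import numpy as np

def has_duplicates(a):
    """Return True if at least two images along axis 0 are identical."""
    return np.unique(a, axis=0).shape[0] < a.shape[0]

def duplicate_groups(a):
    """Return one index array per set of identical images (singles omitted)."""
    _, tags, count = np.unique(a, axis=0, return_inverse=True, return_counts=True)
    tags = tags.ravel()                 # no-op if the inverse is already 1-D
    sidx = tags.argsort(kind="stable")  # ascending indices within each group
    groups = np.split(sidx, count.cumsum())[:-1]
    return [g for g in groups if g.size > 1]

# Deterministic example: 10 distinct 2x2x1 "images", then plant duplicates.
a = np.arange(10 * 4).reshape(10, 2, 2, 1)
a[5] = a[2]
a[7] = a[2]
a[6] = a[1]
a[9] = a[4]

print(has_duplicates(a))    # True
print(duplicate_groups(a))  # [array([1, 6]), array([2, 5, 7]), array([4, 9])]
```

Both helpers compare images for exact equality, so images that differ by even one pixel value count as distinct.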
from itertools import permutations is a solution?
(128,128,1) size and there are 100 images in the array a. I'd like to see whether there are duplicate images in the entire a.
np.unique(.., axis, return_index=True). Check out the return_inverse argument too.