numpy.unique acts weird with numpy.array of objects

Question

This question is related to (but not the same as) "numpy.unique generates a list unique in what regard?"

The setup:

import numpy as np
from functools import total_ordering

@total_ordering
class UniqueObject(object):
    def __init__(self, a):
        self.a = a
    def __eq__(self, other):
        return self.a == other.a
    def __lt__(self, other):
        return self.a < other.a
    def __hash__(self):
        return hash(self.a)
    def __str__(self):
        return "UniqueObject({})".format(self.a)
    def __repr__(self):
        return self.__str__()

Expected behaviour of np.unique:

>>> np.unique([1, 1, 2, 2])
array([1, 2])
>>> np.unique(np.array([1, 1, 2, 2]))
array([1, 2])
>>> np.unique(map(UniqueObject, [1, 1, 2, 2]))
array([UniqueObject(1), UniqueObject(2)], dtype=object)

That works fine. But this does not behave as expected:

>>> np.unique(np.array(map(UniqueObject, [1, 1, 2, 2])))
array([UniqueObject(1), UniqueObject(1), UniqueObject(2), UniqueObject(2)], dtype=object)

Why is an np.array with dtype=object handled differently from a Python list of objects?

That is:

objs = map(UniqueObject, [1, 1, 2, 2])
np.unique(objs) != np.unique(np.array(objs)) #?

I'm running numpy 1.8.0.dev-74b08b3 and Python 2.7.3.

DSM · Accepted Answer · 2013-05-06 14:58:38Z

Following through the source of np.unique, it seems that the branch which is actually taken is

else:
    ar.sort()
    flag = np.concatenate(([True], ar[1:] != ar[:-1]))
    return ar[flag]

which simply sorts the elements and then keeps the ones that aren't equal to the previous one. But shouldn't that work? Oops, this one is on me. Your original code defined __ne__, and I accidentally removed it when removing the comparison methods that total_ordering now supplies. Without __ne__, both == and != return True for equal objects:

>>> UniqueObject(1) == UniqueObject(1)
True
>>> UniqueObject(1) != UniqueObject(1)
True
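
To make the consequence concrete, here is a minimal pure-Python sketch of the sort-and-compare branch quoted above. It is an analogue for illustration, not NumPy's actual code. It assumes that on Python 2 a class without __ne__ falls back to an identity-based !=, and the class NoNe spells that fallback out explicitly so the sketch behaves the same on Python 3. NoNe and sort_and_flag are hypothetical names, not part of the question or of NumPy.

```python
class NoNe(object):
    """Hypothetical stand-in for the question's class as seen by Python 2."""
    def __init__(self, a):
        self.a = a
    def __eq__(self, other):
        return self.a == other.a
    def __lt__(self, other):
        return self.a < other.a
    def __ne__(self, other):
        # Assumption: mimic the Python 2 fallback, where != ignores __eq__
        # and is True for any two distinct instances.
        return self is not other
    def __hash__(self):
        return hash(self.a)
    def __repr__(self):
        return "NoNe({})".format(self.a)


def sort_and_flag(items):
    """Pure-Python analogue of the quoted branch: sort, then keep each
    element that compares != to its predecessor."""
    ar = sorted(items)  # plays the role of ar.sort(); uses __lt__ only
    # plays the role of np.concatenate(([True], ar[1:] != ar[:-1]))
    flag = [True] + [cur != prev for cur, prev in zip(ar[1:], ar[:-1])]
    return [x for x, keep in zip(ar, flag) if keep]


objs = [NoNe(a) for a in [1, 1, 2, 2]]
result = sort_and_flag(objs)
print(result)  # all four objects survive, because != is True for every pair
```

Since every neighbouring pair compares as "not equal", the flag is True everywhere and nothing is dropped, which matches the four-element output in the question.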

Putting __ne__ back in:

>>> UniqueObject(1) != UniqueObject(1)
False
>>> np.array(map(UniqueObject, [1,1,2,2]))
array([UniqueObject(1), UniqueObject(1), UniqueObject(2), UniqueObject(2)], dtype=object)
>>> np.unique(np.array(map(UniqueObject, [1,1,2,2])))
array([UniqueObject(1), UniqueObject(2)], dtype=object)
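
The restored method itself is not shown above, so the following is one plausible way to write it, assuming != should simply be the negation of ==. The mask at the end is a pure-Python imitation of the `ar[1:] != ar[:-1]` comparison from the quoted branch, included only to show that duplicates are now flagged for removal.

```python
from functools import total_ordering


@total_ordering
class UniqueObject(object):
    def __init__(self, a):
        self.a = a
    def __eq__(self, other):
        return self.a == other.a
    def __ne__(self, other):
        # Assumed form of the restored method: != is the negation of ==.
        return not self == other
    def __lt__(self, other):
        return self.a < other.a
    def __hash__(self):
        return hash(self.a)
    def __repr__(self):
        return "UniqueObject({})".format(self.a)


# Sort first, as the quoted branch does, then compare each element
# with its predecessor.
objs = sorted(UniqueObject(a) for a in [1, 1, 2, 2])
flag = [True] + [cur != prev for cur, prev in zip(objs[1:], objs[:-1])]
print(flag)  # [True, False, True, False]
```

Only the first element of each run of equal objects is kept, which agrees with the two-element result above. On Python 3, != already falls back to the negation of __eq__, so the explicit __ne__ matters mainly on Python 2.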
