Custom filtering function for python objects

Question

I have a list of tuples like (id, ) and I want to remove duplicates ids. In the case where there are multiple pairs with the same id, I'd like to keep the one that has an object with a higher score. How could I implement this efficiently?


# For the sake of example - assume that a hashing function is implemented based on the score

class Object
   def __init__(self):
       score = 0
   def __repr__(self):
       return f'<Object {self.score}>'

pairs = [(1, <Object 1>), (1, <Object 1>), (3, <Object 7>), (9, <Object 3>), (9, <Object 4>)]

filtered_pairs = [(1, <Object 1>), (3, <Object 7>), (9, <Object 4>)]

I know that I can call set on the pairs, but that'll only take care of the cases where both the id and score are equivalent (like with Object 1). How can I filter it but in the case where there are matching ids, take the higher score?

I know that I could do groupby from itertools, and implement a sort using the score as the key and then just take the last item from every group, but I'm wondering if there's a more efficient way.

Is your pairs list always ordered by id, or at least are the duplicate ids always adjacent? — kaya3
– kaya3, Commented Jan 29, 2020 at 6:45

Guy · Accepted Answer · 2020-01-29 07:00:03Z

2

You can use itertools.groupby to group by the first values and use max on the result

from itertools import groupby


class Object:

    def __init__(self, score):
        self.score = score

    def __repr__(self):
        return f'<Object {self.score}>'


pairs = [(1, Object(1)), (1, Object(1)), (3, Object(7)), (9, Object(3)), (9, Object(4))]

filtered_pairs = [max(list(elem), key=lambda x: x[1].score) for grp, elem in groupby(pairs, lambda x: (x[0]))]
print(filtered_pairs)

Output:

[(1, <Object 1>), (3, <Object 7>), (9, <Object 4>)]

answered Jan 29, 2020 at 7:00

Guy

51.2k10 gold badges49 silver badges96 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Mark · Accepted Answer · 2020-01-29 06:54:51Z

Since you are considering a set, I'm assuming the original order isn't important. If that's the case, one options is to add a __lt__ method to your class so you can compare objects by score. Then sort the tuples in reverse order, group by the integer, and take the first item from each group. It's easier to see in code than explain:

from itertools import groupby

class myObject:
    def __init__(self, score):
        self.score = score
    def __repr__(self):
        return f'<Object {self.score}>'
    def __lt__(self, other):
        return self.score < other.score

pairs = [(1, myObject(1)), (1, myObject(1)), (3, myObject(7)), (9, myObject(3)), (9, myObject(4))]

[next(v) for k, v in groupby(sorted(pairs, reverse=True), key=lambda x: x[0])]

Result

[(9, <Object 4>), (3, <Object 7>), (1, <Object 1>)]

Grismar · Accepted Answer · 2020-01-29 06:58:45Z

0

Something like this:

from collections import namedtuple

Pair = namedtuple('Pair', ['id', 'score'])

pairs = [Pair(*t) for t in [(1, 1), (1, 1), (3, 7), (9, 3), (9, 4)]]

best_pairs = {}
for p in pairs:
    if p.id not in best_pairs or p.score > best_pairs[p.id]:
        best_pairs[p.id] = p.score

pairs = [Pair(*t) for t in best_pairs.items()]

print(pairs)

namedtuple is only in there as a more concise version of your Object and the conversion back to pairs as a list of Pairs is only in there in case you don't like your result being the dictionary best_pairs.

Result:

[Pair(id=1, score=1), Pair(id=3, score=7), Pair(id=9, score=4)]

answered Jan 29, 2020 at 6:58

Grismar

32.4k6 gold badges42 silver badges68 bronze badges

Comments

kaya3 · Accepted Answer · 2020-01-29 07:00:49Z

0

You can sort by score, convert to dict (so that max scores are dict values) and convert back to list of tuples:

class Object:
    def __init__(self, score):
        self.score = score
    def __repr__(self):
        return f'<Object {self.score}>'
    def __gt__(self, other):
        return self.score > other.score


pairs = [(1, Object(1)), (1, Object(1)), (3, Object(7)), (9, Object(4)), (9, Object(3))]
filtered_pairs = list(dict(sorted(pairs)).items())

edited Jan 29, 2020 at 7:00

kaya3

51.6k7 gold badges87 silver badges118 bronze badges

answered Jan 29, 2020 at 7:00

Paulius

4845 silver badges12 bronze badges

1 Comment

kaya3 Over a year ago

To properly format code, you can use triple backticks - ``` - I've fixed it for you here.

Collectives™ on Stack Overflow

Custom filtering function for python objects

4 Answers 4

Comments

Comments

Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

Comments

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related