1

I have a list of tuples like (id, ) and I want to remove duplicates ids. In the case where there are multiple pairs with the same id, I'd like to keep the one that has an object with a higher score. How could I implement this efficiently?


# For the sake of example - assume that a hashing function is implemented based on the score

class Object
   def __init__(self):
       score = 0
   def __repr__(self):
       return f'<Object {self.score}>'

pairs = [(1, <Object 1>), (1, <Object 1>), (3, <Object 7>), (9, <Object 3>), (9, <Object 4>)]

filtered_pairs = [(1, <Object 1>), (3, <Object 7>), (9, <Object 4>)]

I know that I can call set on the pairs, but that'll only take care of the cases where both the id and score are equivalent (like with Object 1). How can I filter it but in the case where there are matching ids, take the higher score?

I know that I could do groupby from itertools, and implement a sort using the score as the key and then just take the last item from every group, but I'm wondering if there's a more efficient way.

2
  • Is your pairs list always ordered by id, or at least are the duplicate ids always adjacent? Commented Jan 29, 2020 at 6:45
  • 1
    Do yourself a favour and don't name your class Object. Commented Jan 29, 2020 at 6:47

4 Answers 4

2

You can use itertools.groupby to group by the first values and use max on the result

from itertools import groupby


class Object:

    def __init__(self, score):
        self.score = score

    def __repr__(self):
        return f'<Object {self.score}>'


pairs = [(1, Object(1)), (1, Object(1)), (3, Object(7)), (9, Object(3)), (9, Object(4))]

filtered_pairs = [max(list(elem), key=lambda x: x[1].score) for grp, elem in groupby(pairs, lambda x: (x[0]))]
print(filtered_pairs)

Output:

[(1, <Object 1>), (3, <Object 7>), (9, <Object 4>)]
Sign up to request clarification or add additional context in comments.

Comments

1

Since you are considering a set, I'm assuming the original order isn't important. If that's the case, one options is to add a __lt__ method to your class so you can compare objects by score. Then sort the tuples in reverse order, group by the integer, and take the first item from each group. It's easier to see in code than explain:

from itertools import groupby

class myObject:
    def __init__(self, score):
        self.score = score
    def __repr__(self):
        return f'<Object {self.score}>'
    def __lt__(self, other):
        return self.score < other.score

pairs = [(1, myObject(1)), (1, myObject(1)), (3, myObject(7)), (9, myObject(3)), (9, myObject(4))]

[next(v) for k, v in groupby(sorted(pairs, reverse=True), key=lambda x: x[0])]

Result

[(9, <Object 4>), (3, <Object 7>), (1, <Object 1>)]

Comments

0

Something like this:

from collections import namedtuple

Pair = namedtuple('Pair', ['id', 'score'])

pairs = [Pair(*t) for t in [(1, 1), (1, 1), (3, 7), (9, 3), (9, 4)]]

best_pairs = {}
for p in pairs:
    if p.id not in best_pairs or p.score > best_pairs[p.id]:
        best_pairs[p.id] = p.score

pairs = [Pair(*t) for t in best_pairs.items()]

print(pairs)

namedtuple is only in there as a more concise version of your Object and the conversion back to pairs as a list of Pairs is only in there in case you don't like your result being the dictionary best_pairs.

Result:

[Pair(id=1, score=1), Pair(id=3, score=7), Pair(id=9, score=4)]

Comments

0

You can sort by score, convert to dict (so that max scores are dict values) and convert back to list of tuples:

class Object:
    def __init__(self, score):
        self.score = score
    def __repr__(self):
        return f'<Object {self.score}>'
    def __gt__(self, other):
        return self.score > other.score


pairs = [(1, Object(1)), (1, Object(1)), (3, Object(7)), (9, Object(4)), (9, Object(3))]
filtered_pairs = list(dict(sorted(pairs)).items())

1 Comment

To properly format code, you can use triple backticks - ``` - I've fixed it for you here.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.