Removing duplicates from list based on custom definition of duplicate

Question

I'm dealing with a nested list that looks something like this.

mylist = [
    ["First", "Second", "Third"], 
    ["First", "Second", "Third"], 
    ...
]

The goal is to remove duplicate elements of mylist based on the following definition: an element is equal to another element if element1[0] == element2[0] and element1[1] == element2[1]. Basically, only the first two elements count; the rest are ignored.

This doesn't seem terribly hard, but I'm probably overcomplicating it and having trouble with it. I think I am close to a solution, which I'll post if I finish it and nobody has answered.

My main problems:

I really wish I could turn the list into a set like in more conventional cases. Is there any way to give set a custom definition of equivalence? A lot of built-in methods don't work because of that, and rewriting them is a bit painful as the indexing always gets screwed up somewhere.

If you have the list [[1,2,4],[1,2,3]], do you care which of the two survives the cull?
– DSM, Commented Jun 26, 2015 at 3:52

metatoaster · Accepted Answer · 2015-06-26 04:00:39Z

3

You can make a class that stores the data and override __eq__:

class MyListThingy(object):
    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        # only the first two items take part in the comparison
        return self.data[0] == other.data[0] and self.data[1] == other.data[1]

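Of course, this alone won't do any good for sets, which use hashing. For that you also have to override __hash__ in the same class, so that objects that compare equal also hash equal:

    # method of MyListThingy
    def __hash__(self):
        return hash((self.data[0], self.data[1]))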
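For reference, here is a minimal sketch of how the two methods fit together and how the wrapper might be used with set(). The helper name dedupe_with_wrapper and the sample rows are made up for illustration and are not part of the answer. Note that a set does not preserve the order of the original list.

```python
class MyListThingy(object):
    """Wraps a list; equality and hashing only look at the first two items."""

    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        return self.data[0] == other.data[0] and self.data[1] == other.data[1]

    def __hash__(self):
        # Must agree with __eq__: objects that compare equal need equal hashes.
        return hash((self.data[0], self.data[1]))


def dedupe_with_wrapper(rows):
    # Hypothetical helper: wrap each row, let set() drop duplicates, unwrap.
    return [thing.data for thing in set(MyListThingy(row) for row in rows)]


mylist = [
    ["First", "Second", "Third"],
    ["First", "Second", "Fourth"],
    ["Other", "Second", "Third"],
]
unique = dedupe_with_wrapper(mylist)
print(len(unique))  # 2
```

If the original order matters, a dictionary keyed on the first two items is probably the simpler tool.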

edited Jun 26, 2015 at 4:00

metatoaster


answered Jun 26, 2015 at 3:47

aaazalea



3 Comments

Howcan Over a year ago

This sounds good. However, giving set() a list of MyListThingy objects raises an unhashable instance error (with the __hash function in the class).

aaazalea Over a year ago

Oops, I meant __hash__.

Howcan Over a year ago

Ah, works perfectly now. I thought you wanted to write hash as a private method (I think __ is used to denote that?). This is a very nice solution that I'll keep in mind, thank you.

Ashwinee K Jha · Accepted Answer · 2015-06-26 04:01:48Z

2

You can create a tuple of the first and second items from each inner list and use it as a key in a dictionary. Adding every inner list to the dictionary under that key removes the duplicates, because a repeated key overwrites the earlier entry.

d = dict()
l = [["First", "Second", "Third"], ["First", "Second", "Fourth"]]
for item in l:
    # key on the first two items; a repeated key replaces the earlier row
    d[(item[0], item[1])] = item

Output of list(d.values()):

[['First', 'Second', 'Fourth']]
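The loop above keeps the last row seen for each key, which is why 'Fourth' survives. If the first row should win instead, one possible variant uses dict.setdefault. This is a sketch only: the function name dedupe_keep_first and the sample rows are hypothetical, and the result order relies on dictionaries preserving insertion order (Python 3.7+).

```python
def dedupe_keep_first(rows):
    # Hypothetical helper: key each row on its first two items.
    seen = {}
    for row in rows:
        # setdefault stores the row only if the key is not present yet,
        # so the first row seen for each key is the one that is kept.
        seen.setdefault((row[0], row[1]), row)
    return list(seen.values())


rows = [
    ["First", "Second", "Third"],
    ["First", "Second", "Fourth"],
    ["A", "B", "C"],
]
result = dedupe_keep_first(rows)
print(result)  # [['First', 'Second', 'Third'], ['A', 'B', 'C']]
```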

answered Jun 26, 2015 at 4:01

Ashwinee K Jha

