
I am trying to get a unique list of objects. I have some code that pulls data from an API and puts that data into objects, which I then put in a list. However, some of the objects are duplicates, and I would like to know how to remove them.

Sample list data:

[
Policy: 'SQL', 
SecondaryPolicy: 'ORACLE', 
Level: 'Primary On Call Engineer',
LevelNo: 1, 
StartDate: None, 
EndDate: None, 
StartTime: None, 
EndTime: None, 
Name: 'Fred', 
Mobile: '123', 

Policy: 'Comms', 
SecondaryPolicy: '', 
Level: 'Primary On Call Engineer',
LevelNo: 1, 
StartDate: None, 
EndDate: None, 
StartTime: None, 
EndTime: None, 
Name: 'Bob', 
Mobile: '456', 

Policy: 'Infra', 
SecondaryPolicy: '', 
Level: 'Primary On Call Engineer',
LevelNo: 1, 
StartDate: None, 
EndDate: None, 
StartTime: None, 
EndTime: None, 
Name: 'Bill', 
Mobile: '789', 

Policy: 'Comms', 
SecondaryPolicy: '', 
Level: 'Primary On Call Engineer',
LevelNo: 1, 
StartDate: None, 
EndDate: None, 
StartTime: None, 
EndTime: None, 
Name: 'Bob', 
Mobile: '456', 
]

Code (I've removed some of the object data and put in sample data; for this test I'm just trying to get Fred's result returned once):

objPolicyData = getUserData()  # pulls the data from the API (not shown)

OnCallData = []
for UserItem in objPolicyData['users']:
    UserData = User()
    # get the user object from DB
    UserData.Name = 'Fred'
    for OnCall in UserItem['on_call']:
        UserPolicy = OnCall['escalation_policy']
        UserData.Policy = 'SQL'
        UserData.SecondaryPolicy = 'ORACLE'
        # note: the same UserData object is appended once per on-call entry
        OnCallData.append(UserData)

Attempts: I tried this:

clean_on_call_data = {User.Name for User in OnCallData}

But this only prints:

set(['Fred'])

Where are the other fields of the objects, and how would I iterate over them?

EDIT: This is my class. Is the __cmp__ correct? How do I remove the duplicates?

class User(object):
    __attrs = ['Policy','SecondaryPolicy','Name']

    def __init__(self, **kwargs):
        for attr in self.__attrs:
            setattr(self, attr, kwargs.get(attr, None))

    def __repr__(self):
        return ', '.join(
            ['%s: %r' % (attr, getattr(self, attr)) for attr in self.__attrs])  

    def __cmp__(self, other):
        # Python 2 only: compare users by Name (Python 3 ignores __cmp__)
        return cmp(self.Name, other.Name)
• What are you trying to achieve? Could you post some sample output of what you expect? (Commented May 26, 2016 at 16:18)

3 Answers


For Python 2.x

I think you'll want to implement __cmp__ for your class that stores the API data.

For Python 3.x

I think you'll want to implement __eq__ and __hash__ for your class that stores the API data.

Regardless of the Python version, once equality and hashing are both defined for your class, you can remove duplicates with set(your_list), because a set only holds unique objects (it uses __hash__ to find candidate matches and __eq__ to confirm them).
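A minimal Python 3 sketch of this approach, assuming that two records count as duplicates when all of their stored fields match. The class name OnCallRecord and its helper _key are illustrative stand-ins for the question's User class, not part of the original code.

```python
class OnCallRecord(object):
    """Illustrative stand-in for the question's User class (hypothetical name)."""

    def __init__(self, policy, secondary_policy, name):
        self.Policy = policy
        self.SecondaryPolicy = secondary_policy
        self.Name = name

    def _key(self):
        # The fields that together define "the same record" (an assumption)
        return (self.Policy, self.SecondaryPolicy, self.Name)

    def __eq__(self, other):
        if not isinstance(other, OnCallRecord):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__: equal objects must produce equal hashes
        return hash(self._key())

    def __repr__(self):
        return 'OnCallRecord(%r, %r, %r)' % self._key()


records = [
    OnCallRecord('SQL', 'ORACLE', 'Fred'),
    OnCallRecord('Comms', '', 'Bob'),
    OnCallRecord('Infra', '', 'Bill'),
    OnCallRecord('Comms', '', 'Bob'),  # duplicate of the second entry
]

unique_records = set(records)
print(len(unique_records))  # 3
```

A set does not keep the original order; if order matters, list(dict.fromkeys(records)) keeps the first occurrence of each record in order (dicts preserve insertion order from Python 3.7).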


5 Comments

I think you got that mixed up. Python 3 no longer supports __cmp__; instead you should implement at least __eq__ and __lt__. Python 2 is the one that had __cmp__.
@L3viathan sorry about that, was pulling from memory! Edited to reflect that.
I've edited the question to include the __cmp__ function. Have I implemented it correctly? Also, the list stores the API data in class instances; does that matter?
@GregHilston you would have to make sure that your objects are hashable though, right?
If you define your own __eq__, you lose the built-in __hash__ (in Python 3 it is set to None, so instances become unhashable and can't be put in a set). So you don't technically have to implement your own __hash__, but if you don't, you won't have one. Basically, if you write your own __eq__, write your own __hash__. @FabianBosler
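A small Python 3 sketch of the point in this comment thread; the class names OnlyEq and EqAndHash are hypothetical and exist only to contrast the two cases.

```python
class OnlyEq(object):
    """Defines __eq__ but not __hash__ (hypothetical example class)."""

    def __init__(self, name):
        self.Name = name

    def __eq__(self, other):
        return isinstance(other, OnlyEq) and self.Name == other.Name


class EqAndHash(OnlyEq):
    """Inherits __eq__ and adds a matching __hash__."""

    def __hash__(self):
        # Hash on the same field that __eq__ compares
        return hash(self.Name)


# Python 3 sets __hash__ to None on a class that defines __eq__ without __hash__
print(OnlyEq.__hash__ is None)  # True

try:
    set([OnlyEq('Fred'), OnlyEq('Fred')])
except TypeError as exc:
    print('cannot build the set:', exc)

# With both methods defined, equal objects collapse into one set entry
print(len(set([EqAndHash('Fred'), EqAndHash('Fred')])))  # 1
```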

How about using dictionaries and then a pandas.DataFrame?

Something like:

import pandas as pd

d1 = {
'Policy': 'SQL', 
'SecondaryPolicy': 'ORACLE', 
'Level': 'Primary On Call Engineer',
'LevelNo': 1, 
'StartDate': None, 
'EndDate': None, 
'StartTime': None, 
'EndTime': None, 
'Name': 'Fred', 
'Mobile': '123', 
}
d2 = {
'Policy': 'Comms', 
'SecondaryPolicy': '', 
'Level': 'Primary On Call Engineer',
'LevelNo': 1, 
'StartDate': None, 
'EndDate': None, 
'StartTime': None, 
'EndTime': None, 
'Name': 'Bob', 
'Mobile': '456', 
}
d3 = {
'Policy': 'Infra', 
'SecondaryPolicy': '', 
'Level': 'Primary On Call Engineer',
'LevelNo': 1, 
'StartDate': None, 
'EndDate': None, 
'StartTime': None, 
'EndTime': None, 
'Name': 'Bill', 
'Mobile': '789', 
}
d4 = {
'Policy': 'Comms', 
'SecondaryPolicy': '', 
'Level': 'Primary On Call Engineer',
'LevelNo': 1, 
'StartDate': None, 
'EndDate': None, 
'StartTime': None, 
'EndTime': None, 
'Name': 'Bob', 
'Mobile': '456', 
}


data = pd.DataFrame([d1,d2,d3,d4])

data[data.Name == 'Fred']

Which outputs a DataFrame containing only Fred's row (the one built from d1).
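If pandas isn't available, a list of plain dicts like d1 to d4 can also be deduplicated with the standard library. This is a swapped-in technique rather than part of this answer: a minimal sketch that assumes every dict value is hashable (true for the strings, ints and None used here) and keeps the first occurrence of each distinct dict. The helper name unique_dicts is hypothetical, and the rows are trimmed to a few fields.

```python
def unique_dicts(rows):
    """Return rows with exact duplicates removed, keeping first occurrences.

    Assumes every value in each dict is hashable.
    """
    seen = set()
    result = []
    for row in rows:
        # A sorted tuple of items is a hashable, key-order-independent fingerprint
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    {'Policy': 'SQL', 'SecondaryPolicy': 'ORACLE', 'Name': 'Fred', 'Mobile': '123'},
    {'Policy': 'Comms', 'SecondaryPolicy': '', 'Name': 'Bob', 'Mobile': '456'},
    {'Policy': 'Infra', 'SecondaryPolicy': '', 'Name': 'Bill', 'Mobile': '789'},
    {'Policy': 'Comms', 'SecondaryPolicy': '', 'Name': 'Bob', 'Mobile': '456'},
]

for row in unique_dicts(rows):
    print(row['Name'])  # prints Fred, then Bob, then Bill
```

With the DataFrame from this answer, pandas' own data.drop_duplicates() removes the identical rows in the same way.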



You could subclass the User class and implement the __eq__ and __hash__ methods, then add the instances to a set, like this:

class UserUnique(User):
    def __hash__(self):
        # hash on the same field that __eq__ compares
        return hash(self.Name)
    def __eq__(self, o):
        if not isinstance(o, User):
            return NotImplemented
        return self.Name == o.Name

Then you can do this:

OnCallData = set()
for UserItem in objPolicyData['users']:   
    UserData = UserUnique()     
    UserData.Name = 'Fred'
    for OnCall in UserItem['on_call']:    
        UserPolicy = OnCall['escalation_policy'] 
        UserData.Policy = 'SQL'
        UserData.SecondaryPolicy = 'ORACLE'
        OnCallData.add(UserData)
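Because UserUnique hashes and compares on Name only, any two instances with the same name collapse into one entry, even if their other fields differ. Here is a self-contained sketch, assuming Python 3 and using a cut-down User as a stand-in for the question's class, that also shows dict.fromkeys as an order-preserving alternative to a set.

```python
class User(object):
    # Cut-down stand-in for the question's class, for illustration only
    _attrs = ['Policy', 'SecondaryPolicy', 'Name']

    def __init__(self, **kwargs):
        for attr in self._attrs:
            setattr(self, attr, kwargs.get(attr))


class UserUnique(User):
    def __hash__(self):
        return hash(self.Name)

    def __eq__(self, o):
        if not isinstance(o, User):
            return NotImplemented
        return self.Name == o.Name


users = [
    UserUnique(Policy='SQL', SecondaryPolicy='ORACLE', Name='Fred'),
    UserUnique(Policy='Comms', SecondaryPolicy='', Name='Bob'),
    UserUnique(Policy='Infra', SecondaryPolicy='', Name='Bill'),
    UserUnique(Policy='Comms', SecondaryPolicy='', Name='Bob'),
]

# set() removes duplicates but does not keep the original order
print(sorted(u.Name for u in set(users)))  # ['Bill', 'Bob', 'Fred']

# dict.fromkeys keeps the first occurrence of each key, in order (Python 3.7+)
ordered = list(dict.fromkeys(users))
print([u.Name for u in ordered])  # ['Fred', 'Bob', 'Bill']
```

Iterating over either result gives back the full objects, so every field is still available, for example: for u in ordered: print(u.Name, u.Policy).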
