Python methods to find duplicates

Question

Is there a way to check whether a list contains duplicates? For example:

list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]

list1.method()  # should give False: no duplicates
list2.method()  # should give True: contains duplicates

@tyjkenn: Checking for the existence of duplicates is simpler than finding the actual duplicates (which is what the other question is about). – interjay, Jun 28, 2012 at 17:30

3Doubloons · Accepted Answer · 2012-06-28 17:27:33Z

14

Converting the list to a set removes any duplicates, because a set holds each element only once. You can then compare the lengths of the list and the set: if they differ, the list contained duplicates.

In code, it would look like this:

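list1 = [1, 2, 3, 4, 5]  # example data from the question
tmpSet = set(list1)  # the set keeps a single copy of each element
haveDuplicates = len(list1) != len(tmpSet)  # False: list1 has no duplicates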
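A variation on the same idea, sketched here under a hypothetical name (has_duplicates is not from this answer): add elements to a set one at a time and stop at the first repeat. Because it never calls len(), it also works on one-shot iterators such as generators, and it can return early on long inputs. Like set(list1), it assumes every element is hashable.

```python
def has_duplicates(items):
    """Return True as soon as any element is seen a second time.

    Assumes every element is hashable, as set() does.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True  # stop early; no need to scan the rest
        seen.add(item)
    return False


print(has_duplicates([1, 2, 3, 4, 5]))     # False
print(has_duplicates([1, 1, 2, 3, 4, 5]))  # True
```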

answered Jun 28, 2012 at 17:27

3Doubloons

2,106 · 14 silver badges · 26 bronze badges


2 Comments

jdi Over a year ago

+1 for including some actual text to explain what you are doing as opposed to just plopping down code.

3Doubloons Over a year ago

@jdi: I actually tried to just plop down some code originally, but it came in under the 30-character minimum.

FogleBird · Accepted Answer · 2012-06-28 17:48:40Z

2

Convert the list to a set to remove duplicates. Compare the lengths of the original list and the set to see if any duplicates existed.

>>> list1 = [1,2,3,4,5]
>>> list2 = [1,1,2,3,4,5]
>>> len(list1) == len(set(list1))
True # no duplicates
>>> len(list2) == len(set(list2))
False # duplicates
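Wrapped in a function (contains_duplicates is a hypothetical name), the same comparison also makes its main limitation visible: set() raises TypeError when an element is unhashable, such as a nested list. A small sketch:

```python
def contains_duplicates(items):
    # Same set-length comparison as above, wrapped in a function.
    return len(items) != len(set(items))


print(contains_duplicates([1, 2, 3, 4, 5]))     # False
print(contains_duplicates([1, 1, 2, 3, 4, 5]))  # True

try:
    contains_duplicates([[1, 1], [4, 3], [4, 3]])
except TypeError as exc:
    # set() cannot hold lists, so the check fails before comparing lengths
    print("unhashable element:", exc)
```

That limitation is what lqc's answer works around.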

edited Jun 28, 2012 at 17:48

answered Jun 28, 2012 at 17:27

FogleBird

77.3k · 25 gold badges · 133 silver badges · 136 bronze badges


Paul Seeb · Accepted Answer · 2012-06-29 19:56:58Z

2

Check whether the length of the original list is greater than the length of the set of its unique elements. If so, there must have been duplicates.

list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]

if len(list1) != len(set(list1)):
    print("list1 contains duplicates")  # handle the duplicates here

edited Jun 29, 2012 at 19:56

answered Jun 28, 2012 at 17:28

Paul Seeb

6,196 · 3 gold badges · 30 silver badges · 38 bronze badges


lqc · Accepted Answer · 2012-06-28 17:43:57Z

0

The set() approach only works for hashable elements (lists, for example, are not hashable), so for completeness, here is a version that uses plain pairwise comparison instead:

import itertools

def has_duplicates(iterable):
    """
    >>> has_duplicates([1,2,3])
    False
    >>> has_duplicates([1, 2, 1])
    True
    >>> has_duplicates([[1,1], [3,2], [4,3]])
    False
    >>> has_duplicates([[1,1], [3,2], [4,3], [4,3]])
    True
    """
    return any(x == y for x, y in itertools.combinations(iterable, 2))
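If the elements are unhashable but sortable, sorting first puts equal elements next to each other, so only adjacent pairs need comparing: O(n log n) instead of the O(n²) pairwise scan over itertools.combinations. A sketch under a hypothetical name (has_duplicates_sorted); it assumes the elements have a consistent total ordering that agrees with ==, as lists of numbers do, which is not true of every unhashable type.

```python
def has_duplicates_sorted(items):
    """Detect duplicates by sorting, then comparing neighbours.

    Assumes the elements are totally ordered by < (lists of numbers are);
    they do not need to be hashable.
    """
    ordered = sorted(items)  # equal elements end up adjacent
    return any(a == b for a, b in zip(ordered, ordered[1:]))


print(has_duplicates_sorted([[1, 1], [3, 2], [4, 3]]))          # False
print(has_duplicates_sorted([[1, 1], [3, 2], [4, 3], [4, 3]]))  # True
```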

answered Jun 28, 2012 at 17:43

lqc

7,408 · 1 gold badge · 28 silver badges · 27 bronze badges

4 Comments

Joel Cornett Over a year ago

Ouch. This one hurts for complexity: comparing every pair is O(n²). Better to write hash functions for your unhashable objects.

lqc Over a year ago

@JoelCornett Mind doing it for list?

Joel Cornett Over a year ago

listHash = lambda x: hash(tuple(x)). Note that since this hash is just a one-time thing, you don't have to worry about objects mutating on you.

lqc Over a year ago

Here's a simpler one: lambda x: 1. Creating such a function doesn't make list objects any more hashable, because list.__hash__ is still None. As for efficiency, you can easily tweak this to take constant extra memory. Hashing is just a CPU/memory tradeoff.
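Joel Cornett's tuple(x) idea can be applied without making list itself hashable: convert each element to a hashable stand-in (here a tuple) and run the set-length comparison on those. A minimal sketch; has_duplicate_lists is a hypothetical name, and it assumes the inner lists contain only hashable values (deeper nesting would need a recursive conversion).

```python
def has_duplicate_lists(rows):
    """Duplicate check for a list of lists, using tuple(row) as the key.

    A tuple is hashable as long as its own elements are hashable, so the
    usual set-length comparison can then be applied to the keys.
    """
    keys = [tuple(row) for row in rows]
    return len(keys) != len(set(keys))


print(has_duplicate_lists([[1, 1], [3, 2], [4, 3]]))          # False
print(has_duplicate_lists([[1, 1], [3, 2], [4, 3], [4, 3]]))  # True
```

This spends extra memory on the tuples and the set in exchange for linear average time, which is the CPU/memory tradeoff lqc mentions.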
