61

Is it possible to get which values are duplicates in a list using Python?

I have a list of items:

    mylist = [20, 30, 25, 20]

I know the best way of removing the duplicates is set(mylist), but is it possible to know which values are being duplicated? As you can see, in this list the duplicates are at the first and last positions, i.e. indices [0, 3].

Is it possible to get this result or something similar in Python? I'm trying to avoid making a ridiculously big if/elif conditional statement.


15 Answers

88

These approaches are O(n), so they take a little more code than using mylist.count(), but they are much more efficient as mylist gets longer.

If you just want to know the duplicated values, use collections.Counter:

from collections import Counter
mylist = [20, 30, 25, 20]
[k for k,v in Counter(mylist).items() if v>1]

If you need to know the indices, build a value-to-indices mapping with collections.defaultdict:

from collections import defaultdict
D = defaultdict(list)
for i,item in enumerate(mylist):
    D[item].append(i)
D = {k:v for k,v in D.items() if len(v)>1}
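A quick, self-contained check of both snippets against the question's list (results shown as comments; this just re-runs the code above):

    from collections import Counter, defaultdict

    mylist = [20, 30, 25, 20]

    # duplicate values only
    print([k for k, v in Counter(mylist).items() if v > 1])   # [20]

    # value -> all indices where it occurs, filtered to duplicates
    D = defaultdict(list)
    for i, item in enumerate(mylist):
        D[item].append(i)
    print({k: v for k, v in D.items() if len(v) > 1})         # {20: [0, 3]}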

2 Comments

You could do this with the more compact [i for key in (key for key, count in Counter(mylist).items() if count > 1) for i, x in enumerate(mylist) if x == key] - although it's a bit of a monster, you might want to separate out the generator expression.
You could make def indices(seq, values):, return (i for value in values for i, x in enumerate(seq) if x == value), then do indices(mylist, (key for key, count in Counter(mylist).items() if count > 1)). That's pretty neat (when not crammed into a comment).
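Putting the second comment's suggestion together looks roughly like this (the indices helper is the one described in the comment, not part of the answer itself):

    from collections import Counter

    def indices(seq, values):
        # for each requested value, yield every position in seq where it occurs
        return (i for value in values for i, x in enumerate(seq) if x == value)

    mylist = [20, 30, 25, 20]
    dup_values = (key for key, count in Counter(mylist).items() if count > 1)
    print(list(indices(mylist, dup_values)))  # [0, 3]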
20

Here's a list comprehension that does what you want. As @Codemonkey says, the list starts at index 0, so the indices of the duplicates are 0 and 3.

>>> [i for i, x in enumerate(mylist) if mylist.count(x) > 1]
[0, 3]

3 Comments

That's O(n^2)... You can do better.
@Levon, it does search the whole list
For those who don't understand what O(N^2) means: for a 10-element list you'll be executing 100 steps, for 1,000 elements 1 million steps, for 1 million elements a million million steps, and so on. Quadratic growth kills performance very rapidly.
11

You can use a list comprehension over set(my_list) so that each duplicated value appears only once in the result:

my_list = [3, 5, 2, 1, 4, 4, 1]
opt = [item for item in set(my_list) if my_list.count(item) > 1]
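For that input the duplicated values are 1 and 4; since the order of opt depends on set iteration order, sort it for a stable result:

    print(sorted(opt))  # [1, 4]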


7

The following list comprehension will yield the duplicate values:

[x for x in mylist if mylist.count(x) >= 2]
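Note that each duplicated value shows up once per occurrence. With the question's mylist = [20, 30, 25, 20]:

    >>> [x for x in mylist if mylist.count(x) >= 2]
    [20, 20]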

13 Comments

This gives the duplicate values, not their indices
"As you can see, in this list the duplicates are the first and last values. [0, 3]" seems to indicate the desired output.
@Swiss No, it isn't. A set comprehension only requires the curly braces, the brackets here are totally useless.
@Swiss I'm not a native speaker; over time I learned that in the US [ is a (square) bracket, ( is a parenthesis, and { is a (curly) brace .. :)
Note that this has a terrible performance profile. list.count() is an O(N) job (all elements in the list are compared to count) and you are doing this in a loop over N elements, giving you quadratic performance, O(N^2). So for a 10-element list 100 steps are executed, for a 1,000-element list 1 million, etc.
5

The simplest way, without building any intermediate list, is to use list.index():

>>> z = ['a', 'b', 'a', 'c', 'b', 'a']
>>> [z[i] for i in range(len(z)) if i == z.index(z[i])]
['a', 'b', 'c']

and you can also list the duplicate items themselves (which may themselves repeat, as in the example):

>>> [z[i] for i in range(len(z)) if not i == z.index(z[i])]
['a', 'b', 'a']

or their indices:

>>> [i for i in range(len(z)) if not i == z.index(z[i])]
[2, 4, 5]

or the duplicates as a list of 2-tuples of (index, index of first occurrence), which is closest to what the original question asks for:

>>> [(i, z.index(z[i])) for i in range(len(z)) if not i == z.index(z[i])]
[(2, 0), (4, 1), (5, 0)]

or this together with the item itself:

>>> [(i, z.index(z[i]), z[i]) for i in range(len(z)) if not i == z.index(z[i])]
[(2, 0, 'a'), (4, 1, 'b'), (5, 0, 'a')]

or any other combination of elements and indices....


3

I used the code below to find the duplicate values in a list:

1) Create a set from the list.

2) Iterate through the set, counting each element's occurrences in the original list.

glist=[1, 2, 3, "one", 5, 6, 1, "one"]
x=set(glist)
dup=[]
for c in x:
    if(glist.count(c)>1):
        dup.append(c)
print(dup)

OUTPUT

[1, 'one']

Now get all the indices for each duplicate element:

glist=[1, 2, 3, "one", 5, 6, 1, "one"]
x=set(glist)
dup=[]
for c in x:
    if(glist.count(c)>1):
        indices = [i for i, x in enumerate(glist) if x == c]
        dup.append((c,indices))
print(dup)

OUTPUT

[(1, [0, 6]), ('one', [3, 7])]

Hope this helps someone


2

This is the simplest way I can think of for finding duplicates in a list:

my_list = [3, 5, 2, 1, 4, 4, 1]

my_list.sort()
for i in range(0, len(my_list) - 1):
    if my_list[i] == my_list[i + 1]:
        print(str(my_list[i]) + ' is a duplicate')

1 Comment

If items appear more than twice you'll print those multiple times.
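A variant that avoids the repeated printing pointed out above is to group the sorted list and report each value once (a sketch using itertools.groupby, not part of the original answer):

    from itertools import groupby

    my_list = [3, 5, 2, 1, 4, 4, 1, 1]
    # every group of equal values longer than 1 is a duplicate; report it once
    dups = [key for key, group in groupby(sorted(my_list)) if len(list(group)) > 1]
    print(dups)  # [1, 4]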
1

The following code prints each duplicate item along with the index of its first occurrence:

for i in set(mylist):
    if mylist.count(i) > 1:
        print(i, mylist.index(i))
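If you want every index of each duplicate rather than just the first one, a small extension of the same loop (a sketch, not part of the original answer):

    mylist = [20, 30, 25, 20]
    for i in set(mylist):
        if mylist.count(i) > 1:
            # collect all positions of the duplicated value
            print(i, [j for j, x in enumerate(mylist) if x == i])  # 20 [0, 3]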


0

You should sort the list:

mylist.sort()

After this, iterate through it like this:

doubles = []
for i, elem in enumerate(mylist):
    if i != 0:
        if elem == old:
            doubles.append(elem)
            old = None
            continue
    old = elem

2 Comments

This doesn't get the indices of the items, which the asker appears to want. Also, creating an empty list and looping through items to append some is an anti-pattern in Python, use a list comprehension.
This too will print items that appear more than twice multiple times.
0

You can print the duplicate and unique values using the logic below:

def dup(x):
    duplicate = []
    unique = []
    for i in x:
        if i in unique:
            duplicate.append(i)
        else:
            unique.append(i)
    print("Duplicate values: ",duplicate)
    print("Unique Values: ",unique)

list1 = [1, 2, 1, 3, 2, 5]
dup(list1)
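For list1 = [1, 2, 1, 3, 2, 5] this prints:

    Duplicate values:  [1, 2]
    Unique Values:  [1, 2, 3, 5]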


0
mylist = [20, 30, 25, 20]

kl = {i: mylist.count(i) for i in mylist if mylist.count(i) > 1}

print(kl)
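With the question's mylist = [20, 30, 25, 20], this maps each duplicated value to its count and prints:

    {20: 2}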


0

It looks like you want the indices of the duplicates. Here is some short code that will find those in O(n) time, without using any packages:

dups = {}
[dups.setdefault(v, []).append(i) for i, v in enumerate(mylist)]
dups = {k: v for k, v in dups.items() if len(v) > 1}
# dups now has keys for all the duplicate values
# and a list of matching indices for each

# The second line produces an unused list. 
# It could be replaced with this:
for i, v in enumerate(mylist):
    dups.setdefault(v, []).append(i)
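For the question's mylist = [20, 30, 25, 20], dups ends up as:

    {20: [0, 3]}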


0

You could identify these items using the iteration_utilities library:

from iteration_utilities import duplicates
list(duplicates(mylist))

Output: [20]

Note that if 20 appeared 3 times in your original list, the output would instead be [20, 20].
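If you only want each duplicated value listed once, the same library also provides unique_everseen (assuming your installed version ships it, as current releases do):

    from iteration_utilities import duplicates, unique_everseen
    list(unique_everseen(duplicates(mylist)))

Output: [20]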


-2
m = len(mylist)
for index, value in enumerate(mylist):
    for i in range(1, m):
        if index != i:
            if mylist[i] == mylist[index]:
                print("Location %d and location %d have the same list-entry: %r" % (index, i, value))

This has some redundancy that could be improved, however.
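One way to remove that redundancy (a sketch, not the original code) is to start the inner loop just past the current index, so each matching pair is reported only once:

    mylist = [20, 30, 25, 20]
    m = len(mylist)
    for index, value in enumerate(mylist):
        for i in range(index + 1, m):
            if mylist[i] == mylist[index]:
                print("Location %d and location %d have the same list-entry: %r" % (index, i, value))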


-2
def checkduplicate(lists):
    a = []
    for i in lists:
        if i in a:
            pass
        else:
            a.append(i)
    return i

print(checkduplicate([1, 9, 78, 989, 2, 2, 3, 6, 8]))

1 Comment

This prints out the last value in the list. Even if you correct it to return a, that removes the duplicates, but the question was "is it possible to know what values are being duplicated"
