61

Is it possible to get which values are duplicates in a list using Python?

I have a list of items:

    mylist = [20, 30, 25, 20]

I know the best way of removing the duplicates is set(mylist), but is it possible to know which values are being duplicated? As you can see, in this list the duplicates are at the first and last positions, i.e. indices [0, 3].

Is it possible to get this result or something similar in Python? I'm trying to avoid making a ridiculously big if/elif conditional statement.


15 Answers

88

These approaches are O(n), so they take a little more code than using mylist.count(), but they are much more efficient as mylist gets longer.

If you just want to know the duplicated values, use collections.Counter:

from collections import Counter
mylist = [20, 30, 25, 20]
[k for k,v in Counter(mylist).items() if v>1]

If you need to know the indices, build a value-to-indices mapping with collections.defaultdict:

from collections import defaultdict
D = defaultdict(list)
for i,item in enumerate(mylist):
    D[item].append(i)
D = {k:v for k,v in D.items() if len(v)>1}
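A quick, self-contained check of both snippets against the question's list (results shown as comments; this just re-runs the code above):

    from collections import Counter, defaultdict

    mylist = [20, 30, 25, 20]

    # duplicate values only
    print([k for k, v in Counter(mylist).items() if v > 1])   # [20]

    # value -> all indices where it occurs, filtered to duplicates
    D = defaultdict(list)
    for i, item in enumerate(mylist):
        D[item].append(i)
    print({k: v for k, v in D.items() if len(v) > 1})         # {20: [0, 3]}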

2 Comments

You could do this with the more compact [i for key in (key for key, count in Counter(mylist).items() if count > 1) for i, x in enumerate(mylist) if x == key] - although it's a bit of a monster, you might want to separate out the generator expression.
You could make def indices(seq, values):, return (i for value in values for i, x in enumerate(seq) if x == value), then do indices(mylist, (key for key, count in Counter(mylist).items() if count > 1)). That's pretty neat (when not crammed into a comment).
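Putting the second comment's suggestion together looks roughly like this (the indices helper is the one described in the comment, not part of the answer itself):

    from collections import Counter

    def indices(seq, values):
        # for each requested value, yield every position in seq where it occurs
        return (i for value in values for i, x in enumerate(seq) if x == value)

    mylist = [20, 30, 25, 20]
    dup_values = (key for key, count in Counter(mylist).items() if count > 1)
    print(list(indices(mylist, dup_values)))  # [0, 3]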
20

Here's a list comprehension that does what you want. As @Codemonkey says, the list starts at index 0, so the indices of the duplicates are 0 and 3.

>>> [i for i, x in enumerate(mylist) if mylist.count(x) > 1]
[0, 3]

3 Comments

That's O(n^2)... You can do better.
@Levon, it does search the whole list
For those who don't understand what O(N^2) means: for a 10-element list you'll be executing 100 steps, for 1,000 elements 1 million steps, for 1 million elements a million million steps, and so on. Quadratic growth kills performance very rapidly.
11

You can use a list comprehension over set(my_list) so that each duplicated value appears only once in the result:

my_list = [3, 5, 2, 1, 4, 4, 1]
opt = [item for item in set(my_list) if my_list.count(item) > 1]
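For that input the duplicated values are 1 and 4; since the order of opt depends on set iteration order, sort it for a stable result:

    print(sorted(opt))  # [1, 4]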


7

The following list comprehension will yield the duplicate values:

[x for x in mylist if mylist.count(x) >= 2]
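Note that each duplicated value shows up once per occurrence. With the question's mylist = [20, 30, 25, 20]:

    >>> [x for x in mylist if mylist.count(x) >= 2]
    [20, 20]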

13 Comments

This gives the duplicate values, not their indices
"As you can see, in this list the duplicates are the first and last values. [0, 3]" seems to indicate the desired output.
@Swiss No, it isn't. A set comprehension only requires the curly braces, the brackets here are totally useless.
@Swiss I'm not a native speaker; over time I learned that in the US [ is a (square) bracket, ( is a parenthesis, and { is a (curly) brace .. :)
Note that this has a terrible performance profile. list.count() is an O(N) job (all elements in the list are compared to count) and you are doing this in a loop over N elements, giving you quadratic performance, O(N^2). So for a 10-element list 100 steps are executed, for a 1,000-element list 1 million, etc.
5

The simplest way, without building any intermediate list, is to use list.index():

>>> z = ['a', 'b', 'a', 'c', 'b', 'a']
>>> [z[i] for i in range(len(z)) if i == z.index(z[i])]
['a', 'b', 'c']

and you can also list the duplicate items themselves (which may themselves repeat, as in the example):

>>> [z[i] for i in range(len(z)) if not i == z.index(z[i])]
['a', 'b', 'a']

or their indices:

>>> [i for i in range(len(z)) if not i == z.index(z[i])]
[2, 4, 5]

or the duplicates as a list of 2-tuples of (index, index of first occurrence), which is closest to what the original question asks for:

>>> [(i, z.index(z[i])) for i in range(len(z)) if not i == z.index(z[i])]
[(2, 0), (4, 1), (5, 0)]

or this together with the item itself:

>>> [(i, z.index(z[i]), z[i]) for i in range(len(z)) if not i == z.index(z[i])]
[(2, 0, 'a'), (4, 1, 'b'), (5, 0, 'a')]

or any other combination of elements and indices....


3

I used the code below to find the duplicate values in a list:

1) Create a set from the list.

2) Iterate through the set, counting each element's occurrences in the original list.

glist=[1, 2, 3, "one", 5, 6, 1, "one"]
x=set(glist)
dup=[]
for c in x:
    if(glist.count(c)>1):
        dup.append(c)
print(dup)

OUTPUT

[1, 'one']

Now get all the indices for each duplicate element:

glist=[1, 2, 3, "one", 5, 6, 1, "one"]
x=set(glist)
dup=[]
for c in x:
    if(glist.count(c)>1):
        indices = [i for i, x in enumerate(glist) if x == c]
        dup.append((c,indices))
print(dup)

OUTPUT

[(1, [0, 6]), ('one', [3, 7])]

Hope this helps someone


2

This is the simplest way I can think of for finding duplicates in a list:

my_list = [3, 5, 2, 1, 4, 4, 1]

my_list.sort()
for i in range(0, len(my_list) - 1):
    if my_list[i] == my_list[i + 1]:
        print(str(my_list[i]) + ' is a duplicate')

1 Comment

If items appear more than twice you'll print those multiple times.
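A variant that avoids the repeated printing pointed out above is to group the sorted list and report each value once (a sketch using itertools.groupby, not part of the original answer):

    from itertools import groupby

    my_list = [3, 5, 2, 1, 4, 4, 1, 1]
    # every group of equal values longer than 1 is a duplicate; report it once
    dups = [key for key, group in groupby(sorted(my_list)) if len(list(group)) > 1]
    print(dups)  # [1, 4]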
1

The following code prints each duplicate item along with the index of its first occurrence:

for i in set(mylist):
    if mylist.count(i) > 1:
        print(i, mylist.index(i))
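If you want every index of each duplicate rather than just the first one, a small extension of the same loop (a sketch, not part of the original answer):

    mylist = [20, 30, 25, 20]
    for i in set(mylist):
        if mylist.count(i) > 1:
            # collect all positions of the duplicated value
            print(i, [j for j, x in enumerate(mylist) if x == i])  # 20 [0, 3]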


0

You should sort the list:

mylist.sort()

After this, iterate through it like this:

doubles = []
for i, elem in enumerate(mylist):
    if i != 0:
        if elem == old:
            doubles.append(elem)
            old = None
            continue
    old = elem

2 Comments

This doesn't get the indices of the items, which the asker appears to want. Also, creating an empty list and looping through items to append some is an anti-pattern in Python, use a list comprehension.
This too will print items that appear more than twice multiple times.
0

You can print the duplicate and unique values using the logic below:

def dup(x):
    duplicate = []
    unique = []
    for i in x:
        if i in unique:
            duplicate.append(i)
        else:
            unique.append(i)
    print("Duplicate values: ",duplicate)
    print("Unique Values: ",unique)

list1 = [1, 2, 1, 3, 2, 5]
dup(list1)
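For list1 = [1, 2, 1, 3, 2, 5] this prints:

    Duplicate values:  [1, 2]
    Unique Values:  [1, 2, 3, 5]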


0
mylist = [20, 30, 25, 20]

kl = {i: mylist.count(i) for i in mylist if mylist.count(i) > 1}

print(kl)
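With the question's mylist = [20, 30, 25, 20], this maps each duplicated value to its count and prints:

    {20: 2}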


0

It looks like you want the indices of the duplicates. Here is some short code that will find those in O(n) time, without using any packages:

dups = {}
[dups.setdefault(v, []).append(i) for i, v in enumerate(mylist)]
dups = {k: v for k, v in dups.items() if len(v) > 1}
# dups now has keys for all the duplicate values
# and a list of matching indices for each

# The second line produces an unused list. 
# It could be replaced with this:
for i, v in enumerate(mylist):
    dups.setdefault(v, []).append(i)
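For the question's mylist = [20, 30, 25, 20], dups ends up as:

    {20: [0, 3]}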


0

You could identify these items using the iteration_utilities library:

from iteration_utilities import duplicates
list(duplicates(mylist))

Output: [20]

Note that if 20 appeared 3 times in your original list, the output would instead be [20, 20].
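If you only want each duplicated value listed once, the same library also provides unique_everseen (assuming your installed version ships it, as current releases do):

    from iteration_utilities import duplicates, unique_everseen
    list(unique_everseen(duplicates(mylist)))

Output: [20]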


-2
m = len(mylist)
for index, value in enumerate(mylist):
    for i in range(1, m):
        if index != i:
            if mylist[i] == mylist[index]:
                print("Location %d and location %d have the same list-entry: %r" % (index, i, value))

This has some redundancy that could be improved, however.
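One way to remove that redundancy (a sketch, not the original code) is to start the inner loop just past the current index, so each matching pair is reported only once:

    mylist = [20, 30, 25, 20]
    m = len(mylist)
    for index, value in enumerate(mylist):
        for i in range(index + 1, m):
            if mylist[i] == mylist[index]:
                print("Location %d and location %d have the same list-entry: %r" % (index, i, value))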


-2
def checkduplicate(lists):
    a = []
    for i in lists:
        if i in a:
            pass
        else:
            a.append(i)
    return i

print(checkduplicate([1, 9, 78, 989, 2, 2, 3, 6, 8]))

1 Comment

This prints out the last value in the list. Even if you correct it to return a, that removes the duplicates, but the question was "is it possible to know what values are being duplicated"
