3

I have searched and haven't quite found the same question as mine. I want to remove duplicates from a list of lists in python; however, I don't care what order the values are in the list. They way I am doing it currently is too time-consuming.

What I want to do:

A = [[1,2,3] , [2,3,4] , [3,4,5] , [3,2,4]]

I want to search through A and remove all duplicates. The duplicates here would be [2,3,4] and [3,2,4]. This would reduce down to:

smaller_A = [[1,2,3] , [2,3,4], [3,4,5]]

How I am currently doing it:

todelete = []
for i in range(len(A)):
    for j in range(i+1,len(A)):
        if set(A[i]) == set(A[j]):
           todelete.append(j)

todelete = sorted(set(todelete))

smaller_A= [A[i] for i in range(len(A)) if i not in todelete]

Again, this works, but it is very time consuming when my lists are large. Any ideas? Thanks!

2 Answers 2

8

Frozensets are perfect for cases like this, when you need to nest sets:

>>> A = [[1,2,3], [2,3,4], [3,4,5], [3,2,4]]
>>> smaller_A = {frozenset(x) for x in A}
>>> smaller_A
{frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})}

To convert back to lists, you can do this:

>>> [list(x) for x in smaller_A]
[[1, 2, 3], [2, 3, 4], [3, 4, 5]]

This won't conserve the order of your lists or the elements inside them. (Although it didn't make a difference here.)

If you do need to preserve order, you can iterate over A while keeping track of frozensets seen so far:

>>> A = [[1,2,3], [2,3,4], [3,4,5], [3,2,4]]
>>> seen = set()
>>> smaller_A = []
>>> for x in A:
...     if frozenset(x) not in seen:
...         smaller_A.append(x)
...         seen.add(frozenset(x))
...
>>> smaller_A
[[1, 2, 3], [2, 3, 4], [3, 4, 5]]

(This isn't optimized; ideally, you'd only call frozenset(x) once and store the result in a variable.)

Sign up to request clarification or add additional context in comments.

2 Comments

This is great! I have not come across frozenset. I did need to preserve some sort of order, so I used the small loop you provided. However, I didn't quite understand what isn't optimized. Are you saying before the if statement I would have fs = frozenset(x), then use fs for the next two lines? Thanks!
For fun, this sped up my code (containing a list of 5040 lists) from 24.1s to .0067s! Wow! Props to flornquake!
1

you can do a trick with sorting this way

for i in range(len(A)): A[i].sort()

then remove duplicates

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.