efficiently knowing if intersection of two list is empty or not, in python [duplicate]

Question

Suppose I have two lists, L and M. Now I want to know if they share an element. Which would be the fastest way of asking (in python) if they share an element? I don't care which elements they share, or how many, just if they share or not.

For example, in this case

L = [1,2,3,4,5,6]
M = [8,9,10]

I should get False, and here:

L = [1,2,3,4,5,6]
M = [5,6,7]

I should get True.

I hope the question's clear. Thanks!

Manuel

See stackoverflow.com/questions/3170055/… for a more thorough analysis of this problem. not frozenset(L).isdisjoint(M) seems to be the optimal solution. — Edward Falk
– Edward Falk, Commented Apr 4, 2015 at 19:16

John La Rooy · Accepted Answer · 2010-02-04 06:22:14Z

66

Or more concisely

if set(L) & set(M):
    # there is an intersection
else:
    # no intersection

If you really need True or False

bool(set(L) & set(M))

After running some timings, this seems to be a good option to try too

m_set=set(M)
any(x in m_set  for x in L)

If the items in M or L are not hashable you have to use a less efficient approach like this

any(x in M for x in L)

Here are some timings for 100 item lists. Using sets is considerably faster when there is no intersection, and a bit slower when there is a considerable intersection.

M=range(100)
L=range(100,200)

timeit set(L) & set(M)
10000 loops, best of 3: 32.3 µs per loop

timeit any(x in M for x in L)
1000 loops, best of 3: 374 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
10000 loops, best of 3: 31 µs per loop

L=range(50,150)

timeit set(L) & set(M)
10000 loops, best of 3: 18 µs per loop

timeit any(x in M for x in L)
100000 loops, best of 3: 4.88 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
100000 loops, best of 3: 9.39 µs per loop


# Now for some random lists
import random
L=[random.randrange(200000) for x in xrange(1000)]
M=[random.randrange(200000) for x in xrange(1000)]

timeit set(L) & set(M)
1000 loops, best of 3: 420 µs per loop

timeit any(x in M for x in L)
10 loops, best of 3: 21.2 ms per loop

timeit m_set=set(M);any(x in m_set  for x in L)
1000 loops, best of 3: 168 µs per loop

timeit m_set=frozenset(M);any(x in m_set  for x in L)
1000 loops, best of 3: 371 µs per loop

edited Feb 4, 2010 at 6:22

answered Feb 4, 2010 at 5:28

John La Rooy

306k54 gold badges378 silver badges513 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Chris Lutz Over a year ago

@gnibbler - Is it provable that the any() version is less efficient? It seems like it would go through M only until it found an element in L, at which point any would return True and be done. This sounds more efficient than converting both L and M to sets beforehand. At least, on paper.

jathanism Over a year ago

This here, this is the answer.

John La Rooy Over a year ago

@Chris, worst case is when when there is no intersection - O(l*m). With sets i believe it is O(l+m)

Manuel Araoz Over a year ago

WOW! so bool(set(L) & set(M)) is faster than any(x in M for x in L)... Who would think? :) Thank you.

Chris Lutz Over a year ago

@Manuel - The best, it seems, is to convert one list to a set to allow for faster membership testing (in), then to filter based on this membership test (x in m_set for x in L). @gnibbler, can we get some tests that utilize two randomly constructed lists just for completeness? (and also +1 for a fine job)

|

Darius Bacon · Accepted Answer · 2010-02-04 07:25:29Z

6

To avoid the work of constructing the intersection, and produce an answer as soon as we know that they intersect:

m_set = frozenset(M)
return any(x in m_set for x in L)

Update: gnibbler tried this out and found it to run faster with set() in place of frozenset(). Whaddayaknow.

edited Feb 4, 2010 at 7:25

answered Feb 4, 2010 at 5:37

Darius Bacon

15.2k6 gold badges57 silver badges53 bronze badges

Comments

gahooa · Accepted Answer · 2010-02-04 05:26:56Z

3

First of all, if you do not need them ordered, then switch to the set type.

If you still need the list type, then do it this way: 0 == False

len(set.intersection(set(L), set(M)))

answered Feb 4, 2010 at 5:26

gahooa

138k14 gold badges101 silver badges104 bronze badges

2 Comments

Manuel Araoz Over a year ago

This doesn't seem very efficient. I mean, the whole intersection is been calculated, isn't it!? Or is it lazily evaluated? Thanks!

John La Rooy Over a year ago

@Manuel, when i tested it, the intersection took less time to calculate than the time converting the lists to sets, so less than 1/3 of the total time

tzot · Accepted Answer · 2023-06-01 12:21:45Z

Note: this answer seems to be too-complicated for what at first glance needs to be only a set operation, but sets can contain only hashable items; the original question does not specify what items will be in the list. So this code first tries with sets and then falls back to more generic code.

That's the most generic and efficient in a balanced way I could come up with (comments should make the code easy to understand):

import itertools, operator

def _compare_product(list1, list2):
    "Return if any item in list1 equals any item in list2 exhaustively"
    return any(
        itertools.starmap(
            operator.eq,
            itertools.product(list1, list2)))

def do_they_intersect(list1, list2):
    "Return if any item is common between list1 and list2"

    # do not try to optimize for small list sizes
    if len(list1) * len(list2) <= 100: # pick a small number
        return _compare_product(list1, list2)

    # first try to make a set from one of the lists
    try: a_set= set(list1)
    except TypeError:
        try: a_set= set(list2)
        except TypeError:
            a_set= None
        else:
            a_list= list1
    else:
        a_list= list2

    # here either a_set is None, or we have a_set and a_list

    if a_set:
        return any(itertools.imap(a_set.__contains__, a_list))
    
    # try to sort the lists
    try:
        a_list1= sorted(list1)
        a_list2= sorted(list2)
    except TypeError: # sorry, not sortable
        return _compare_product(list1, list2)

    # they could be sorted, so let's take the N+M road,
    # not the N*M
    
    iter1= iter(a_list1)
    iter2= iter(a_list2)
    try:
        item1= next(iter1)
        item2= next(iter2)
    except StopIteration: # one of the lists is empty
        return False # ie no common items

    while 1:
        if item1 == item2:
            return True
        while item1 < item2:
            try: item1= next(iter1)
            except StopIteration: return False
        while item2 < item1:
            try: item2= next(iter2)
            except StopIteration: return False

HTH.

Collectives™ on Stack Overflow

efficiently knowing if intersection of two list is empty or not, in python [duplicate]

4 Answers 4

7 Comments

Comments

2 Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

7 Comments

Comments

2 Comments

Comments

Linked

Related