0

I want to efficiently find the intersection of two lists , keeping duplicates from both, e.g. A=[1,1,2,3], B=[1,1,2,4] should return [1,1,1,1,2]

I know a similar question was asked previously (Python intersection of two lists keeping duplicates) however this does not help me because only the duplicates from one list are retained.

The following works

def intersect(A,B):
    C=[]
    for a in A:
        for b in B:
            if a==b:
                C.append(a)
    return C

however it isn't efficient enough for what I'm doing! To speed things up I tried sorting the lists

def intersect(A,B):
    A.sort()
    B.sort()
    C=[]
    i=0
    j=0
    while i<len(A) and j<len(B):
        if A[i]<=B[j]:
            if A[i]==B[j]: 
                C.append(A[i])
            i+=1
        else:
            j=j+1
    return C

however this only keeps the duplicates from list B. Any suggestions?

7
  • Just to clarify by "intersection of the lists" here you mean "if an item occurs N times in list A and M times in list B, it should occur N+M times in the new list, but if it either N or M is zero then it should occur zero times in the new list"? (That is an unusual use of the term "intersection".) Commented Apr 23, 2015 at 17:16
  • 4
    You have 4 1's in the result because there are 2 in A and 2 in B, so why does the result not have two 2s? Commented Apr 23, 2015 at 17:18
  • 1
    I realized my above interpretation is incorrect based on your data. Why do you have only one 2 in the example result, but four 1s? Commented Apr 23, 2015 at 17:18
  • 1
    The reason there are four 1s there is because each 1 in list A is matched with the 1s in list B, whereas the 2 in list A is matched uniquely to the 2 in list B. Another example is A=[1,1,2,3], B=[1,2,4] should return [1,1,2] Commented Apr 23, 2015 at 17:52
  • 1
    Are you working with large lists, or are you working with small lists but many intersections? Are there many repeating elements in your lists? Please show some examples for the time. Commented Apr 23, 2015 at 18:24

3 Answers 3

3

Here is the answer to your question as asked:

import collections
for A,B,expected_output in (
    ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
    ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A) & set(B)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output

Here is the answer to the question as originally interpreted by myself and two others:

import collections
A=[1,1,2,3]
B=[1,1,2,4]
expected_output = [1,1,1,1,2,2]
cntA = collections.Counter(A)
cntB = collections.Counter(B)
cnt_sum = collections.Counter(A) + collections.Counter(B)
output = [x for x in sorted(set(A) & set(B)) for i in range(cnt_sum[x])]
assert output == expected_output

You can find the collections.Counter() documentation here. collections is a great module and I highly recommend giving the documentation on the whole module a read.

I realized you don't actually need to find the intersection of the sets, because the "count of a missing element is zero" according to the documentation:

import collections
for A,B,expected_output in (
    ([1,1,2,3], [1,1,2,4], [1,1,1,1,2]),
    ([1,1,2,3], [1,2,4], [1,1,2])):
    cntA = collections.Counter(A)
    cntB = collections.Counter(B)
    output = [
        x for x in sorted(set(A)) for i in range(cntA[x]*cntB[x])]
    assert output == expected_output
Sign up to request clarification or add additional context in comments.

8 Comments

I want the code to do the same job as the first piece of code in the question but more efficiently, the expected output is [1,1,1,1,2]
The top one works exactly as I expect it to and has drastically sped up my code. Thank you!
cnt_sum is redundant in the first one
@EllaShar Yes, definitely do xrange instead of range if Python2. I assumed Python3, but question just tagged Python.
@EllaShar: As a rule of thumb, it's preferable to provide an answer that works with both Python 2 and Python 3, especially when the version is not stated or tagged in the question.
|
2

How about this:

a_set = set(A)
b_set = set(B)
intersect = [i for i in A if i in b_set] + [j for j in B if j in a_set]

Two list comprehensions concatenated. A bit of extra time and memory is used to create sets of A and B, but that will be more than offset by the efficiency of checking membership of items in a set vs list.

You could also spruce it up a bit:

set_intersect = set(A) & set(B)
list_intersect = [ele for ele in A+B if ele in set_intersect]

Coerce both lists to sets, take their intersection, then use a list comprehension to add all elements from both lists A and B if they appear in the set intersection.

1 Comment

this gives [1,1,1,1,2,2] for the example I gave, not [1,1,1,1,2]
0

I'm having a hard time speeding your code since I don't know what are you running it on. It makes a lot of difference whether you run it on small or large lists and how many distinct elements are there. Anyway, here are some suggestions:

1.

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = []
    for i in count_a:
        count_mul.extend([i] * (count_a[i] * count_b[i]))
    return count_mul

2.

This returns an iterator, you can use list(iterator) to turn it into a list

def intersect(a, b):
    count_a = Counter(a)
    count_b = Counter(b)
    count_mul = Counter()
    for i in count_a:
        count_mul[i] += count_a[i] * count_b[i]
    return count_mul.elements()

3.

Very similar to your way but without changing the size of the list which takes time.

def intersect(A, B):
    return [a for a in A for b in B if a == b]

I'm not sure this gives any improvement to your original way, it really depends on the inputs, but your way is O(n*m) and mine is O(n+m).

You can use the module timeit to check how fast it runs on your input:

from timeit import timeit
timeit('test.intersect(A, B)', 'import test; A = [1,1,2,3]; B = [1,1,2,4]')

1 Comment

the inputs will be tuples of length 3 consisting of large integers

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.