4

Suppose I have a list of lists whose elements are all of the same type (I'll use ints in this example):

[range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]

What would be a nice and/or efficient way to take the intersection of these lists (so you would get every element that is in each of the lists)? For the example that would be:

[0, 12, 24, 36, 48, 60, 72, 84, 96]

7 Answers

9

Use sets, which have an intersection method.

>>> s = set()
>>> s.add(4)
>>> s.add(5)
>>> s
set([4, 5])
>>> t = set([2, 4, 9])
>>> s.intersection(t)
set([4])

For your example, something like

>>> data = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
>>> sets = map(set, data)
>>> print set.intersection(*sets)
set([0, 96, 36, 72, 12, 48, 84, 24, 60])
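
For readers on Python 3, a sketch of the equivalent (there, map returns a lazy iterator and print is a function; range(100)[::4] still works, yielding another range):

```python
data = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
sets = map(set, data)             # lazy iterator of sets in Python 3
result = set.intersection(*sets)  # unpack all sets as arguments
print(sorted(result))             # [0, 12, 24, 36, 48, 60, 72, 84, 96]
```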

6 Comments

I'll accept this as the best answer, because it's a little faster than my own (which is in turn twice as fast as the ones using reduce) and because of the neat trick of intersecting multiple sets at once. Thanks!
Alternatively, set.intersection(set(x) for x in data)
@thepandaatemyface, I'm always glad to hear my code performs well, but always a bit suspicious as well. I'm sure it depends a lot on the input, and you haven't had time to run it on truly huge input if size was the issue. If I were trying to optimize for speed in an inner loop on large data, I would try set(datas[0]).intersection(*datas[1:]) and time it, which has a nice ring of performance to me.
@Mike Graham, what I meant to say is: of all the elegant solutions posted here, yours was the fastest. I quickly tested it with [[randint(0, 100000) for i in range(1000)] for i in range(100)] as my data. It's not very scientific, but it keeps giving yours as the fastest.
@David, you're missing a *(...) to apply the generator's items as args. Other than that, that's certainly a fine approach. The main reason I didn't use it was to emphasize that if you're doing operations like intersection, you should probably already have sets.
4

I think the built-in set module should do the trick.

>>> elements = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
>>> sets = map(set, elements)
>>> result = list(reduce(lambda x, y: x & y, sets))
>>> print result
[0, 96, 36, 72, 12, 48, 84, 24, 60]

5 Comments

You beat me to the punch. I'll leave my answer up as it applies reduce slightly differently, but I'm glad to see that other people think functionally too. :-)
Too many schools skip the (imho mandatory) introductory functional programming course and jump straight to Java. Come on guys, SICP is about the best introductory CS book ever...
Note that set is a type, not a module. (The set type used to be in a module called sets, but it is long deprecated.)
Although it's very elegant and works just fine, this seems to be twice as slow as the solutions not using reduce. Anyone have any ideas why?
It could be due to having to build more intermediary sets.
3

Convert them to sets and use the set.intersection method, reducing over the list of sets:

xs = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
reduce(set.intersection, [set(x) for x in xs])

reduce is a functional programming device that iterates through any iterable, applying the provided function first to the first two elements, then to that result and the next element, and so on.
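
A tiny sketch of that stepwise behaviour (Python 3 syntax, where reduce lives in functools; the sample sets are mine):

```python
from functools import reduce  # a built-in in Python 2, in functools in Python 3

sets = [{0, 2, 4, 6}, {0, 3, 6, 9}, {0, 6, 12}]
# reduce computes set.intersection({0, 2, 4, 6}, {0, 3, 6, 9}) -> {0, 6},
# then set.intersection({0, 6}, {0, 6, 12})                    -> {0, 6}
result = reduce(set.intersection, sets)
print(result)  # {0, 6}
```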

2 Comments

set.intersection takes an arbitrary number of iterables as arguments (in recent Pythons). If I'm not mistaken, this can be implemented with better algorithmic complexity than the reduce method provides.
@Mike: That's brilliant. I had no idea.
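
To illustrate that multi-argument form (my sketch, in Python 3 syntax): convert only the first list to a set and pass the rest straight in, since intersection() accepts arbitrary iterables, not just sets.

```python
xs = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
# Only one explicit set; the remaining ranges are passed as-is.
result = set(xs[0]).intersection(*xs[1:])
print(sorted(result))  # [0, 12, 24, 36, 48, 60, 72, 84, 96]
```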
1

I'm going to answer my own question:

lists =  [range(100)[::4],range(100)[::3],range(100)[::2],range(100)[::1]]

out = set(lists[0])
for l in lists[1:]:
    out = set(l).intersection(out)

print out

or

print list(out)
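
A small variant of the same loop (my sketch, Python 3 syntax): set.intersection_update shrinks one set in place instead of binding a new set on each pass.

```python
lists = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]

out = set(lists[0])
for l in lists[1:]:
    out.intersection_update(l)  # drop elements of out that are not also in l

print(sorted(out))  # [0, 12, 24, 36, 48, 60, 72, 84, 96]
```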

Comments

1

Here's a one-liner using the good old all() built-in function:

list(num for num in data[0] 
     if all(num in range_ for range_ in data[1:]))

Interestingly, this is (I think) more readable and faster than using set for larger data sets.
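
One property worth noting (my addition, not part of the answer's claim): unlike the set-based solutions, this comprehension preserves the order of data[0]. A sketch with made-up data, in Python 3 syntax:

```python
data = [[3, 1, 2, 9], [9, 2, 3], [2, 9, 3, 7]]
common = [num for num in data[0]
          if all(num in sub for sub in data[1:])]
print(common)  # [3, 2, 9] -- keeps the order of data[0]; a set would not
```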

Comments

0
l = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
l = [set(i) for i in l]
intersect = l[0].intersection(l[1])
for i in l[2:]:
    intersect = intersect.intersection(i)

6 Comments

This would raise NameError ?
I don't think so, @MikeGraham. Perhaps you are referring to the code that was here before I edited. I ran the old code and got an error, but this code has been tested and works fine.
@inspectG4dget, I was referring to the code before it was edited (my comment appears at least as old as the edit?). This code will not exhibit that error, though I must confess I find your design a bit odd.
@MikeGraham: I saw the timing of the edit and the comment, which is why I suggested that your comment may have been posted before the edit. But I am curious as to why and how you would change the design of this.
@MikeGraham: Very true. I had thought of doing this, but I wanted to be more transparent with my code - especially since I did not document it at all. I don't know how proficient @thepandaatemyface is and therefore wanted to keep this as simple as possible.
0

You can treat them as sets and use set.intersection():

lists = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
sets = [set(l) for l in lists]

isect = reduce(lambda x,y: x.intersection(y), sets)

2 Comments

This would raise AttributeError?
Oops, intersect -> intersection (fixed).
