4

Suppose I have a list of lists whose elements are all of the same type (I'll use ints in this example):

[range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]

What would be a nice and/or efficient way to take the intersection of these lists (so you would get every element that is in each of the lists)? For the example that would be:

[0, 12, 24, 36, 48, 60, 72, 84, 96]

7 Answers

9

Use sets, which have an intersection method.

>>> s = set()
>>> s.add(4)
>>> s.add(5)
>>> s
set([4, 5])
>>> t = set([2, 4, 9])
>>> s.intersection(t)
set([4])

For your example, something like

>>> data = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
>>> sets = map(set, data)
>>> print set.intersection(*sets)
set([0, 96, 36, 72, 12, 48, 84, 24, 60])
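
For readers on Python 3, a sketch of the equivalent (there, map returns a lazy iterator and print is a function; range(100)[::4] still works, yielding another range):

```python
data = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
sets = map(set, data)             # lazy iterator of sets in Python 3
result = set.intersection(*sets)  # unpack all sets as arguments
print(sorted(result))             # [0, 12, 24, 36, 48, 60, 72, 84, 96]
```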

6 Comments

I'll accept this as the best answer, because it's a little faster than my own (which is in turn twice as fast as the ones using reduce) and because of the neat trick of intersecting multiple sets at once. Thanks!
Alternatively, set.intersection(set(x) for x in data)
@thepandaatemyface, I'm always glad to hear my code performs well, but always a bit suspicious as well. I'm sure it depends a lot on the input, and you haven't had time to run it on truly huge input if size was the issue. If I were trying to optimize for speed in an inner loop on large data, I would try set(datas[0]).intersection(*datas[1:]) and time it, which has a nice ring of performance to me.
@Mike Graham, what I meant to say is: of all the elegant solutions posted here, yours was the fastest. I quickly tested it with [[randint(0, 100000) for i in range(1000)] for i in range(100)] as my data. It's not very scientific, but it keeps giving yours as the fastest.
@David, you're missing a *(...) to apply the generator's items as args. Other than that, that's certainly a fine approach. The main reason I didn't use it was to emphasize that if you're doing operations like intersection, you should probably already have sets.
4

I think the built-in set module should do the trick.

>>> elements = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
>>> sets = map(set, elements)
>>> result = list(reduce(lambda x, y: x & y, sets))
>>> print result
[0, 96, 36, 72, 12, 48, 84, 24, 60]

5 Comments

You beat me to the punch. I'll leave my answer up as it applies reduce slightly differently, but I'm glad to see that other people think functionally too. :-)
Too many schools skip the (imho mandatory) introductory functional programming course and jump straight to Java. Come on guys, SICP is about the best introductory CS book ever...
Note that set is a type, not a module. (The set type used to be in a module called sets, but it is long deprecated.)
Although it's very elegant and works just fine, this seems to be twice as slow as the solutions not using reduce. Anyone have any ideas why?
It could be due to having to build more intermediary sets.
3

Convert them to sets and use the set.intersection method, reducing over the list of sets:

xs = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
reduce(set.intersection, [set(x) for x in xs])

reduce is a functional programming device that iterates through any iterable, applying the provided function first to the first two elements, then to that result and the next element, and so on.
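
A tiny sketch of that stepwise behaviour (Python 3 syntax, where reduce lives in functools; the sample sets are mine):

```python
from functools import reduce  # a built-in in Python 2, in functools in Python 3

sets = [{0, 2, 4, 6}, {0, 3, 6, 9}, {0, 6, 12}]
# reduce computes set.intersection({0, 2, 4, 6}, {0, 3, 6, 9}) -> {0, 6},
# then set.intersection({0, 6}, {0, 6, 12})                    -> {0, 6}
result = reduce(set.intersection, sets)
print(result)  # {0, 6}
```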

2 Comments

set.intersection takes an arbitrary number of iterables as arguments (in recent Pythons). If I'm not mistaken, this can be implemented with better algorithmic complexity than the reduce method provides.
@Mike: That's brilliant. I had no idea.
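
To illustrate that multi-argument form (my sketch, in Python 3 syntax): convert only the first list to a set and pass the rest straight in, since intersection() accepts arbitrary iterables, not just sets.

```python
xs = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
# Only one explicit set; the remaining ranges are passed as-is.
result = set(xs[0]).intersection(*xs[1:])
print(sorted(result))  # [0, 12, 24, 36, 48, 60, 72, 84, 96]
```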
1

I'm going to answer my own question:

lists =  [range(100)[::4],range(100)[::3],range(100)[::2],range(100)[::1]]

out = set(lists[0])
for l in lists[1:]:
    out = set(l).intersection(out)

print out

or

print list(out)
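
A small variant of the same loop (my sketch, Python 3 syntax): set.intersection_update shrinks one set in place instead of binding a new set on each pass.

```python
lists = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]

out = set(lists[0])
for l in lists[1:]:
    out.intersection_update(l)  # drop elements of out that are not also in l

print(sorted(out))  # [0, 12, 24, 36, 48, 60, 72, 84, 96]
```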

Comments

1

Here's a one-liner using the good old all() built-in function:

list(num for num in data[0] 
     if all(num in range_ for range_ in data[1:]))

Interestingly, this is (I think) more readable and faster than using set for larger data sets.
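
One property worth noting (my addition, not part of the answer's claim): unlike the set-based solutions, this comprehension preserves the order of data[0]. A sketch with made-up data, in Python 3 syntax:

```python
data = [[3, 1, 2, 9], [9, 2, 3], [2, 9, 3, 7]]
common = [num for num in data[0]
          if all(num in sub for sub in data[1:])]
print(common)  # [3, 2, 9] -- keeps the order of data[0]; a set would not
```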

Comments

0
l = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
l = [set(i) for i in l]
intersect = l[0].intersection(l[1])
for i in l[2:]:
    intersect = intersect.intersection(i)

6 Comments

This would raise NameError ?
I don't think so, @MikeGraham. Perhaps you are referring to the code that was here before I edited. I ran the old code and got an error, but this code has been tested and works fine.
@inspectG4dget, I was referring to the code before it was edited (my comment appears at least as old as the edit?). This code will not exhibit that error, though I must confess I find your design a bit odd.
@MikeGraham: I saw the timing of the edit and the comment, which is why I suggested that your comment may have been posted before the edit. But I am curious as to why and how you would change the design of this.
@MikeGraham: Very true. I had thought of doing this, but I wanted to be more transparent with my code - especially since I did not document it at all. I don't know how proficient @thepandaatemyface is and therefore wanted to keep this as simple as possible.
0

You can treat them as sets and use set.intersection():

lists = [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]]
sets = [set(l) for l in lists]

isect = reduce(lambda x,y: x.intersection(y), sets)

2 Comments

This would raise AttributeError?
Oops, intersect -> intersection (fixed).
