0

I hacked up some code to test my hypothesis of algo complexity for set intersection in python:

s3, s4, s5, s6, t3, t4, t5, t6 are all defined and are rather large - their intersections also happen to be quite large.

import timeit

s3, t3 = set(), set()
s4, t4 = set(), set()
s5, t5 = set(), set()
s6, t6 = set(), set()

for x in xrange(int(1e3)):
    s3.add(x)
    t3.add(x*2 + x)

for x in xrange(int(1e4)):
    s4.add(x)
    t4.add(x*2 + x)

for x in xrange(int(1e5)):
    s5.add(x)
    t5.add(x*2 + x)

for x in xrange(int(1e6)):
    s6.add(x)
    t6.add(x*2 + x)


def _test():
    for i in [3, 4, 5, 6]:
        for j in [3, 4, 5, 6]:
            if i >= j:
                s, t = 's' + str(i), 't' + str(j)
                print i, j
                print timeit.timeit('{0}.intersection({1})'.format(s, t), setup="from __main__ import {0}, {1}".format(s, t))
        eval('del s' + str(i))

But the following statement 'blows up'..

eval('del s' + str(i))

Any ideas?

I would also be interested in any ideas to make my hacky code less hacky.

Thanks

1
  • 1
    what is "blows up"? Commented Mar 13, 2017 at 11:33

1 Answer 1

2

First of all, you should be aware that the slow execution time you are seeing is due to timeit. The algorithm itself is super fast.

Secondly you can create list of sets:

s = [set(), set(), ...]

and then delete the element from that list

del s[i]

in your case:

import timeit

s = [set(), set(), set(), set()]
t = [set(), set(), set(), set()]

for x in xrange(int(1e3)):
    s[0].add(x)
    t[0].add(x*2 + x)

for x in xrange(int(1e4)):
    s[1].add(x)
    t[1].add(x*2 + x)

for x in xrange(int(1e5)):
    s[2].add(x)
    t[2].add(x*2 + x)

for x in xrange(int(1e6)):
    s[3].add(x)
    t[3].add(x*2 + x)


def _test():
    for i, _ in enumerate(s):
        for j, _ in enumerate(t):
            if i >= j:
                print i, j
                print timeit.timeit('s[{0}].intersection(t[{1}])'.format(i, j), setup="from __main__ import s, t")
        del s[i]

Thirdly for larger datasets you can probably speed up the process using NumPy:

import timeit
import numpy
import itertools

s = [numpy.arange(l) for l in [1e3, 1e4, 1e5, 1e6]]
t = [ss * 3 for ss in s]  # x*2 + x == x*3

for (i, ss), (j, tt) in itertools.product(enumerate(s), enumerate(t)):
    if i >= j:
        print i, j
        numpy.intersect1d(ss, tt, assume_unique=True)
Sign up to request clarification or add additional context in comments.

4 Comments

Yep, that makes everything better. Thanks
You can also create another list for xrange sizes, so there aren't four for loops anymore.
and you could use 'for i in range(len(s))' instead of enumerate. Thanks alot :).
I have added another example using NumPy, if performance is a concern.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.