2

I'm running a Python script which iterates two huge lists and finds the matching pairs.

However, it seems to take forever. How to speed up this script?

import sys
import random
import itertools

def main(args):
    target_num = int(999999999)
    num_list = range(1, target_num)
    rand_list = []
    hit_list = []

    for _ in itertools.repeat(None, target_num):
        rand_list.append(random.randint(1, target_num))

    for num in num_list:
        for rand_num in rand_list:
            if num == rand_num:
                print "hit"

if __name__ == "__main__":
    main(sys.argv[1:])
9
  • 1
    Try not to use built-in names as variable names in your code, it has the potential to cause frustration. By this I am referring to your use of list as a variable name Commented Apr 15, 2016 at 2:46
  • @Smac89 I was in a hurry to write a question, and made a mistake. I modified the variable name. Commented Apr 15, 2016 at 2:52
  • Is that second nested loop supposed to be read_list or rand_list? Commented Apr 15, 2016 at 3:02
  • @Smac89 It is supposed to be rand_list. Sorry for the confusion. Commented Apr 15, 2016 at 3:10
  • Also consider a list comprehension (rand_list = [random.randint(1, target_num) for _ in itertools.repeat(None, target_num)]). They are more efficient than using for loops as you don’t need to load the append attribute off of the list and call it as a function. Commented Apr 15, 2016 at 3:33

2 Answers 2

3

Use sets

import sys
import random
import itertools

def main(args):
    target_num = int(999999999)
    num_list = set(range(1, target_num))
    rand_list = []
    hit_list = []

    for _ in itertools.repeat(None, target_num):
        rand_list.append(random.randint(1, target_num))


    for num in rand_list:
        if num in num_list: # O(1)
            print "hit"

if __name__ == "__main__":
    main(sys.argv[1:])

Using a set for the first list, means that checking if the item is in that list is now reduced to an O(1)


As I am writing this, I realise you can even do better. The range function in python 3 returns a sequence, so you need python 3 for this next part

import sys
import random
import itertools

def main(args):
    target_num = int(999999999)
    num_list = range(1, target_num) # this is a generator
    rand_list = []
    hit_list = []

    for _ in itertools.repeat(None, target_num):
        rand_list.append(random.randint(1, target_num))


    for num in rand_list:
        if num in num_list: # Stil O(1)
            print ("hit")

if __name__ == "__main__":
    main(sys.argv[1:])

Even better, using range and do the checking within the first loop?

for _ in itertools.repeat(None, target_num):
    rand_num = random.randint(1, target_num)
    rand_list.append(rand_num)
    if rand_num in num_list:
        print ("hit")
Sign up to request clarification or add additional context in comments.

1 Comment

O(1) membership testing is not a feature of xrange, it was only added to range in Python 3.2 when they finally finished implementing the Sequence ABC for range. xrange membership testing generates and tests values one at a time until it gets a hit. Also, neither range nor xrange are generators; they're more or less Sequence objects that can produce iterators, not iterators/generators themselves; if they were generators, iterating them once would exhaust them, but that's not how they behave.
1

If using Python 2, use xrange(), which returns a generator-like object.

# requires Python 2
import random

target_num = 99 # 999999999 are too much items for testing

# target_num random numbers in range 1 .. target_num-1
random_numbers = set(random.randint(1, target_num) for _ in xrange(target_num)) 

hits = set()
for num in xrange(1, target_num):  # check for all numbers in range 1 .. target_num-1
    if num in random_numbers:   # num in set() is O(1)
        hits.add(num)

if len(random_numbers - hits) == 0:
     print "all random numbers are hits!"

# so:
for num in random_numbers:
    print num
# is the same result

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.