
I am using the code below on a dictionary of roughly 100,000 keys and values. I wanted to make it faster with multiprocessing or multithreading, since each loop iteration is independent of the others. Can anyone tell me how to apply this, and which one (multiprocessing or multithreading) is more apt for this kind of approach?

from urlparse import urlparse

def ProcessAllURLs(URLs):
    for eachurl in URLs:
        x = urlparse(eachurl)
        print x.netloc

ProcessAllURLs(URLs)

Thanks

2 Answers


I would recommend Python's multiprocessing library. In particular, study the section labeled "Using a pool of workers". It should be pretty quick to rework the above code so that it uses all available cores of your system.

One tip, though: don't print URLs from the pool workers. It is better to pass the results back to the main process and aggregate them there for printing. Printing from different processes will result in a lot of jumbled, uncoordinated console output.
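For example, here is a minimal sketch of that approach (the URLs list is just a placeholder and get_netloc is a helper name introduced for illustration):

from multiprocessing import Pool
from urlparse import urlparse

def get_netloc(url):
    # Runs in a worker process; return the result instead of printing it
    return urlparse(url).netloc

def ProcessAllURLs(URLs):
    pool = Pool()  # defaults to one worker per CPU core
    netlocs = pool.map(get_netloc, URLs)
    pool.close()
    pool.join()
    # Print in the main process so the output stays coordinated
    for netloc in netlocs:
        print netloc

if __name__ == '__main__':
    URLs = ['http://example.com/a', 'http://example.org/b']  # placeholder list
    ProcessAllURLs(URLs)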



The multiprocessing library is probably best for your example. It looks like your code could be rewritten as:

from multiprocessing import Pool
from urlparse import urlparse

nprocs = 2  # number of worker processes to run
ParsePool = Pool(nprocs)
ParsedURLS = ParsePool.map(urlparse, URLS)

Pool.map works like the built-in map function, but distributes the function calls across the pool's worker processes.
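As a usage sketch, assuming a placeholder URLS list, the full snippet would look something like this, with the netloc values pulled out of the parsed results in the main process:

from multiprocessing import Pool
from urlparse import urlparse

if __name__ == '__main__':
    URLS = ['http://example.com/a', 'http://example.org/b']  # placeholder input
    nprocs = 2  # number of worker processes to run
    ParsePool = Pool(nprocs)
    ParsedURLS = ParsePool.map(urlparse, URLS)
    ParsePool.close()
    ParsePool.join()
    # Each entry in ParsedURLS is a ParseResult named tuple
    print [parsed.netloc for parsed in ParsedURLS]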

See http://docs.python.org/library/multiprocessing.html for more on multiprocessing.

