
I want to use threads for reading values from 17770 files and, at the end, combining all of them into one dictionary object. I have a machine with 8 cores.

This is the code:

def run_item_preprocess():
    scores = {}
    for i in range(1, 17771):
        filename1 = "x_" + str(i) + ".txt"
        Item1 = {}
        with open(filename1) as f1:
            for line in f1:
                tokens = line.split(',')
                Item1[int(tokens[1])] = int(tokens[2])
        for j in range(1, 17771):
            if j == i:
                continue
            # Fixed: the original built this name from i, so it
            # reopened file i instead of file j.
            filename2 = "x_" + str(j) + ".txt"
            Item2 = {}
            with open(filename2) as f2:
                for line in f2:
                    tokens = line.split(',')
                    u = int(tokens[1])
                    r = int(tokens[2])
                    if u in Item1:
                        Item2[u] = r
            if i not in scores:
                scores[i] = {}
            # s() is a similarity function defined elsewhere; keep one
            # entry per j instead of overwriting scores[i] each pass.
            scores[i][j] = s(Item1, Item2)
  • I am looking for someone who could help me with that ... I posted my code. Commented Dec 1, 2010 at 0:10
  • Am I reading this wrong? You're trying to open each of your 17770 files 17770 times? Commented Dec 1, 2010 at 0:16
  • Before you think about making it run in multiple threads, you need to make sure the code runs correctly as a single thread. I seriously doubt that this code does what you want it to do. (315,772,900 file reads?) Commented Dec 1, 2010 at 0:17
  • Yes ... I have to do that ... and all the files together are 2.3 GB. Commented Dec 1, 2010 at 0:17
  • Yes, but opening files 300 million times is not the answer. A database might help. I'm trying to work out what you're actually doing with the code. Commented Dec 1, 2010 at 0:21

3 Answers


Here is the wonderful multiprocessing module (http://docs.python.org/library/multiprocessing.html). It lets you parallelise code using processes rather than threads, so it will use all of your cores.

An important difference is that processes don't share memory; a queue will help with the reduce step.
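A minimal sketch of that pattern, reusing the per-file parsing from the question (process_file is a hypothetical worker name, and the pairwise similarity step is left out for brevity):

import multiprocessing

def process_file(i):
    # Runs in a worker process: parse one x_<i>.txt file into a dict
    # mapping user id -> rating (column layout taken from the question).
    item = {}
    with open("x_" + str(i) + ".txt") as f:
        for line in f:
            tokens = line.split(',')
            item[int(tokens[1])] = int(tokens[2])
    return i, item

if __name__ == "__main__":
    # One worker per core; each result travels back to the parent over
    # multiprocessing's internal queue, and the parent performs the
    # reduce step by merging everything into a single dict.
    pool = multiprocessing.Pool(processes=8)
    scores = {}
    for i, item in pool.imap_unordered(process_file, range(1, 17771)):
        scores[i] = item
    pool.close()
    pool.join()

Because only the parent ever touches scores, no locking or shared memory is needed for the final dictionary.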


2 Comments

  • Note that the OP says that s/he needs to "add all of them at the end to one dictionary object." That might make multiprocessing problematic.
  • The dictionary doesn't have to be shared; there would be a lot of contention if it was.

Here are some good parts of the Python library reference to start with.

http://docs.python.org/py3k/library/threading.html

http://docs.python.org/py3k/library/_thread.html

As for how to use threads effectively, I recommend you google 'python thread tutorial' or something like that.
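For reference, the basic pattern from the threading module looks like this (a generic sketch of starting and joining threads, not specific to the question's files; worker is a placeholder name):

import threading

def worker(n):
    # Placeholder work; real code would read and parse a file here.
    print("thread", n, "started")

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()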



How do you think that using threads would help with this?

Although Python supports threading, the standard implementation (CPython) executes Python bytecode in only one thread at a time, because of the Global Interpreter Lock. Hence it's hard to see how threading would make this process run faster, even on multiple cores.

(If you're using Jython or IronPython, though, this restriction doesn't apply.)
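You can see the effect with a toy CPU-bound benchmark (a sketch; exact timings vary by machine, but on CPython the threaded version is typically no faster):

import threading
import time

def count(n):
    # Pure CPU-bound work
    while n > 0:
        n -= 1

N = 10000000

start = time.time()
count(N)
count(N)
print("sequential:", time.time() - start)

# Two threads are usually no faster here, because the GIL lets only
# one thread execute Python bytecode at a time.
start = time.time()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print("threaded:", time.time() - start)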

3 Comments

  • Well, I have 8 cores on my machine.
  • Yes, and assuming that you're using CPython, only one of these cores would be running a Python thread at any moment. How does this help?
  • It's not clear what the code is supposed to do, so it's hard to say if it'll work on 1 thread or 8. But it doesn't look to me as if this Python code can be sped up by using threads.
