9

I'm working on a function in Python that takes in a list of file paths and a list of destinations, and copies each file to each of the given destinations. I have the copying portion of this function working correctly, but I need to be able to run this function asynchronously apart from the operation of a gui so that less time is taken filling out each "form." I also need the copying function to inform the user each time a file has been copied to all of the directories.

I have done a little bit of research on how to do this, but each option is quite different, using different libraries for example. How would you suggest I do this?

1
  • On a specific OS or OS independent? Commented Feb 3, 2015 at 17:59

3 Answers 3

3

Since your problem is a IO bound, I would recommend you to look at threading module. With combination of Queue module you will achieve that.

Sign up to request clarification or add additional context in comments.

Comments

3

I agree that a thread that communicates via a Queue is a good solution. Here's an example class:

import os, shutil, threading, Queue

class FileCopy(threading.Thread):
    def __init__(self, queue, files, dirs):
        threading.Thread.__init__(self)
        self.queue = queue
        self.files = list(files)  # copy list
        self.dirs = list(dirs)    # copy list
        for f in files:
            if not os.path.exists(f):
                raise ValueError('%s does not exist' % f)
        for d in dirs:
            if not os.path.isdir(d):
                raise ValueError('%s is not a directory' % d)

    def run(self):
        # This puts one object into the queue for each file,
        # plus a None to indicate completion
        try:
            for f in self.files:
                try:
                    for d in self.dirs:
                        shutil.copy(f, d)
                except IOError, e:
                    self.queue.put(e)
                else:
                    self.queue.put(f)
        finally:
            self.queue.put(None)  # signal completion

Here's an example of how this class can be used:

queue = Queue.Queue()
files = ['a', 'b', 'c']
dirs = ['./x', './y', './z']
copythread = FileCopy(queue, files, dirs)
copythread.start()
while True:
    x = queue.get()
    if x is None:
        break
    print(x)
copythread.join()

Comments

1

The threads in the earlier answer always execute sequentially without any parallelism. With a few changes, one can have the IO bound jobs run together so that the completion of threads depends on the size of files.

I made minor changes to the code in that answer and experimented to verify that the smallest files finish first while the other IO operations are continuing in the background.

import os, shutil, threading, queue

class FileCopy(threading.Thread):
    def __init__(self, queue, files, dirs):
        threading.Thread.__init__(self)
        self.queue = queue
        self.files = list(files)  # copy list
        self.dirs = list(dirs)    # copy list
        for f in files:
            if not os.path.exists(f):
                raise ValueError('%s does not exist' % f)
        for d in dirs:
            if not os.path.isdir(d):
                raise ValueError('%s is not a directory' % d)

    def run(self):
        # This puts one object into the queue for each file
        try:
            for f in self.files:
                try:
                    for d in self.dirs:
                        shutil.copy(f, d)
                except IOError as e:
                    self.queue.put(e)
                else:
                    self.queue.put(f)
        finally:
            pass


queue = queue.Queue()
files = ['a', 'b', 'c']
dirs = ['./x', './y', './z']
thlist = []

for file in files:
    copythread = FileCopy(queue, [file], dirs)
    thlist.append(copythread)

for th in thlist:
    th.start()

for file in files:
     x = queue.get()
     print("Finished copying " + x)

for th in thlist:
    th.join()

1 Comment

This doesn't consider the core aspect of the question, asynchronously copy a file. I suppose you could have a future which resolves after the thread has completed, but using threads in an asynchronous system could possibly ruin the benefits of an async-engine resolving those futures. Consider the case where the async engine breaks the operation into parallel futures, and would like to wait for all of them to resolve before continuing, your thread would be ran independently & unmanaged by the engine, meaning it will be absorbing runtime the engine could be using more optimally

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.