How to do asynchronous file copying in Python?

Question

I'm working on a Python function that takes a list of file paths and a list of destinations, and copies each file to every destination. The copying itself works correctly, but I need to run the function asynchronously, separate from the GUI, so that less time is spent filling out each "form." I also need the copying function to inform the user each time a file has been copied to all of the directories.

I have done a little bit of research on how to do this, but each option is quite different, using different libraries for example. How would you suggest I do this?

On a specific OS or OS independent?

– dawg, Commented Feb 3, 2015 at 17:59

Vor · Accepted Answer · 2015-02-03 17:56:51Z

3

Since your problem is IO bound, I recommend looking at the threading module. Combined with the Queue module, it will let you achieve this.


Steve Weston · Accepted Answer · 2015-02-12 19:45:41Z

I agree that a thread that communicates via a Queue is a good solution. Here's an example class:

import os, shutil, threading, queue  # the Queue module is named queue in Python 3

class FileCopy(threading.Thread):
    def __init__(self, queue, files, dirs):
        threading.Thread.__init__(self)
        self.queue = queue
        self.files = list(files)  # copy list
        self.dirs = list(dirs)    # copy list
        # Validate the inputs up front, before the thread is started
        for f in files:
            if not os.path.exists(f):
                raise ValueError('%s does not exist' % f)
        for d in dirs:
            if not os.path.isdir(d):
                raise ValueError('%s is not a directory' % d)

    def run(self):
        # This puts one object into the queue for each file
        # (the file name on success, the exception on failure),
        # plus a None to indicate completion
        try:
            for f in self.files:
                try:
                    for d in self.dirs:
                        shutil.copy(f, d)
                except IOError as e:
                    self.queue.put(e)
                else:
                    self.queue.put(f)
        finally:
            self.queue.put(None)  # signal completion

Here's an example of how this class can be used:

q = queue.Queue()
files = ['a', 'b', 'c']
dirs = ['./x', './y', './z']
copythread = FileCopy(q, files, dirs)
copythread.start()
while True:
    x = q.get()  # blocks until the thread reports something
    if x is None:
        break
    print(x)
copythread.join()
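
The loop above blocks on `get()`, which would freeze a GUI if it ran on the GUI thread. A minimal sketch of one way around that, assuming the GUI toolkit can call a function periodically (a timer callback, for instance): drain the queue with `get_nowait()` on each tick. The names `copy_worker` and `drain_progress` are hypothetical and not part of the class above; the demo uses temporary files in place of a real GUI.

```python
import os
import queue
import shutil
import tempfile
import threading


def copy_worker(q, files, dirs):
    """Copy every file to every directory, reporting progress through q."""
    try:
        for f in files:
            try:
                for d in dirs:
                    shutil.copy(f, d)
            except OSError as e:
                q.put(e)      # report the failure for this file
            else:
                q.put(f)      # this file reached all directories
    finally:
        q.put(None)           # signal completion


def drain_progress(q):
    """Return (messages, done) without blocking; meant for a GUI timer."""
    messages, done = [], False
    while True:
        try:
            item = q.get_nowait()
        except queue.Empty:
            break             # nothing more to read right now
        if item is None:
            done = True
        else:
            messages.append(item)
    return messages, done


if __name__ == "__main__":
    # Demo with temporary files instead of a real GUI.
    src_dir = tempfile.mkdtemp()
    dst_dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    src = os.path.join(src_dir, "a.txt")
    with open(src, "w") as fh:
        fh.write("hello")

    q = queue.Queue()
    t = threading.Thread(target=copy_worker, args=(q, [src], dst_dirs))
    t.start()
    t.join()  # a GUI would keep running its event loop here instead
    print(drain_progress(q))
```

In a real application the timer callback would call `drain_progress`, update the form with each message, and stop rescheduling itself once `done` is true.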

kb3dow · Accepted Answer · 2018-11-08 17:22:53Z

1

The earlier answer copies the files one after another in a single thread, without any parallelism. With a few changes, the IO-bound jobs can run concurrently, so that each thread finishes at a time that depends on the size of its file.

I made minor changes to the code in that answer and experimented to verify that the smallest files finish first while the other IO operations are continuing in the background.

import os, shutil, threading, queue

class FileCopy(threading.Thread):
    def __init__(self, queue, files, dirs):
        threading.Thread.__init__(self)
        self.queue = queue
        self.files = list(files)  # copy list
        self.dirs = list(dirs)    # copy list
        for f in files:
            if not os.path.exists(f):
                raise ValueError('%s does not exist' % f)
        for d in dirs:
            if not os.path.isdir(d):
                raise ValueError('%s is not a directory' % d)

    def run(self):
        # This puts one object into the queue for each file:
        # the file name on success, or the exception on failure
        for f in self.files:
            try:
                for d in self.dirs:
                    shutil.copy(f, d)
            except IOError as e:
                self.queue.put(e)
            else:
                self.queue.put(f)


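q = queue.Queue()  # named q so the queue module is not shadowed
files = ['a', 'b', 'c']
dirs = ['./x', './y', './z']
thlist = []

# One thread per file, so the copies can overlap
for file in files:
    copythread = FileCopy(q, [file], dirs)
    thlist.append(copythread)

for th in thlist:
    th.start()

# Each thread puts exactly one item: the file name or an IOError
for _ in files:
    x = q.get()
    if isinstance(x, Exception):
        print("Copy failed: " + str(x))
    else:
        print("Finished copying " + x)

for th in thlist:
    th.join()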
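
Starting one thread per file can create a large number of threads for a long file list. As an alternative to the approach above (not what this answer uses), here is a minimal sketch with the standard library's `concurrent.futures.ThreadPoolExecutor`, which caps the number of worker threads and reports each file as it finishes. The names `copy_to_all`, `copy_files`, and `on_done` are hypothetical.

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed


def copy_to_all(path, dirs):
    """Copy one file into every directory and return its path."""
    for d in dirs:
        shutil.copy(path, d)
    return path


def copy_files(files, dirs, max_workers=4, on_done=print):
    """Copy files concurrently; call on_done(path, error) as each finishes."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the file it is copying.
        futures = {pool.submit(copy_to_all, f, dirs): f for f in files}
        for fut in as_completed(futures):
            path = futures[fut]
            error = fut.exception()   # None when the copy succeeded
            results[path] = error
            on_done(path, error)
    return results


if __name__ == "__main__":
    # Demo with temporary files.
    src_dir = tempfile.mkdtemp()
    dst_dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    sources = []
    for name in ("a.txt", "b.txt", "c.txt"):
        p = os.path.join(src_dir, name)
        with open(p, "w") as fh:
            fh.write(name)
        sources.append(p)
    copy_files(sources, dst_dirs)
```

`copy_files` itself blocks until all copies are done, so a GUI would still call it from a background thread or replace `on_done` with a callback that hands the result to the GUI thread.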


1 Comment

Christian Over a year ago

This doesn't address the core aspect of the question: copying a file asynchronously. I suppose you could have a future that resolves after the thread has completed, but using threads in an asynchronous system could undermine the benefits of an async engine resolving those futures. Consider the case where the async engine breaks the operation into parallel futures and wants to wait for all of them to resolve before continuing: your thread would run independently of, and unmanaged by, the engine, consuming runtime that the engine could use more efficiently.
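
One possible way to address this concern, sketched under the assumption that the application already runs an `asyncio` event loop: hand the blocking copies to the loop's default executor with `loop.run_in_executor`, so each copy is a future the loop manages and can await. The copying still happens in worker threads, because `shutil.copy` is a blocking call. The names `copy_to_all` and `copy_files_async` are hypothetical.

```python
import asyncio
import os
import shutil
import tempfile


def copy_to_all(path, dirs):
    """Blocking helper: copy one file into every directory."""
    for d in dirs:
        shutil.copy(path, d)
    return path


async def copy_files_async(files, dirs):
    """Run the blocking copies in the event loop's default thread pool."""
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(None, copy_to_all, f, dirs) for f in files]
    finished = []
    for fut in asyncio.as_completed(futures):
        path = await fut          # re-raises here if that copy failed
        print("Finished copying", path)
        finished.append(path)
    return finished


if __name__ == "__main__":
    # Demo with temporary files.
    src_dir = tempfile.mkdtemp()
    dst_dirs = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    sources = []
    for name in ("a.txt", "b.txt"):
        p = os.path.join(src_dir, name)
        with open(p, "w") as fh:
            fh.write(name)
        sources.append(p)
    asyncio.run(copy_files_async(sources, dst_dirs))
```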
