I have a list of files that each need to be pre-processed using just one command before being mosaicked together. This pre-processing command calls third-party software via a system call and writes a GeoTIFF. I wanted to use multi-threading so that the individual files can be pre-processed at the same time and then, once all of them are processed, the results can be mosaicked together.

I have never used multi-threading/parallel processing before, and after hours of searching on the internet, I still have no clue what the best, simplest way to go about this is.

Basically, something like this:

files_list = [...]  # list of .tif files that need to be mosaicked together, but first need to be individually pre-processed

for tif_file in files_list:
    ...  # kick the pre-processing step out to the system, but don't wait for it to finish before moving on to pre-process the next tif_file

# wait for all TIFFs in files_list to finish pre-processing
# then mosaic them together

How could I achieve this?

  • What is the output of the pre-processing? Commented Sep 22, 2016 at 15:45
  • Any reason this task should be parallelized? Doing these files one after another would definitely be much faster (except in a few special cases) due to the overhead Python has for multithreading. Commented Sep 22, 2016 at 15:49
  • @PeterWood the outputs of the pre-processing step are GeoTIFFs that I need to mosaic together Commented Sep 22, 2016 at 15:51
  • Are the GeoTIFFs files on disk, or in memory? Commented Sep 22, 2016 at 15:52
  • @TomaszPlaskota Well, the purpose was to make the code faster, hah. Can you explain in more detail? How do you know that is the case? Thx Commented Sep 22, 2016 at 15:52

2 Answers

See the multiprocessing documentation.

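from multiprocessing import Pool

# pre_processing_command, files_list and mosaic are your own function, list and step
def main():
    with Pool(processes=8) as pool:
        # map blocks until every file has been pre-processed
        pool.map(pre_processing_command, files_list)

    # all pre-processing has finished here, so it is safe to mosaic
    mosaic()

if __name__ == '__main__':
    main()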
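A slightly more complete sketch of the same Pool approach, assuming the pre-processing step is an external command that takes an input path and an output path. TOOL, the "_pre" output naming and mosaic() are hypothetical placeholders; the stand-in command just copies the file so the sketch runs anywhere. Pool.map returns the outputs in the same order as files_list, which is convenient for handing them to the mosaic step.

```python
import os
import subprocess
import sys
from multiprocessing import Pool

# Hypothetical stand-in for the third-party tool: a tiny Python child process
# that copies its input file to its output path. Replace with the real command.
TOOL = [sys.executable, "-c",
        "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])"]


def pre_processing_command(tif_file):
    """Run the external tool on one file and return the path it wrote."""
    root, ext = os.path.splitext(tif_file)
    out_file = root + "_pre" + ext  # hypothetical output naming scheme
    # check=True raises CalledProcessError if the tool exits with an error
    subprocess.run(TOOL + [tif_file, out_file], check=True)
    return out_file


def mosaic(processed_files):
    """Hypothetical placeholder for the real mosaicking step."""
    print("mosaicking", processed_files)


def main(files_list):
    with Pool(processes=8) as pool:
        # map blocks until every file is done and returns results in input order
        processed = pool.map(pre_processing_command, files_list)
    # every pre-processing step has finished here, so it is safe to mosaic
    mosaic(processed)
    return processed


if __name__ == "__main__":
    main(sys.argv[1:])
```

The if __name__ == "__main__" guard matters on platforms that start worker processes by re-importing the main module (for example Windows), otherwise each worker would try to create its own pool.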

If you need to use multiple processor cores, you should use multiprocessing. In the simplest case you can use something like:

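from multiprocessing import Process

def process_function(tif_file):
    ...  # your processing code here

if __name__ == '__main__':
    processes = []
    for tif_file in files_list:
        p = Process(target=process_function, args=(tif_file,))  # args must be a tuple
        p.start()
        processes.append(p)
    # join only after every process has started, otherwise they run one at a time
    for p in processes:
        p.join()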

You need to be careful, though: with many files, running that many processes at the same time can exhaust the PC's resources. You can look here and here for solutions to the problem.
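One way to cap the number of simultaneous processes, sketched with only subprocess.Popen (no threads or pools): start external processes until max_running are alive, then wait for the oldest one before launching the next. build_command and max_running are hypothetical names; the stand-in command just prints the file name and should be replaced with the real tool.

```python
import subprocess
import sys


def build_command(tif_file):
    # Hypothetical stand-in for the real pre-processing command line:
    # a tiny Python child process that just prints the file name.
    return [sys.executable, "-c", "import sys; print(sys.argv[1])", tif_file]


def run_bounded(files_list, max_running=4):
    """Start one external process per file, with at most max_running alive at once.

    Returns a dict mapping each file to its process's exit code.
    """
    running = []  # (tif_file, Popen) pairs started but not yet waited on
    exit_codes = {}
    for tif_file in files_list:
        if len(running) >= max_running:
            # Free a slot by waiting for the oldest process still running.
            done_file, proc = running.pop(0)
            exit_codes[done_file] = proc.wait()
        # Popen returns immediately; the command keeps running in the background.
        proc = subprocess.Popen(build_command(tif_file), stdout=subprocess.DEVNULL)
        running.append((tif_file, proc))
    # Wait for whatever is still running before moving on (e.g. to mosaicking).
    for done_file, proc in running:
        exit_codes[done_file] = proc.wait()
    return exit_codes


if __name__ == "__main__":
    print(run_bounded(["a.tif", "b.tif", "c.tif"], max_running=2))
```

Waiting on the oldest process rather than whichever finishes first keeps the code simple, but a slot can sit idle if a later process finishes early; a pool (multiprocessing.Pool or concurrent.futures) avoids that.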

You can also use threading.Thread, but in CPython threads are restricted by the Global Interpreter Lock, so only one thread executes Python code at a time and pure-Python work does not spread across processor cores.
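In this particular case the heavy lifting is done by an external program rather than by Python code, so the GIL matters less: a thread that is blocked waiting on a child process does not hold the GIL. A sketch of that threaded alternative with concurrent.futures.ThreadPoolExecutor; the stand-in command and max_workers=4 are assumptions, and capture_output requires Python 3.7 or later.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def pre_process(tif_file):
    """Run the external command for one file and return what it printed."""
    # Hypothetical stand-in for the third-party tool: prints the file name.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", tif_file]
    # The calling thread blocks here waiting on the child process; that wait
    # does not hold the GIL, so several commands can run side by side.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def pre_process_all(files_list, max_workers=4):
    """Pre-process every file using at most max_workers concurrent commands."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map yields results in input order; it re-raises any worker exception
        return list(executor.map(pre_process, files_list))


if __name__ == "__main__":
    print(pre_process_all(["a.tif", "b.tif", "c.tif"]))
```

Once pre_process_all returns, every command has finished, so the mosaic step can run on the results.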
