I am working on a Python program that reads a lot of images in batches (let's say 500 images per batch) and stores each batch in a numpy array.
Right now it is single-threaded. I/O is very fast; the part that takes most of the time is creating the numpy array and processing it.
Using the multiprocessing module, I am able to read the images and create the array in another process, but I am having trouble letting the main process access that data.
I have tried:
1: Using multiprocessing.queues: Very slow. I believe most of the time is spent on pickling and unpickling, which takes quite a while for a large numpy array.
2: Using Manager.list(): Faster than queues, but accessing it from the main process is still very slow. Even just iterating over the list and doing nothing takes 2 seconds per item. I don't understand why it takes so much time.
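
The pickling cost suspected in attempt 1 can be measured in isolation, without starting any worker process. This is only a minimal sketch: the dummy batch of 500 uint8 images of 64x64x3 and the helper names make_batch and pickle_round_trip are illustrative assumptions, not taken from the real program. It does roughly what a queue transfer has to do with each item (serialize on one side, deserialize on the other) and times it.

```python
import pickle
import time

import numpy as np


def make_batch(n_images=500, height=64, width=64, channels=3):
    """Build a dummy batch standing in for decoded images (sizes are assumptions)."""
    return np.zeros((n_images, height, width, channels), dtype=np.uint8)


def pickle_round_trip(batch):
    """Serialize and deserialize the batch, roughly as a queue transfer would, and time it."""
    start = time.perf_counter()
    payload = pickle.dumps(batch)    # what the sending side has to do
    restored = pickle.loads(payload)  # what the receiving side has to do
    elapsed = time.perf_counter() - start
    return restored, len(payload), elapsed


if __name__ == "__main__":
    batch = make_batch()
    restored, n_bytes, elapsed = pickle_round_trip(batch)
    print(f"pickled {n_bytes / 1e6:.1f} MB, round trip took {elapsed:.3f} s")
```

Running this with the real batch shape gives a lower bound on the per-batch transfer cost through a queue, since the actual transfer also has to push the pickled bytes through a pipe between the processes.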
Any suggestions? Thanks.