Is there a way to pipeline numpy arrays from disk that are saved this way:

    np.save('data.npy', np.zeros(shape=[500, 300, 3]))  # RGB image

and then read them row by row (or column by column), the way a generator works, but without the loading latency?
Detailed description
My application needs near-zero latency, but loading larger arrays from disk can take some time (~0.02-0.1 s). Even this small latency produces unpleasant results.
I have a solution that satisfies the speed requirement:
    dictionary = {'array1': array1, ....}
With this I can access the arrays immediately, but since I am using a Raspberry Pi Zero, my Python program is limited in CPU and RAM, so with many arrays I would soon be dealing with

    MemoryError
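A rough back-of-the-envelope check shows why (assuming the default float64 dtype that np.zeros produces, as in the example above): each array is already a few megabytes, and the Pi Zero only has 512 MB of RAM, so a couple of hundred cached arrays are enough to exhaust it:

    >>> 500 * 300 * 3 * 8                     # elements * 8 bytes per float64
    3600000
    >>> 512 * 1024**2 // (500 * 300 * 3 * 8)  # arrays that fit in 512 MB
    149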
My application reads the array row by row at a frequency of 50 Hz, like this:
    for row in array:
        [operation with row]
        time.sleep(0.02)  # in reality the whole cycle takes 0.02 s (including the operation time)
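To hold a steady 50 Hz regardless of how long the per-row operation takes, the sleep can target a deadline instead of a fixed 0.02 s. A minimal sketch of that idea, where process_row is a placeholder for the real operation:

    import time

    PERIOD = 0.02                     # 50 Hz
    deadline = time.time() + PERIOD
    for row in array:
        process_row(row)              # placeholder for the real work
        # sleep only for whatever is left of this cycle
        time.sleep(max(0.0, deadline - time.time()))
        deadline += PERIOD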
I am looking for some kind of generator:
    def generate_rows(path):
        array = np.load(path)
        for row in array:
            yield row
This solves the memory problem, but I guess I will lose the near-zero latency (loading the array takes time).
Therefore my question is: is there a way to generate rows as with a generator, but with the first rows ready, so to speak, 'immediately', with near-zero latency?
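For reference, a memmap-backed version of this generator only has to read the .npy header before the first row is available, since np.load with mmap_mode='r' maps the file instead of reading it all in. A minimal sketch of that idea:

    def generate_rows_memmap(path):
        array = np.load(path, mmap_mode='r')  # maps the file; data is paged in lazily
        for row in array:
            yield row                         # each row is read from disk on first access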
EDIT: Based on the comments of @Lukas Koestler and @hpaulj I tried memmap, but the result is surprisingly not good: memmap runs out of memory sooner than simply loading the full arrays.
WINDOWS 10
I saved 1000 numpy arrays (shape = [500, 30, 3]) to disk and tried to cache them, with plain np.load and with np.load in memmap mode:
    import numpy as np
    import os

    mats = os.listdir('matrixes')
    cache = []
    for i in range(10):
        for n in mats:
            cache.append(np.load('matrixes\\{}'.format(n), mmap_mode='r'))  # load with memmap
            #cache.append(np.load('matrixes\\{}'.format(n)))                # load without memmap
        print('{} objects stored in cache'.format((i+1)*1000))
After running both variants (with memmap and without it), these two errors occurred.

Memmap, after storing 4000 memmap objects:
    ...
    File "C:\Python27\lib\site-packages\numpy\core\memmap.py", line 264, in __new__
      mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
    WindowsError: [Error 8] Not enough memory resources are available to process this command
Simple np.load without memmap, after caching 5000 arrays:
    ....
    File "C:\Python27\lib\site-packages\numpy\lib\format.py", line 661, in read_array
      array = numpy.fromfile(fp, dtype=dtype, count=count)
    MemoryError
Raspberry Pi Zero
As was pointed out by @Alex Yu, the tests above were on Windows 10. Switching to the Raspberry Pi Zero,
I got above 1000 numpy arrays (which took quite long) and then got:
    1000 objects stored in cache
    Killed
With memmaps, I got above 1000 memmaps quite quickly, but then got a different error:
File "/usr/lib/python2.7/dist-packages/numpy/lib/npyio.py", line 416, in load
return format.open_memmap(file, mode=mmap_mode)
File "/usr/lib/python2.7/dist-packages/numpy/lib/format.py", line 792, in open_memmap
mode=mode, offset=offset)
File "/usr/lib/python2.7/dist-packages/numpy/core/memmap.py", line 264, in __new__
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
mmap.error: [Errno 24] Too many open files
If I am not wrong, this error happens when a lot of files are opened without being closed: each open memmap keeps a file descriptor.
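On Linux the per-process limit on open file descriptors can be inspected and, up to the hard limit, raised with the resource module. A sketch of that workaround (it only postpones the problem rather than fixing the design):

    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print('open file limit: soft={}, hard={}'.format(soft, hard))
    # raise the soft limit to the hard limit (often 1024 -> 4096)
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))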