I'm trying to multiprocess some existing code and I'm finding that the pickling/unpickling of data to the processes is too slow with a Pool. I think for my situation a Manager will suffer the same issues since it does the same pickling behind the scenes.
To solve the issue I'm trying to move to a shared memory array. For this to work, I need an array of strings. It seems that multiprocessing.Array supports a ctypes.c_char_p but I'm having difficulty extending this into an array of strings. Below is a few of the many things I've tried.
#!/usr/bin/python
import ctypes
import multiprocessing as mp
import multiprocessing.sharedctypes as mpsc
import numpy
# Tested possible solutions
ver = 1
if 1==ver:
strings = mpsc.RawArray(ctypes.c_char_p, (' '*10, ' '*10, ' '*10, ' '*10))
elif 2==ver:
tmp_strings = [mpsc.RawValue(ctypes.c_char_p, ' '*10) for i in xrange(4)]
strings = mpsc.RawArray(ctypes.c_char_p, tmp_strings)
elif 3==ver:
strings = []
for i in xrange(4):
strings.append( mpsc.RawValue(ctypes.c_char_p, 10) )
def worker(args):
snum, lenarg = args
string = '%s' % snum
string *= lenarg
strings[snum] = string
return string
# Main progam
data = [(i, numpy.random.randint(1,10)) for i in xrange(3)]
print 'Testing version ', ver
print
print 'Single process'
for x in map(worker, data):
print '%10s : %s' % (x, list(strings))
print
print 'Multi-process'
pool = mp.Pool(3)
for x in pool.map(worker, data):
print '%10s : %s' % (x, list(strings))
print ' ', [isinstance(s, str) for s in strings]
Note that I'm using the multiprocessing.sharedctypes because I don't need locking and it should be fairly interchangeable with multiprocessing.Array
The issue with the above code is that the resultant strings object contains regular strings, not shared memory strings coming out of the mpsc.RawArray constructor. With version 1 and 2 you can see how the data gets scrambled when working out of process (as expected). For me, version 3 looked like it worked initially but you can see the = is just setting the object to a regular string and while this works for the short test, in the larger program it creates issues.
It seems like there should be a way to create a shared array of pointers where the pointers point strings in shared memory space. The c_void_p type complains if you try to initialize it with a c_str_p type and I haven't had any luck manipulating the underlying address pointers directly yet.
Any help would be appreciated.