I have a nested loop below which, given a long string S of length n and a query Q of length m, checks (naively) whether Q is a substring of S with max_error number of mismatches. Since each iteration of the outer loop is independent, it makes sense to parallelize it for speedup (right?).
For example, if S=ATGGTC, and Q=TCCT, and max_err=2, then Q is a match since substring TGGT in S has <=2 mismatches with Q. But, if max_err=1, then Q is not a match in S because there is no substring of length 4 in S that matches with P with <=1 mismatches.
def is_match(S,n,Q,m,max_err): # lengths; assume n>m
for i in range(0,n-m):
cnt = 0
for k in range(m):
if S[i+k]!=Q[k]: cnt += 1
if cnt>max_err: break
if cnt<=max_err: return i
return -1
To parallelize it, I can make the inner loop a function like below. (I moved the return -1 inside this function)
def inner_loop(S,Q,m,i):
cnt = 0
for k in range(m):
if S[i+k]!=Q[k]: cnt += 1
if cnt>max_err: break
if cnt<=max_err: return i
return -1
Now, I need to run the outer loop in parallel, collect the results, and check if there is a non--1 in the results. I am stuck here. Is the following code on the right way? How to collect the results?
def is_match_parallel(S,n,Q,m,max_err):
inner_loop_1_par = inner_loop(S=S, Q=Q, m=m, i)
result = []
pool = multiprocessing.Pool()
## the following line is not probably right
result = pool.map(inner_loop_1_par, range(n-m))
for i in range(n-m):
if result[i]!=-1: return i
return -1
Also, as soon as any of the processes above returns a non--1 value, we are done, and we don't need to wait for the rest of them. Is there anyway to do this?