I have the following function:
public void scanText(char[] T){
int q=0;
for(int i=0;i<T.length;i++){
q = transFunc[preCompRow[q]+T[i]];
if(q==pattern.length){
System.out.println("match found at position: "+(i-pattern.length+2));
}
}
}
This function scans a char Array searching for matches of a given pattern, which is stored as a finite automata. The transition function of the automata is stored in the variable called transFunc.
I am testing this function in a text with 8 millions of characters and using 800000 patterns. The thing is the accession of the array preCompRow[q] (which is an int[]) is very slow. The performance is greatly improved if I delete the preCompRow[q] of the code. I think this might be because in every loop the q variable has a different non-sequential value (2, 56, 18, 9 ..).
Is there any better way to access to an array in a non-sequential manner?
Thanks in advance!
transFunc[preCompRow[q]+T[i]]fortransFunc[T[i]]and it took only 9 seconds. @leoxs I didn't see any swap activity during the execution of both tests. I am using 2 GB for the heaptransFunc[T[i]],transFunc[1+T[I]],transFunc[T[I]]and all had reasonable performance, but every time I add preCompRow[q] inside the index the whole thing goes wrong. When I fixqto some value, things goes really fast, for exampletransFunc[preCompRow[1]+T[I]]. My conclusion was the problem was the value ofqwhich in every pass of the loop can take a value between 1 - # characters of the pattern