3

I have the following function:

public void scanText(char[] T){
    int q=0;
    for(int i=0;i<T.length;i++){
        q = transFunc[preCompRow[q]+T[i]];
        if(q==pattern.length){
            System.out.println("match found at position: "+(i-pattern.length+2));
        }
    }
}

This function scans a char Array searching for matches of a given pattern, which is stored as a finite automata. The transition function of the automata is stored in the variable called transFunc.

I am testing this function in a text with 8 millions of characters and using 800000 patterns. The thing is the accession of the array preCompRow[q] (which is an int[]) is very slow. The performance is greatly improved if I delete the preCompRow[q] of the code. I think this might be because in every loop the q variable has a different non-sequential value (2, 56, 18, 9 ..).

Is there any better way to access to an array in a non-sequential manner?

Thanks in advance!

9
  • 2
    Define "very slow". Commented Jun 11, 2017 at 1:11
  • I don't think that the java access to an array is the slow thing. ¿How is your ram? maybe you have all in ram and the OS is swapping a lot, so it goes to the disk to get your array position. Try it with a few characters, to see the performance without using a lot of ram. Commented Jun 11, 2017 at 1:37
  • 1
    @Andreas I searched 4000 different patterns over a char array with 8324701 characters and it took 143 seconds. Then I run the same test but I change transFunc[preCompRow[q]+T[i]] for transFunc[T[i]] and it took only 9 seconds. @leoxs I didn't see any swap activity during the execution of both tests. I am using 2 GB for the heap Commented Jun 11, 2017 at 2:00
  • I you really need performance, you could code that in C and use it as a module in your java project. If that can run in parallel, thinks about using threads, or better, doing that on GPU. Commented Jun 11, 2017 at 2:06
  • 1
    I also thought the bottleneck was in +, but I tested several cases transFunc[T[i]], transFunc[1+T[I]], transFunc[T[I]] and all had reasonable performance, but every time I add preCompRow[q] inside the index the whole thing goes wrong. When I fix q to some value, things goes really fast, for example transFunc[preCompRow[1]+T[I]]. My conclusion was the problem was the value of q which in every pass of the loop can take a value between 1 - # characters of the pattern Commented Jun 11, 2017 at 2:26

1 Answer 1

1

One possible explanation is that your code is seeing poor memory performance due to poor locality in its memory access patterns.

The role of the memory caches in a modern computer is to deal with the speed mismatch between processor instruction times (less than 1 ns) and main memory (5 to 10 ns or more). They work best when your code gets a cache hit most time it fetches from memory.

A modern Intel chipset caches memory in blocks of 64 bytes, and loads from main memory in burst mode. (That corresponds to 16 int values.) The L1 cache on (say) an I7 processor is 2MB.

If your application is able to access the data in a large array (roughly) sequentially, then 7 out of 8 accesses will be a cache hits. If the access pattern is non-sequential and the "working set" of is a large multiple of the cache size, then you may end up with a cache miss on each memory access.

If memory access locality is the root of yoiur problems, then your option are limited:

  • redesign your algorithm so that locality of memory references is better
  • buy hardware with larger caches
  • (maybe) redesign your algorithm to use GPUs or some other strategy to reduce the memory traffic

Recoding your existing in C or C++ may give a performance improvement, but the same memory locality problems will bite you there as well.

I am not aware of any tools that can be used to measure cache performance in Java applications.

Sign up to request clarification or add additional context in comments.

4 Comments

I think you're on the right track. Since T values are char, they are likely limited to ASCII, i.e. within a numeric range of 32 and 126. transFunc[T[i]] will then only access those indexes of transFunc, and that entire range (760 bytes in total) will quickly be in the L1 cache. --- For transFunc[preCompRow[q]+T[i]], we don't know range of preCompRow values, i.e. size of transFunc, but if that is large, it will take a lot longer to load it into L1 cache, and may even exceed size of L1 cache, greatly increasing cache miss ratio, even for the same index values.
Yes, I think the problem is related to cache. preComRow is just an int array of the size of the pattern. Rows are precomputed because I am using a 2D array for the transition function but expressed as a 1D array (I thought I was having problem with JAVA 2D arrays emulation), then an array of the form arr[i][j] is expressed as arr[(I*steps) + j ], I precomputed this to avoid the computation in the searching loop. The thing is I know the range of preCompRow.
I forgot to mention, the patterns I am searching for are very short, only 50 characters long
It would help us if you created a MCVE for your problem so that we can observe and then diagnose the "slow accession" behavior. Without a proper MCVE, I don't think it will be possible to do anything more than guess.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.