1

I am playing around with the KMP algorithm in f sharp. While it works for patterns like "ATAT" (result will be [|0; 0; 1; 2;|]) , the first while loop enters a deadlock when the first 2 characters of a string are the same and the 3rd is another, for example "AAT".

I understand why: first, i gets incremented to 1. now the first condition for the while loop is true, while the second is also true, because "A" <> "T". Now it sets i to prefixtable.[!i], which is 1 again, and here we go.

Can you guys give me a hint on how to solve this?

let kMPrefix (pattern : string) = 

    let (m : int) = pattern.Length - 1
    let prefixTable = Array.create pattern.Length 0
    // i : longest proper prefix that is also a suffix 
    let i = ref 0

    // j: the index of the pattern for which the prefix value will be calculated
    // starts with 1 because the first prefix value is always 0

    for j in 1 .. m do

        while !i > 0 && pattern.[!i] <> pattern.[j] do

            i := prefixTable.[!i]

        if pattern.[!i] = pattern.[j] then

            i := !i+1           


        Array.set prefixTable j !i  

    prefixTable
1
  • 2
    I haven't used the Knuth-Morris-Pratt algorithm before, but en.wikipedia.org/wiki/… suggests that the value in the first index should be -1 rather than 0, because a mismatch at the very start of the string is a special case. Would that solve your problem? Commented Aug 19, 2016 at 9:46

1 Answer 1

3

I'm not sure how to repair the code with a small modification, since it doesn't match the KMP algorithm's lookup table contents (at least the ones I've found on Wikipedia), which are:

  • -1 for index 0
  • Otherwise, the count of consecutive elements before the current position that match the beginning (excluding the beginning itself)

Therefore, I'd expect output for "ATAT" to be [|-1; 0; 0; 1|], not [|0; 0; 1; 2;|].

This type of problem might be better to reason about in functional style. To create the KMP table, you could use a recursive function that fills the table one by one, keeping track of how many recent characters match the beginning, and start running it at the second character's index.

A possible implementation:

let buildKmpPrefixTable (pattern : string) =
    let prefixTable = Array.zeroCreate pattern.Length

    let rec run startIndex matchCount =
        let writeIndex = startIndex + matchCount
        if writeIndex < pattern.Length then
            if pattern.[writeIndex] = pattern.[matchCount] then
                prefixTable.[writeIndex] <- matchCount
                run startIndex (matchCount + 1)
            else
                prefixTable.[writeIndex] <- matchCount
                run (writeIndex + 1) 0
    run 1 0
    if pattern.Length > 0 then prefixTable.[0] <- -1
    prefixTable

This approach isn't in danger of any endless loops/recursion, because all code paths of run either increase writeIndex in the next iteration or finish iterating.

Note on terminology: the error you are describing in the question is an endless loop or, more generally, non-terminating iteration. Deadlock refers specifically to a situation in which a thread waits for a lock that will never be released because the thread holding it is itself waiting for a lock that will never be released for the same reason.

Sign up to request clarification or add additional context in comments.

1 Comment

Your code works like a charm, thank you! I used this video link, where they used an array that starts with the index 1 instead of 0, and i tried to change that in my code, which probably led to some mistakes. Also, TIL what a deadlock is. Thank you for your post.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.