1

I am doing a medical study and the following algorithm, so simple to humans, I found hard to implement.

The situation:

We have an array of increasing natural numbers. A reader tries to read out loud those numbers in order. They might say a wrong number or skip a number. The last number will always be correct. The reader's response will be another array. My goal is to figure out how many mistakes the reader makes given the two arrays.

For example: given [1,2,3,4,5,6]

the reader responded [1,2,3,4,4,6] One mistake (reading 4 on the 5 slot).
the reader responded [1,2,3,5,6]. One mistake (one skip)
the reader responded [1,2,3,3,6]. Two mistakes (1 skip and 1 wrong)

I have tried to use adjacency match but it fails easily when the reader says the wrong number and skips a number of times consecutively.

How would you implement such an algorithm? As suggested in the comments, Levenshtein distance with some alteration seems to solve this problem. Damerau_Levenshtein is not necessary because the reader is unlikely to swap answers.

6
  • how can we help with your existing code? maybe you can share it? Commented Jun 12, 2017 at 20:30
  • 4
    Probably a case for a string distance metric like the Levenshtein distance. Commented Jun 12, 2017 at 20:40
  • Possible duplicate of Damerau-Levenshtein php Commented Jun 12, 2017 at 20:43
  • can the reader read more elements than are available? For example, [1, 2, 3, 4, 5, 6, 6] ? Commented Jun 12, 2017 at 21:42
  • @EyuelDK No they can't. The program takes a response array only as large as the correct array. Commented Jun 13, 2017 at 14:32

1 Answer 1

1

As pointed by others , this problem can be solved in ways similar to standard 'Edit Distance' problem in DP method. Wiki reference of the problem will give more details.

The changes needed to suite your need will be to allow only 2 operations out of 3 operations (Insert, Delete, Substitute).

Operations mapped for your problem:

  • Delete -> Skip
  • Substitute -> Mistake

A python implementation below should be a starting point.

def get_mistakes(s1, s2):
    m = [[0 for j in range(len(s2)+1)] for i in range(len(s1)+1)]
    for i in range(len(s1)+1):
        for j in range(len(s2)+1):
            if i == 0:
                m[i][j] = sum(bytearray(s2[:j]))
            elif j == 0:
                m[i][j] = sum(bytearray(s1[:i]))
            elif s1[i-1] == s2[j-1]:
                m[i][j] = m[i-1][j-1]
            else:
                skip = 1
                substitute = 1
                m[i][j] = min(m[i-1][j-1] + substitute, m[i][j-1] + skip)
    return m[len(s1)][len(s2)]

Hope it Helps!

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.