4

I have two strings which must be compared for similarity. The algorithm must be designed to find the maximal similarity. In this instance, the ordering matters, but intervening (or missing) characters do not. Edit distance cannot be used in this case for various reasons.

The situation is basically as follows:

string 1: ABCDEFG
string 2: AFENBCDGRDLFG

the resulting algorithm would find the substrings A, BCD, FG

I currently have a recursive solution, but because this must be run on massive amounts of data, any improvements would be greatly appreciated

9
  • 2
    Can you post a description of your current, recursive solution? Commented Sep 20, 2010 at 3:19
  • Do you have a language for this? Commented Sep 20, 2010 at 3:21
  • Is it just me, or is this NP-hard? Commented Sep 20, 2010 at 3:21
  • @Peter: The reason I removed the substring tag was that substrings are always consecutive. What you want is a subsequence which need not be consecutive. Commented Sep 20, 2010 at 3:40
  • 1
    Isn't "E" also a common substring here? Commented Sep 20, 2010 at 10:25

1 Answer 1

5

Looking at your sole example it looks like you want to find longest common subsequence. Take a look at LCS

Is it just me, or is this NP-hard? – David Titarenco (from comment)

If you want LCS of arbitrary number of strings its NP-hard. But it the number of input strings is constant ( as in this case, 2) this can be done in polynomial time.

Sign up to request clarification or add additional context in comments.

3 Comments

This isn't right. given OP's example input, it would return 'ABCDFG'
@Aaron: You'll have to modify the algorithm a bit to find the actual substring that contributed to the answer. But the basic idea remains very much the same.
Thanks, I'll look into modifying an LCS algo

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.