9

Is there a fast way in JavaScript to find out whether two strings contain the same substring? For example, I have these two strings: "audi is a car" and "audiA8".

As you can see, the word "audi" appears in both strings, but we cannot detect that with a simple indexOf or RegExp because of the other characters in both strings.

6
  • if (string1 === string2) { /*identical*/ } - What are you really trying to ask, how to test whether a particular substring is in two different strings, or whether there exists some substring that appears in two different strings, or...? Could you please show an example input and the desired output? Commented Oct 22, 2012 at 7:14
  • why do you need indexOf or RegExp? just compare the 2 with '==='. Commented Oct 22, 2012 at 7:16
  • If the two strings are abc and cde, should they be considered "identical" because of c? Commented Oct 22, 2012 at 7:16
  • if you know what the substring is, you can perform indexOf on both the strings to check if the substring exists. Commented Oct 22, 2012 at 7:18
  • @nicmon, I've edited the question to remove a mention of "identical", as that's misleading. If I changed the meaning too much, please edit it to clarify. Commented Oct 22, 2012 at 7:19

5 Answers

10

The standard tool for doing this sort of thing in Bioinformatics is the BLAST program. It is used to compare two fragments of molecules (like DNA or proteins) to find where they align with each other - basically where the two strings (sometimes multi GB in size) share common substrings.

The basic algorithm is simple: systematically break one of the strings into pieces and compare each piece with the other string. A simple implementation would be something like:

// Note: not fully tested, there may be bugs:

function subCompare (needle, haystack, min_substring_length) {

    // Min substring length is optional, if not given or is 0 default to 1:
    min_substring_length = min_substring_length || 1;

    // Search possible substrings from largest to smallest:
    for (var i=needle.length; i>=min_substring_length; i--) {
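        // Declare j with var so it does not leak into the global scope:
        for (var j=0; j <= (needle.length - i); j++) {
            // slice(start, end) replaces the legacy substr(start, length):
            var substring = needle.slice(j, j + i);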
            var k = haystack.indexOf(substring);
            if (k != -1) {
                return {
                    found : 1,
                    substring : substring,
                    needleIndex : j,
                    haystackIndex : k
                }
            }
        }
    }
    return {
        found : 0
    }
}

You can modify this algorithm to do fancier searches, such as ignoring case, fuzzy matching the substring, or looking for multiple substrings. This is just the basic idea.
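For long inputs, the brute-force search above repeats a lot of work. A common alternative is the dynamic-programming approach to the longest common substring problem, which runs in time proportional to the product of the two lengths. The following is only a minimal sketch of that technique, not the answer author's method; the function name `longestCommonSubstring` and the shape of the returned object are assumptions made for illustration.

```javascript
// Sketch: longest common substring via dynamic programming.
// The function name and result shape are illustrative, not from the answer.
function longestCommonSubstring(a, b) {
    // prev[j] holds the length of the longest common suffix of
    // a.slice(0, i - 1) and b.slice(0, j) from the previous row.
    var prev = new Array(b.length + 1).fill(0);
    var best = { length: 0, indexA: -1, indexB: -1 };

    for (var i = 1; i <= a.length; i++) {
        var curr = new Array(b.length + 1).fill(0);
        for (var j = 1; j <= b.length; j++) {
            if (a[i - 1] === b[j - 1]) {
                // Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1;
                if (curr[j] > best.length) {
                    best = {
                        length: curr[j],
                        indexA: i - curr[j],
                        indexB: j - curr[j]
                    };
                }
            }
        }
        prev = curr;
    }

    return {
        length: best.length,
        substring: best.length > 0
            ? a.slice(best.indexA, best.indexA + best.length)
            : "",
        indexA: best.indexA,
        indexB: best.indexB
    };
}

console.log(longestCommonSubstring("audi is a car", "audiA8").substring); // "audi"
```

To ignore case, one option is to lowercase both inputs before calling the function, keeping in mind that lowercasing can change string length for a few Unicode characters, so the returned indexes then refer to the lowercased strings.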


3 Comments

Very nice function. If you don't mind, I'll steal it. :)
Is this a correct implementation of the BLAST algorithm? BLAST normally uses a heuristic search, but this appears to be a simple brute-force search.
@AndersonGreen No. I said this is a very simplistic implementation of the core idea behind BLAST. If you need BLAST then you should use BLAST (eg. npmjs.com/package/blastjs or npmjs.com/package/blastutils)
2

Take a look at the similar text function implementation here. It returns the number of matching chars in both strings.

For your example it would be:

similar_text("audi is a car", "audiA8") // -> 4

which means that the two strings have a common substring of 4 characters.
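The linked implementation is not reproduced here, so the following is only a rough sketch of how a similar_text-style count is commonly computed: find the longest common substring, then recurse on the pieces to its left and to its right and add up the lengths. The name `similarText` is an assumption, and the linked function may differ in details.

```javascript
// Sketch of a similar_text-style count; not the linked implementation.
function similarText(s1, s2) {
    if (s1.length === 0 || s2.length === 0) {
        return 0;
    }

    // Find the first longest common substring by brute force.
    var max = 0, pos1 = 0, pos2 = 0;
    for (var i = 0; i < s1.length; i++) {
        for (var j = 0; j < s2.length; j++) {
            var k = 0;
            while (i + k < s1.length && j + k < s2.length &&
                   s1[i + k] === s2[j + k]) {
                k++;
            }
            if (k > max) {
                max = k;
                pos1 = i;
                pos2 = j;
            }
        }
    }

    if (max === 0) {
        return 0;
    }

    // Count the match, then recurse on what lies before and after it.
    return max +
        similarText(s1.slice(0, pos1), s2.slice(0, pos2)) +
        similarText(s1.slice(pos1 + max), s2.slice(pos2 + max));
}

console.log(similarText("audi is a car", "audiA8")); // 4
```

Because the count sums several matching runs, a result of 4 does not by itself guarantee one contiguous 4-character substring in general; in this example it does, since "audi" is the only match.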

2

Don't know about any simpler method, but this should work:

if(a.indexOf(substring) != -1 && b.indexOf(substring) != -1) { ... }

where a and b are your strings.

1 Comment

But if you do not know the substring in advance, how would you know what to search for?
0

You can use the powerful algorithm of this library: https://github.com/kpdecker/jsdiff/blob/master/src/diff/base.js

like this

const wordDiff = new Diff();
wordDiff.diff('audi is a car', 'audiA8', {});

and receive the result

[
{
    "count": 4,
    "added": false,
    "removed": false,
    "value": "audi"
},
{
    "count": 9,
    "added": false,
    "removed": true,
    "value": " is a car"
},
{
    "count": 2,
    "added": true,
    "removed": false,
    "value": "A8"
}
]

Entries where both "added" and "removed" are false are the common substrings.
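To pull out just the shared pieces, you can filter the result array. The sketch below works on a hard-coded copy of the output shown above, so it runs without the library; the helper name `commonParts` is an assumption for illustration. It tests with `!part.added` rather than `=== false` in case a library version leaves those fields undefined on unchanged parts.

```javascript
// Hard-coded copy of the diff result shown above, so no library is needed.
var parts = [
    { count: 4, added: false, removed: false, value: "audi" },
    { count: 9, added: false, removed: true, value: " is a car" },
    { count: 2, added: true, removed: false, value: "A8" }
];

// Hypothetical helper: keep only the parts present in both strings.
function commonParts(diffResult) {
    return diffResult
        .filter(function (part) {
            return !part.added && !part.removed;
        })
        .map(function (part) {
            return part.value;
        });
}

console.log(commonParts(parts)); // [ 'audi' ]
```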

You can do much more with this amazing library.


-2
var a = "audi is a car";
var b = "audiA8";

var chunks = a.split(" ");
var commonsFound = 0;

for (var i = 0; i < chunks.length; i++) {
    if(b.indexOf(chunks[i]) != -1) commonsFound++;
}

alert(commonsFound + " common substrings found.");

2 Comments

the substrings may have spaces
this doesn't take into account duplicate chars in strings, e.g. 3 'a' in the first string and 2 'a' in the other
