2

I have written a simple string search algorithm and I'd want to improve, where should I start?

The reason why I want to change the simple algorithm is:

  1. pool seems to be a big array
  2. string in toSearch seems to be long string

Here is the code:

var pool []string = []string{"st", "se", "go", "es", "per", "123", "abcd", "e", "2"}

func search(aStr string) (index int) {
    for i, s := range pool {
        if strings.Contains(aStr, s) {
            return i
        }
    }

    return -1
}

func main() {
    toSearch := []string{"string", "search", "algorithm", "best", "performance"}
    for _, s := range toSearch {
        idx := search(s)
        fmt.Printf("search %s in %d(%s)\n", s, idx, pool[idx])
    }
}
8
  • You search for the first match? Is i the index of the string in the pool? Commented Mar 10, 2016 at 7:45
  • @userunknown yes, maybe I should change the question with you advice Commented Mar 10, 2016 at 8:15
  • You should add a tag for the programming lanugage Commented Mar 10, 2016 at 8:41
  • @cricket_007 I think this is just a algorithm problem, if you can give me answer in python ruby perl and so on, it is also ok. Commented Mar 10, 2016 at 8:50
  • Could you add more description to your question of what the requirements for the algorithm are, then? You want to find the strings from toSearch in pool? Commented Mar 10, 2016 at 8:52

1 Answer 1

2

You can join all the strings in the toSearch array using a special character let's say $, then your search string would become "string$search$algorithm$best$performance", also you may need to assign an array, that would keep a record of which string you are currently at if you find a match.

I am afraid for the pool array you will have to search the above created string one by one.

One of the ways to reduce the complexity would be to use linear time pattern matching algorithms for each element of the pool array instead of the quadratic time pattern matching algorithm that you have used.

I have posted 3 linear time algorithms for searching a pattern in a given string, 2 of them are deterministic and the last is non-deterministic.

A deterministic and one of the more standard solution would be to use the Knuth Morris Pratt algorithm although it is a little complex to understand you can easily find code to implement it online. It is linear time, where m is the length of input string and n is the length of pattern.

Another deterministic algorithm and one of my favourites is the Z algorithm it is easier to understand and implement and is also linear time, it constructs what is called a Z array i.e used to easily calculate the pattern matches in the string. you can check out this link on Z Algorithm

If you want to use a non-deterministic algorithm, you can use Rabin Karp algorithm ,it requires hashing (specifically concepts of Rolling Hash) and it is the easiest to implement and is linear time. It can also can be made deterministic if you check that the string you get using the hash is correct or not due to collisions, but in worst case if you make the rabin karp algo deterministic it gives a quadratic complexity.

I have written a c++ code using the Z algorithm below :

#include<iostream>
#include<string>

using namespace std;

string s, temp;
int n, Z[1000];
int toSearchSize, poolSize, lookUpIndex[1000];
string toSearch[5] = {"string", "search", "algorithm", "best", "performance"};
string pool[9] = {"st", "se", "go", "es", "per", "123", "abcd", "e", "2"};

void joinString(){
    int idx = 0;
    for(int i = 0;i < toSearchSize;i++){
        s += toSearch[i];
        for(int j = idx;j <= idx+toSearch[i].size();j++) lookUpIndex[j] = i;
        s += "$";
        idx += toSearch[i].size()+1;
    }
    temp = s;
}

void zval(){
    int L = 0, R = 0;
    for(int i = 1;i<n;i++){
        if(i > R){
            L = R = i;
            while(R < n && s[R-L] == s[R]) R++;
            Z[i] = R-L;R--;
        }else{
            int b = R-i+1;
            if(Z[i-L] < b) Z[i] = Z[i-L];
            //else if(Z[i-L] > b) Z[i] = b;
            else{
                L = i;
                while(R < n && s[R-L] == s[R]) R++;
                Z[i] = R-L;R--;
            }
        }
    }
}

int main(){
    toSearchSize = 5, poolSize = 9;
    joinString();

    for(int i = 0;i < poolSize;i++){
        for(int j = 0;j < n;j++) Z[j] = 0;
        s = pool[i] + temp;
        n = s.size();
        zval();

        for(int j = pool[i].size();j < n;j++){
            if(Z[j] >= pool[i].size()){
                cout << "Match Found for : " << pool[i] << " in string : " << toSearch[lookUpIndex[j]] << endl;
            }
        }
    }

    return 0;
}

Output for above code :

Match Found for : st in string : string
Match Found for : st in string : best
Match Found for : se in string : search
Match Found for : go in string : algorithm
Match Found for : es in string : best
Match Found for : per in string : performance
Match Found for : e in string : search
Match Found for : e in string : best
Match Found for : e in string : performance
Match Found for : e in string : performance

Link to solution on Ideone : http://ideone.com/UGJR3i

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.