7

This question is merely about algorithm. In pseudo code is like this:

A = Array of strings; //let's say count(A)  = N
S = String to find;   //let's say length(S) = M

for (Index=0; Index<count(A); Index++)
  if (A[Index]==S) {
    print "First occurrence at index\x20"+Index;
    break;
  }

This for loop requires string comparison N times (or byte comparison N*M times, O(N*M)). This is bad when array A has lots of items, or when string S is too long.

Any better method to find out the first occurrence? Some algorithm at O(K*logK) is OK, but preferable at O(K) or best at O(logK), where K is either N or M.

I don't mind adding in some other structures or doing some data processing before the comparison loop.

9
  • 1
    "When string S is too long" is irrelevant, unless there are lots of strings in A with the same length and an identical long prefix. (String equality checks can terminate immediately if the lengths are different, or a soon as a mismatch is found in walking over them.) Commented Apr 28, 2012 at 18:43
  • 4
    Why do you use \x20 instead of a space ? I'm curious :-) Commented Apr 28, 2012 at 18:46
  • 1
    @LucM just a personal style :) Commented Apr 28, 2012 at 18:48
  • 2
    How do you define fastest? Worst case? Average case? Average performance for multiple searches? There are several interesting string searching algorithms, but none of them is fastest. Commented Apr 28, 2012 at 18:50
  • 1
    Does this include processing of strings? Commented Apr 28, 2012 at 18:58

5 Answers 5

5

You could convert the whole array of strings to a finite state machine, where the transitions are the characters of the strings and put the smallest index of the strings that produced a state into the state. This takes a lot of time, and may be considered indexing.

Sign up to request clarification or add additional context in comments.

1 Comment

[f]lex can help you to construct this DFA.
4

Put the strings into a hash based set, and test to see if a given string is contained in the set should give you more or less constant performance once the set is built.

3 Comments

If you want to find the index, use a hash-based dictionary of strings -> first occurrence.
but im a bit afraid that some 2 items may have the same hash value
Well, you stil need to do the final compare, given equal hash values.
4

You can first sort the array of strings, which will take O(m*nlogn) time. And after A is sorted, you can do a binary search instead of linear search, which could reduce the total running time to O(m*logn).

The advantage of this method is that it's quite easy to implement. For example, in Java you can do this with just 2 lines of codes:

Arrays.sort(A);
int index = Arrays.binarySearch(A, "S");

3 Comments

the sorting process before binary search takes a big portion of time, doesn't it
@PaulDinh It takes O(M N log N) time.
@PaulDinh I think in practice the time is OK. It dose take O(M N log N) time in worst case. But loading all the string will need MN time, so it's only log n times longer than IO. In most situation log n is really small, maybe even faster than build a trie or hashtable in practise. If you cares about theoretical time complexity, then build a trie or hashtable will cost O(MN) time.
3

You could use a Self-balancing binary search tree. Most implementations have O(log(n)) to insert, and O(log(n)) to search.

If your set is not very big, and you have a good hash functions for your values, the hash based set is a better solution, because in that case you will have O(1) to insert and O(1) to search. But if your hash function is bad or your set is too big, it will be O(n) to insert and O(n) to search.

Comments

1

The best way to search as fast as possible, is to have the array sorted As you describe, there seems to be no possible information a priori which would allow for some heuristics or constraints in the search

Sort the array first (Quicksort for example, O(NlogN)), and do binary search next O(log(N))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.