0

I have an array of n strings. I want to select all the elements of the array that has a given string.

Sorry if that is not clear. I'll give an example.

input = "as"
array = {"abas", "aras", "as", "ask", "asi", "aso", "atas", "best", "test"}
output = {"abas", "aras", "as", "ask", "asi", "aso", "atas"}

Which algorithm will I need to do this selection. I need the fastest algorithm that will perform this operation since I'm using it for autoComplete in android So the search should be faster than the typing speed of the user. I have total 20000 entries.

3
  • 1
    You want the string to start with the given input as or you want it to be a substring? Either correct your question or your output? you wrote starts with in the explanation but you wrote atas and abas in your output for the input as. Commented Jul 10, 2015 at 6:27
  • 1
    Why "abas" and "atas" is selected but "aras" is not? Commented Jul 10, 2015 at 6:28
  • yes i make a mistake. Commented Jul 10, 2015 at 15:26

5 Answers 5

1

NOTE: IF YOU WANT TO SHOW ALL STRINGS THAT START WITH YOUR INPUT, READ THIS.

As you want all the strings that start with the given input, any string matching algorithm like KMP or Boyer Moore is not going to give you good results. Because you have to iterate over all the string in the array and compare(If you want want suffix, KMP does not do any better than linear search).

A better option would be to construct a Trie with your array and when you want to show the result of autoComplete, just traverse through the array and show all words under your current Node.

for your input array = ["abas", "aras", "as", "ask", "asi", "aso", "atas" ,"best","test"] The corresponding Trie would be : ('.' represents end of the string)

I did not add test but the structure will be just like best

            DUMMY
          /       \
         a         b
       / | \       | 
      b  r  s      est.
     /   |   ? 
    as.  as.

The tree in place of ? would look like :

              s.
            / | \
           k. i.  o.

When you want to search all strings that start with as, you have to just traverse in the path as and print all words under it. Here {as,ask,asi,aso}

Sign up to request clarification or add additional context in comments.

4 Comments

sir i want to show all strings which has "as'.In middle or in any where.
Then you will have to iterate over all the strings in array and if your input is a substring (using KMP or someother)
can you give me a link of tutorial where can i study this (KMP) algorithm
Here is the java implementation. Read more about it ( just search KMP in Google) Try to understand it. algs4.cs.princeton.edu/53substring/KMP.java.html
1

Boyer Moore - Horspool algorithm is a fast way for string search. It is a good way to finding substrings in mega texts

Comments

1

For individual Strings, you could check for substring using KMP Algorithm.

You could also skip looping though some of the strings in your array, by using a string-letters map and doing a lookup on this map based on the input string letters. But weather it would be more optimal than looking up on everything or not would depend on the dataset.

Comments

1

First sort your array and mantain is sorted.

Then search from the middle and go on.

String[] a = ....;
Arrays.sort(a); // Only first time

String str = .... // String to find
String[] output = find(str, a, 0, a.length - 1);

Please note that the following function has not been tested, so take it a prototype to code your correct function.

public int find(String str, String[] a, int start, int end) {
    if (start >= end) {
        return start;
    }
    int middle = end - start / 2;
    if (a[middle].startsWith(str)) {    // Search the middle point 
        return find(str, a, middle + 1, end);
    } else {
        return find(str, a, start, middle - 1);
    }
} 

This code is executed in log(n) for each research where n is the number of elements in the searcheable array (a in this case).

Add a one time O(n log(n)) for the initial sorting. Not needed if the array a is already sorted.

Comments

1

StringSearch

High-performance pattern matching algorithms in Java

The Java language lacks fast string searching algorithms. StringSearch provides implementations of the Boyer-Moore and the Shift-Or (bit-parallel) algorithms. These algorithms are easily five to ten times faster than the naïve implementation found in java.lang.String

http://johannburkard.de/software/stringsearch/

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.