18

Input:

1) A huge sorted array of string SA;

2) A prefix string P;

Output:

The index of the first string matching the input prefix if any. If there is no such match, then output will be -1.

Example:

SA = {"ab", "abd", "abdf", "abz"}
P = "abd"

The output should be 1 (index starting from 0).

What's the most algorithm way to do this kind of job?

0

8 Answers 8

21

If you only want to do this once, use binary search, if on the other hand you need to do it for many different prefixes but on the same string array, building a radix tree can be a good idea, after you've built the tree each look up will be very fast.

Sign up to request clarification or add additional context in comments.

6 Comments

Best case lookup time for a radix tree is O(n) where n is the number of characters, while binary search takes O(log m) where m is the size of the list. It's not necessarially true that radix search will be faster.
Arachnid: This is not true. Even in the best case (where there is a match), binary search is O(n + log m), since it must read the entire prefix to check if it matches. In the worst case it is O(nlogm).
@Arachnid: +1 for Brian's comment, plus O(n) is WORST case lookup for radix tree, where there is a match and you need to read the whole prefix.
Well, a binary search is almost trivial to implement, and takes no additional memory. Radix trees are complicated to implement, take extra memory, and the benefit isn't huge
_.sortedIndex makes it even more trivial :)
|
3

This is just a modified bisection search:

  • Only check as many characters in each element as are in the search string; and
  • If you find a match, keep searching backwards (either linearly or by further bisection searches) until you find a non-matching result and then return the index of the last matching result.

Comments

3

It can be done in linear time using a Suffix Tree. Building the suffix tree takes linear time.

2 Comments

He could compare the prefix to each element in linear time too.
True, but if he's going check more than one prefix, that's not a good idea.
2

The FreeBSD kernel use a Radix tree for its routing table, you should check that.

Comments

1

Here is a possible solution (in Python), which has O(k.log(n)) time complexity and O(1) additional space complexity (considering n strings and k prefix length).

The rationale behind it to perform a binary search which only considers a given character index of the strings. If these are present, continue to the next character index. If any of the prefix characters cannot be found in any string, it returns immediately.

from typing import List

def first(items: List[str], prefix: str, i: int, c: str, left: int, right: int):
    result = -1

    while left <= right:
        mid = left + ((right - left) // 2)
        if ( i >= len(items[mid]) ):
            left = mid + 1
        elif (c < items[mid][i]):
            right = mid - 1
        elif (c > items[mid][i]):
            left = mid + 1
        else:
            result = mid
            right = mid - 1

    return result

def last(items: List[str], prefix: str, i: int, c: str, left: int, right: int):
    result = -1

    while left <= right:
        mid = left + ((right - left) // 2)
        if ( i >= len(items[mid]) ):
            left = mid + 1
        elif (c < items[mid][i]):
            right = mid - 1
        elif (c > items[mid][i]):
            left = mid + 1
        else:
            result = mid
            left = mid + 1

    return result

def is_prefix(items: List[str], prefix: str):
    left = 0
    right = len(items) - 1
    for i in range(len(prefix)):
        c = prefix[i]
        left = first(items, prefix, i, c, left, right)
        right = last(items, prefix, i, c, left, right)

        if (left == -1 or right == -1):
            return False

    return True

# Test cases
a = ['ab', 'abjsiohjd', 'abikshdiu', 'ashdi','abcde Aasioudhf', 'abcdefgOAJ', 'aa', 'aaap', 'aas', 'asd', 'bbbbb', 'bsadiojh', 'iod', '0asdn', 'asdjd', 'bqw', 'ba']
a.sort()
print(a)
print(is_prefix(a, 'abcdf'))
print(is_prefix(a, 'abcde'))
print(is_prefix(a, 'abcdef'))
print(is_prefix(a, 'abcdefg'))
print(is_prefix(a, 'abcdefgh'))
print(is_prefix(a, 'abcde Aa'))
print(is_prefix(a, 'iod'))
print(is_prefix(a, 'ZZZZZZiod'))

This gist is available at https://gist.github.com/lopespm/9790d60492aff25ea0960fe9ed389c0f

1 Comment

Interesting solution. Your first() and last() are almost identical, so you can have one function and do thr return after: " else: result = mid" You may also consider how to find the index of the fiirst term in the sorted list by doing a backward loop from the result in here, because I am not sure you necessarily will land on the first element in the list with with that prefix.
0

My current solution in mind is, instead of to find the "prefix", try to find a "virtual prefix".

For example, prefix is “abd", try to find a virtual-prefix “abc(255)". (255) just represents the max char number. After locating the "abc(255)". The next word should be the first word matching "abd" if any.

Comments

0

Are you in the position to precalculate all possible prefixes?

If so, you can do that, then use a binary search to find the prefix in the precalculated table. Store the subscript to the desired value with the prefix.

Comments

0

My solution: Used binary search.

private static int search(String[] words, String searchPrefix) {

        if (words == null || words.length == 0) {
            return -1;
        }
        int low = 0;
        int high = words.length - 1;
        int searchPrefixLength = searchPrefix.length();

        while (low <= high) {
            int mid = low + (high - low) / 2;

            String word = words[mid];
            int compare = -1;

            if (searchPrefixLength <= word.length()) {
                compare = word.substring(0, searchPrefixLength).compareTo(searchPrefix);
            }

            if (compare == 0) {
                return mid;
            } else if (compare > 0) {
                high = mid - 1;
            } else {
                low = mid + 1;
            }

        }
        return -1;
    }

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.