2

I want to check if any string in an array of strings is a prefix of any other string in the same array. I'm thinking radix sort, then single pass through the array.

Anyone have a better idea?

3
  • Belongs on cs.stackexchange.com Commented Jul 2, 2013 at 23:20
  • 1
    If you build a trie (en.wikipedia.org/wiki/Trie), you need only look to see if a word has a suffix. Commented Jul 2, 2013 at 23:27
  • I like the trie idea. The way the question is phrased, you could short circuit during population of the trie if you detect a suffix. (Question doesn't say you have to find all prefixes.) Commented Jul 3, 2013 at 0:15

3 Answers 3

1

I think, radix sort can be modified to retrieve prefices on the fly. All we have to do is to sort lines by their first letter, storing their copies with no first letter in each cell. Then if the cell contains empty line, this line corresponds to a prefix. And if the cell contains only one entry, then of course there are no possible lines-prefices in it.

Here, this might be cleaner, than my english:

lines = [
"qwerty",
"qwe",
"asddsa",
"zxcvb",
"zxcvbn",
"zxcvbnm"
]

line_lines = [(line, line) for line in lines]

def find_sub(line_lines):
    cells = [ [] for i in range(26)]
    for (ine, line) in line_lines:
        if ine == "":
            print line
        else:
            index = ord(ine[0]) - ord('a')
            cells[index] += [( ine[1:], line )]
    for cell in cells:
        if len(cell) > 1:
            find_sub( cell )

find_sub(line_lines)
Sign up to request clarification or add additional context in comments.

Comments

1

If you sort them, you only need to check each string if it is a prefix of the next.

Comments

1

To achieve a time complexity close to O(N2): compute hash values for each string.

Come up with a good hash function that looks something like:

  • A mapping from [a-z]->[1,26]
  • A modulo operation(use a large prime) to prevent overflow of integer

So something like "ab" gets computed as "12"=1*27+ 2=29

A point to note:

Be careful what base you compute the hash value on.For example if you take a base less than 27 you can have two strings giving the same hash value, and we don't want that.

Steps:

  1. Compute hash value for each string
  2. Compare hash values of current string with other strings:I'll let you figure out how you would do that comparison.Once two strings match, you are still not sure if it is really a prefix(due to the modulo operation that we did) so do a extra check to see if they are prefixes.
  3. Report answer

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.