0

a string 'hello', how to list all words and the count of each word. normal suffix tree algorithm return the suffix only, mean that middle word 'll' will not appear. can anyone help me to solve in step by step?

1
  • @Valmond: I agree. Suffix and prefix trees are probably the most powerful, fastest and most useful data structure you can use for string processing. If you know how to use them correctly you can achieve many tasks more elegant and and faster than others. Have a look at the number of times Hash-Table based solutions are mentioned for string processing here on SO. Most often a similar trie based solution will beat these hash based ones in time and space. Commented Sep 20, 2011 at 9:54

2 Answers 2

1

Initialize a hash table.
Use a double loop (for within for). One loop index will represent the beginning of the substring, the other the end. Make sure the end index is strictly bigger than the beginning index, and that both are in the string boundaries.
For each substring encountered, check if it is in the hash. If it isn't - add it as a key, with the value 1. If it is - increment the current saved value.

Edit: as per your comment:

for(b = 0; b < len; ++b) {
  for(e = b + 1; e <= len; ++e) {
    //process substring from index b (incl.) to index e (excl.)
  }
}

This will traverse the string "abcd" in this order:
a ab abc abcd b bc bcd c cd d

Sign up to request clarification or add additional context in comments.

3 Comments

may you describe more clearly for me step by step? i'm unclear with the double loop you mention.thanks.
@rock: Yes, which is just O(n^2). And If you found my answer helpful, you could say so by upvoting and/or accepting it :)
@rock: The hash will basically be a map from substrings (any substring you encounter), to the number of times they were encountered. so h->1, e->1, l->2, ... hello->1
0

Use a prefix tree instead of a suffix tree. Then run through this tree and at any node output the string encountered so far plus the number of sub trees that are still available.

EDIT:

Actually it's too early and I messed up some nomenclature:

A Prefix tree is a tree that stores common prefixes only once. A Suffix tree stores all suffixes in a prefix tree. So I meant a suffix tree here (which also happens to be a special kind of prefix tree).

So you do the following:

  1. Build up your suffix tree

Do a search on the prefix tree, starting at root

function search( node ) {
   c = node.symbol;
   if not children.empty then
     for each child in node.children do
        sub_search = search( child )
        other_results.append( sub_search.results );
        sub_trees += sub_search.num_trees
     done
     for each result in other_results do
        append c to result
     done
     return c :: other_results
   else
     return {results = c; num_trees = 1 }
   fi
}

If I did not do any mistake this should do the trick. The suffix part of the suffix tree takes care of eliminating all suffixes and the prefix part takes care of eliminating all prefixes. Because both are stored reduced you get strings in between (which might already have been stored together). Note that this is not including any compression on the trie, which is usually not needed unless your strings get very long.

3 Comments

may you write down step-by-step for the prefix tree? as suffix tree can give o(n) running time.please advise..
the complexity to run the suffix tree is o(n), then proceed to searching in prefix tree, from your step in searching prefix tree, which require o(n*n)? because 2 for loop. so totally running time is o(n+n2). is my interpretation correct?
@rock: Don't jump to conclusions. You are GUESSING a complexity of O(n*n) from the existence of two for loops (hint: These loops don't run over all the data, so the complexity of each loop is not O(n))? Then you estimate the total complexity as O(n+n2). You need to pay more attention to the way cost analysis should be done in this case (hint: Look at the structure of the trie and which data get's stored where and how often).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.