0

Background

I am designing an application which will convert text from one language to other. For example, input text hello will be converted to specified language text. This is done by having a lookup table for each language. Conversion algorithm has the following steps.

  1. Looks for exact match. The whole word hello will be the pattern. If any matches found, stop processing and return the match.
  2. Else start from left-right and lookup for each character until all the combinations are done. Ex: Iteration1 : pattern = h, 2nd - he, 3rd - hel and so on. Matched token will be kept in a temporary buffer and returned when there are no more matches. This works like maximal munch rule.
  3. If match found, the matched text will be removed from original text and continue processing with the remaining text.

This algorithm will have to iterate over the input text multiple times and leads into quadratic complexity. I am trying to optimize this by arranging the lookup table in a tree structure (very similar to suffix tree). If there are lookup text for h, hel, lo, it will be organized like

root
--h
----hel
--lo

This will help me to avoid unnecessary lookups. After matching h, I need to goto next character only if h node has got children. This reduces the number of iterations drastically.

Questions

  1. What is the name of this kind of data structure? Is there a ready-made implementation available for C or C++?
  2. Do you see any possible improvements on the above algorithm or data structure?

Any thoughts...?

1

2 Answers 2

2

A ternary search tree might be what you're talking about. It is similar to a trie, but more space efficient from what I understand.

Sign up to request clarification or add additional context in comments.

Comments

1

Look at 'Trie' data structure.

Why don't use Lucene that indexes text in very fast way, and even more - indexing includes algorithms for

  • steming
  • fuse matching

and so on.

1 Comment

Thanks for the suggestion. But, Lucene is in Java and I am looking for a C or C++ implementation.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.