3

I wonder what is the best way to group an array of strings according to a list of prefixes (of arbitrary length). For example, if we have this:

prefixes = ['GENERAL', 'COMMON', 'HY-PHE-NATED', 'UNDERSCORED_']

Then

tasks = ['COMMONA', 'COMMONB', 'GENERALA', 'HY-PHE-NATEDA', 'UNDERESCORED_A', 'HY-PHE-NATEDB']

Should be grouped this way:

[['GENERALA'], ['COMMONA', 'COMMONB'], ['HY-PHE-NATEDA', 'HY-PHE-NATEDB'], ['UNDERESCORED_A'] ]

The naïve approach is to loop through all the tasks and inner loop through prefixes (or vice versa, whatever) and test each task for each prefix.

Can one give me a hint how to make this in a more efficient way?

2
  • Maybe a LINQ 'group by' is what you're after: richardbushnell.net/2008/02/08/… Commented May 13, 2014 at 17:04
  • @lcryder LINQ is a definitely a thing you should use in such cases if your are coding in a language where such thing like LINQ exists. But this is not an algorithm after all ))) Commented May 13, 2014 at 17:05

2 Answers 2

2

There are a few options, but you might be interested in looking into the trie data structure. http://en.wikipedia.org/wiki/Trie

The trie data structure is easy to understand and implement and works well for this type of problem. If you find that this works for your situation you can also look at Patricia Tries which achieve the similar performance characteristics but typically have better memory utilization. They are a little more involved to implement but not overly complex.

Sign up to request clarification or add additional context in comments.

2 Comments

To complete the answer: You could build a trie for the prefixes and then for each word traverse the trie. If you find a complete word on the way, that's your prefix. You could put a hash set for the words at the leaf node of the trie.
@NicoSchertler, oh, OK, I give it a trie )))
1

It depends a bit on the size of your problem, of course, but your naive approach should be okay if you sort both your prefixes and your tasks and then build your sub-arrays by traversing both sorted lists only forwards.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.