Given a collection of predefined strings of unequal length, I want to take an input string and split it into occurrences of elements of that collection. The output should be unique for every input, and it should prefer the longest possible chunks.
For example, s, c and h should be split into separate chunks unless they are adjacent.
If "sc" appears together, it should be grouped as 'sc' and not as 's', 'c'. Similarly, "sh" must be grouped as 'sh', "ch" as 'ch', and finally "sch" as 'sch'.
I only know that string.split(delim) splits on a specified delimiter, and that re.split('\w{n}', string) is meant to split a string into chunks of equal length. Neither method gives the intended result. How can this be done?
Pseudo code:
    def phonemic_splitter(string):
        phonemes = ['a', 'sh', 's', 'g', 'n', 'c', 'e', 'ch', 'sch']
        output = do_something(string)  # placeholder for the splitting logic
        return output
Expected example outputs:
phonemic_splitter('case') -> ['c', 'a', 's', 'e']
phonemic_splitter('ash') -> ['a', 'sh']
phonemic_splitter('change') -> ['ch', 'a', 'n', 'g', 'e']
phonemic_splitter('schane') -> ['sch', 'a', 'n', 'e']
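
For reference, here is a minimal sketch of one possible approach, not necessarily the best one. It assumes that "prefer the longest possible chunks" means trying the longest phoneme first at each position, scanning left to right. It builds a regex alternation with the phonemes sorted by length in descending order, because Python's re tries alternatives from left to right. The names PHONEMES and _PATTERN are my own. Note that 'sc' is not in the list above, so it would have to be added for "sc" to come out as a single chunk.

```python
import re

# Phoneme inventory taken from the pseudocode above.
PHONEMES = ['a', 'sh', 's', 'g', 'n', 'c', 'e', 'ch', 'sch']

# Sort longest first: alternatives are tried left to right, so 'sch'
# must come before 'sh', 'ch', 's' and 'c' to win when it is present.
_PATTERN = re.compile(
    '|'.join(re.escape(p) for p in sorted(PHONEMES, key=len, reverse=True))
)

def phonemic_splitter(string):
    # findall returns the non-overlapping matches in order of appearance.
    return _PATTERN.findall(string)

print(phonemic_splitter('schane'))
```

One caveat with this sketch: findall silently skips any character that matches no phoneme, so input validation would need to be added separately if unknown characters should be treated as an error.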