I have a predefined character -> type dictionary. For example, 'a' is a lowercase letter, '1' is a digit, ')' is a punctuation symbol, etc. With the following script, I label all characters in a given string:
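    labels = ''
    for ch in example:
        try:
            # look up the type label for this character
            l = character_type_dict[ch]
            print(l)
            labels = labels + l
        except KeyError:
            # characters missing from the dictionary are labelled 'o'
            labels = labels + 'o'
            print('o')
    labels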
For example, given "1,234.45kg (in metric system)" as input, the code produces dpdddpddwllwpllwllllllwllllllp as output.
Now, I would like to split the string into groups of consecutive characters that share the same type. The output should look something like this:
['1',',','234','.','45','kg',' ','(','in',' ','metric',' ','system',')']
That is, it should split based on the character-type borders. Any ideas how this might be done efficiently?
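One possible way to do this, sketched here under assumptions, is to let itertools.groupby collect consecutive characters that map to the same label. The small character_type_dict below is only a hypothetical stand-in for the real predefined dictionary, and split_by_type is a name made up for illustration; unknown characters fall back to 'o', as in the script above.

```python
from itertools import groupby

# Hypothetical stand-in for the predefined dictionary from the question;
# the real one presumably covers many more characters.
character_type_dict = {}
character_type_dict.update({c: 'l' for c in 'abcdefghijklmnopqrstuvwxyz'})
character_type_dict.update({c: 'd' for c in '0123456789'})
character_type_dict.update({c: 'p' for c in ',.()'})
character_type_dict[' '] = 'w'


def split_by_type(text, type_dict, default='o'):
    """Split text into runs of consecutive characters with the same label."""
    # groupby starts a new group whenever the key (the label) changes
    runs = groupby(text, key=lambda ch: type_dict.get(ch, default))
    return [''.join(run) for _, run in runs]


example = "1,234.45kg (in metric system)"
print(split_by_type(example, character_type_dict))
```

This makes a single pass over the string and never builds the intermediate labels string. Note that adjacent characters of the same type are always merged, so two punctuation symbols in a row (for example "))") would come out as one group rather than two.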
labels is wrong. It treats k as w and g as l.