I am try to design a regex the will parse user input, in the form of full sentences. I am stuggling to get my expression to fully work. I know it is not well coded but I am trying hard to learn. I am currently trying to get it to parse precent as one string see under the code.
My test "sentence" = How I'm 15.5% wholesome-looking U.S.A. we RADAR () [] {} you -- are, ... you?
text = input("please type somewhat coherently: ")
pattern = r'''(?x) # set flag to allow verbose regexps
(?:[A-Z]\.)+ # abbreviations, e.g. U.S.A.
|\w+(?:[-']\w+)* # permit word-internal hyphens and apostrophes
|[-.(]+ # double hyphen, ellipsis, and open parenthesis
|\S\w* # any sequence of word characters
# |[\d+(\.\d+)?%] # percentages, 82%
|[][\{\}.,;"'?():-_`] # these are separate tokens
'''
parsed = re.findall(pattern, text)
print(parsed)
My output = ['How', "I'm", '15', '.', '5', '%', 'wholesome-looking', 'U.S.A.', 'we', 'RADAR', '(', ')', '[', ']', '{', '}', 'you', '--', 'are', ',', '...', 'you', '?']
I am looking to have the '15', '.', '5', '%' parsed as '15.5%'. The line that is currently commented out is what should do it, but when commented in does absolutly nothing. I searched for resources to help but they have not.
Thank you for you time.
[...]?\d.\d%placeholder in a character class (without repetition). Furthermore it would likely only take effect if it preceds the word+hyphens rule.