I installed the regex module (not re!) for Python 3.4.3 solely to be able to use POSIX character classes such as [:graph:]. However, they don't seem to work.
import regex
sentence = "I like math, I divided ÷ the power ³ by ¾"
sentence = regex.sub(r"[^[:graph:]\s]", "", sentence)
print(sentence)
Output: I like math, I divided ÷ the power ³ by ¾
Expected output: I like math, I divided the power by
It does work in PCRE though. So what am I missing here?
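sentence = regex.sub(r"(?V1)[^[:graph:]\s]", "", sentence)

[:graph:] is supposed to match any visible character, but PCRE (by default) only counts ASCII characters. The regex library treats the POSIX character classes as fully Unicode-aware, except for a few (alnum, digit, punct and xdigit) that are limited to their original POSIX definitions. Since ÷, ³ and ¾ are visible Unicode characters, they match [:graph:], so [^[:graph:]\s] leaves them in place and the output you got is correct for regex. (Search for "POSIX character classes" at the link you provided.)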
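To illustrate the difference, here is a small sketch that uses only the standard library (re and unicodedata) instead of regex, so it runs without the third-party module. It rests on two assumptions: that PCRE's default (non-UCP) [:graph:] is the printable ASCII range \x21-\x7E, and that the Unicode-aware class follows the Unicode UTS #18 definition of graph (anything that is not whitespace, a control, a surrogate or unassigned). The helper names strip_non_ascii_graph and is_unicode_graph are hypothetical, and is_unicode_graph is only an approximation of what regex does internally.

```python
import re
import unicodedata

# Assumption: PCRE's default, ASCII-only [:graph:] is the printable ASCII
# range without the space character, i.e. \x21-\x7E.
ASCII_GRAPH_COMPLEMENT = re.compile(r"[^\x21-\x7E\s]")


def strip_non_ascii_graph(text):
    """Remove every character that is neither visible ASCII nor whitespace."""
    return ASCII_GRAPH_COMPLEMENT.sub("", text)


def is_unicode_graph(ch):
    """Approximate a Unicode-aware [:graph:] (UTS #18 style): not whitespace,
    not a control (Cc), surrogate (Cs) or unassigned (Cn) code point."""
    return not ch.isspace() and unicodedata.category(ch) not in {"Cc", "Cs", "Cn"}


sentence = "I like math, I divided ÷ the power ³ by ¾"

# ÷ is category Sm, ³ and ¾ are No: all visible, so a Unicode-aware
# [^[:graph:]\s] finds nothing to remove in the sentence.
print([ch for ch in "÷³¾" if is_unicode_graph(ch)])  # → ['÷', '³', '¾']

# The ASCII-only version removes them but leaves the surrounding spaces.
cleaned = strip_non_ascii_graph(sentence)
print(repr(cleaned))  # → 'I like math, I divided  the power  by '

# Collapsing runs of whitespace gives the expected output from the question.
print(" ".join(cleaned.split()))  # → I like math, I divided the power by
```

If ASCII-only behaviour is what you actually want, spelling out the range explicitly like this avoids depending on how a particular engine defines [:graph:].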