You can remove the unwanted empty strings by wrapping the result of re.split in a list comprehension with a conditional:
import re
lst = [s for s in re.split(r'[.!,;\s]\s*', 'To be or not! to be; that, is the question!') if s != '']
print(lst)
Output:
['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
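To see why the filter is needed, it may help to look at the unfiltered result of re.split. The sketch below reuses the same pattern and sentence; the variable names raw and cleaned are only illustrative. Because the sentence ends with a delimiter, re.split returns one empty string at the end.

```python
import re

text = 'To be or not! to be; that, is the question!'

# The text ends with a delimiter ('!'), so re.split yields a trailing ''.
raw = re.split(r'[.!,;\s]\s*', text)
print(raw)

# An empty string is falsy, so `if s` filters the same way as `if s != ''`.
cleaned = [s for s in raw if s]
print(cleaned)
```

Filtering in a comprehension is a little more robust than slicing off the last element, since it also handles input that starts with a delimiter or does not end with one.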
Update 1
Q: re.findall(r'\w+',s) works in this case, but it will fail to keep together a hyphenated word, like weather-beaten. Is there a modification that can be made to not split a word if there are letters (no spaces) on both sides of a hyphen?
A: Use either (a) a combination of re.split and re.search, or, even better, (b) re.findall:
lst = [s for s in
       re.split(r"[.!,;\s]\s*", "To be - or not! to be; that, is the ready-made question!")
       if re.search(r"\w", s)]
print(lst)
lst = re.findall(r"\b(\w+-\w+|\w+)\b", "To be - or not! to be; that, is the ready-made question!")
print(lst)
In both cases, the output is:
['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'ready-made', 'question']
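Note that the re.findall pattern above keeps at most one hyphen per word, so a word such as mother-in-law would come back as 'mother-in' and 'law'. A possible variation, assuming words may contain several internal hyphens, is to repeat the hyphenated part with a non-capturing group. The helper name tokenize and the sample sentence are made up for illustration.

```python
import re

def tokenize(text):
    """Return the words in text, keeping internal hyphens (hypothetical helper)."""
    # \w+ matches a run of word characters; (?:-\w+)* allows any number of
    # "-part" continuations. A hyphen surrounded by spaces never matches.
    return re.findall(r"\w+(?:-\w+)*", text)

sample = "My mother-in-law is a weather-beaten - but cheerful - sailor!"
print(tokenize(sample))
```

Keep in mind that \w also matches digits and underscores, so this treats tokens like 'x_1' or '2-3' as words too.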
re.split(...)[:-1]?
re.findall(r'\w+', s), where s is your string.