Unwanted ending empty string applying Python Regular Expression split method to test string [duplicate]

Question

In the following Python regular expression, which splits the words of a string into a list of substrings, I'm trying to avoid the empty string, '', in the output. Can I adjust the inputs to re.split to achieve this?

In [1]: import re
In [2]: re.split(r'[.!,;\s]\s*', 'To be or not! to be; that, is the question!')
Out[2]: ['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question', '']

Applying re.split results in an unwanted empty string at the end of the list.

@KenzoStaelens This is not robust. For example, it fails for 'To be or not! to be; that, is the question' (no trailing punctuation, so the last element is also desired).
– Timur Shtatland, Commented May 19 at 14:51
Is splitting the string what you are actually looking to do here (XY problem)? Or is splitting just a way of finding all the words in the string? If you want to find all words in the string, you can use re.findall(r'\w+', s), where s is your string.
– DuesserBaest, Commented May 19 at 15:29
re.findall(r'\w+', s) works in this case, but it will fail to keep together a hyphenated word, like weather-beaten. Is there a modification that can be made to not split a word if there are letters (no spaces) on both sides of a hyphen?
– Lee McNally, Commented May 19 at 19:10

Timur Shtatland · Accepted Answer · 2025-05-19 20:41:11Z

You can remove the undesired strings by placing the results of re.split in a list comprehension with a conditional:

import re
lst = [s for s in re.split(r'[.!,;\s]\s*', 'To be or not! to be; that, is the question!') if s != '']
print(lst)

Output:

['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'question']
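
The empty string appears because the input ends with a delimiter ('!'), and re.split returns the text on both sides of every match, including the empty text after the last one. As a rough sketch, the filter can be wrapped in a small helper and checked against inputs with and without trailing punctuation. The names DELIMS and split_words are hypothetical, chosen here for illustration; the pattern is the one from the question.

```python
import re

# Hypothetical names for illustration; the pattern is the one from the question.
DELIMS = re.compile(r'[.!,;\s]\s*')

def split_words(text):
    """Split on the delimiters and drop any empty strings."""
    return [s for s in DELIMS.split(text) if s]

with_punct = split_words('To be or not! to be; that, is the question!')
without_punct = split_words('To be or not! to be; that, is the question')

# Both inputs yield the same words, ending in 'question' with no trailing ''.
print(with_punct == without_punct)
```

Because the filter runs after the split, it does not matter whether the string ends (or starts) with a delimiter.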

Update 1

Q: re.findall(r'\w+',s) works in this case, but it will fail to keep together a hyphenated word, like weather-beaten. Is there a modification that can be made to not split a word if there are letters (no spaces) on both sides of a hyphen?

A: Use either (a) a combination of re.split and re.search, or, even better, (b) re.findall:

import re

# (a) Split on the delimiters, then keep only pieces that contain a word character.
lst = [s for s in
       re.split(r"[.!,;\s]\s*", "To be - or not! to be; that, is the ready-made question!")
       if re.search(r"\w", s)]
print(lst)

lst = re.findall(r"\b(\w+-\w+|\w+)\b", "To be - or not! to be; that, is the ready-made question!")
print(lst)

In both cases, the output is:

['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'the', 'ready-made', 'question']
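
Note that the re.findall pattern above joins only one hyphen per word, so a word such as mother-in-law would come back as 'mother-in' and 'law'. One possible variation, offered as a sketch and not as part of the original answer, repeats the hyphenated part in a non-capturing group. The name WORD and the sample sentence are made up for illustration.

```python
import re

# Assumption: a "word" is a run of \w characters, optionally followed by
# any number of "-\w+" parts, so a standalone hyphen is never matched.
WORD = re.compile(r"\w+(?:-\w+)*")

text = "My mother-in-law - weather-beaten, ready-made!"
words = WORD.findall(text)
print(words)  # ['My', 'mother-in-law', 'weather-beaten', 'ready-made']
```

Keep in mind that \w also matches digits and underscores, so this treats tokens such as 'x_1' or '2-way' as words.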
