
The following example is taken from the Python re documentation:

re.split(r'\b', 'Words, words, words.')
['', 'Words', ', ', 'words', ', ', 'words', '.']

'\b' matches the empty string at the beginning or end of a word, which means that running this code raises an error (Jupyter notebook, Python 3.6):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-128-f4d2d57a2022> in <module>
      1 reg = re.compile(r"\b")
----> 2 re.split(reg, "Words, word, word.")

/usr/lib/python3.6/re.py in split(pattern, string, maxsplit, flags)
    210     and the remainder of the string is returned as the final element
    211     of the list."""
--> 212     return _compile(pattern, flags).split(string, maxsplit)
    213 
    214 def findall(pattern, string, flags=0):

ValueError: split() requires a non-empty pattern match.

Since \b only matches empty strings, split() never gets the non-empty pattern match it requires. I have seen various questions about split() and empty strings. For some of them I can see why you might want this in practice, for example the question here. Answers vary from "just can't do it" to (older ones) "it's a bug".
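To make the "only matches empty strings" point concrete, here is a minimal sketch that lists where \b matches in the documentation's example string. The variable names are my own, and it uses re.finditer rather than re.split, since finditer reports empty matches without raising:

```python
import re

text = 'Words, words, words.'

# Each span is (start, end); start == end means the match is zero-width.
spans = [m.span() for m in re.finditer(r'\b', text)]
print(spans)
# [(0, 0), (5, 5), (7, 7), (12, 12), (14, 14), (19, 19)]
```

Every match has the same start and end index, so there is never any non-empty text for split() to consume.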

My question is this:

  1. Since this is still an example on the Python web page, should this be possible? Is it possible in the bleeding-edge release?

  2. The question linked above involved re.split(r'(?<!foo)(?=bar)', 'foobarbarbazbar'). It was asked in 2015, and at the time there was no way to meet the requirements with re.split() alone. Is this still the case?

  • Do you want to split at the start of words? Sorry, but splitting with \b does not make much sense. Note that with Python 3.7, you may split with zero length matches. Commented Jan 27, 2019 at 18:15
  • It's more about splitting on empty strings. Would the other question in the link be possible with Python 3.7? I used the \b example because it was on the web page, which suggested this type of thing should be possible. Although splitting on \b might not be practical, splitting a long string on an empty match seems like something that might be useful. This example fails for the same reason: re.split(r'(?<!foo)(?=bar)', 'foobarbarbazbar'). I would read this as "split on an empty string followed by 'bar' and not preceded by 'foo'" (could be wrong). Commented Jan 27, 2019 at 18:25
  • I used r"\b\W+\b" to split into words, and r"(\b\W+\b)" to mimic (not precisely) the \b example. Commented Jan 27, 2019 at 18:27
  • I got ['foobar', 'barbaz', 'bar'] with re.split(r'(?<!foo)(?=bar)', 'foobarbarbazbar') in Python 3.7. Commented Jan 27, 2019 at 22:11
  • Cool, guess that answers both. All possible with the bleeding edge (3.7). Commented Jan 28, 2019 at 9:24

1 Answer

In Python 3.7, re.split can split on zero-length matches. From the documentation:

Changed in version 3.7: Added support of splitting on a pattern that could match an empty string.

Also, note that

Empty matches for the pattern split the string only when not adjacent to a previous empty match.

>>> re.split(r'\b', 'Words, words, words.')
['', 'Words', ', ', 'words', ', ', 'words', '.']
>>> re.split(r'\W*', '...words...')
['', '', 'w', 'o', 'r', 'd', 's', '', '']

>>> re.split(r'(\W*)', '...words...')
['', '...', '', '', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '', '', '']

Also, with

re.split(r'(?<!foo)(?=bar)', 'foobarbarbazbar')


I get ['foobar', 'barbaz', 'bar'] in Python 3.7.
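If you are stuck on 3.6 or earlier, one possible workaround is to collect the zero-width match positions with re.finditer and slice the string yourself. This is a minimal sketch, assuming the pattern is purely zero-width (a lookaround or a word boundary); split_at_empty_matches is a hypothetical helper name, not part of the re module:

```python
import re

def split_at_empty_matches(pattern, text):
    """Split text at every zero-width match of pattern, keeping all characters.

    Hypothetical helper for illustration only. Non-empty matches are
    ignored, so it is only meant for zero-width patterns.
    """
    pieces = []
    last = 0
    for m in re.finditer(pattern, text):
        if m.start() == m.end():  # zero-width match: cut here
            pieces.append(text[last:m.start()])
            last = m.start()
    pieces.append(text[last:])  # remainder after the final cut
    return pieces

print(split_at_empty_matches(r'(?<!foo)(?=bar)', 'foobarbarbazbar'))
# ['foobar', 'barbaz', 'bar']
print(split_at_empty_matches(r'\b', 'Words, words, words.'))
# ['', 'Words', ', ', 'words', ', ', 'words', '.']
```

For these two inputs the results agree with what re.split returns in Python 3.7. I have not tried to reproduce re.split's handling of capture groups or maxsplit here.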
