
I have strings like this example:

"BODY: 88% RECYCLED POLYESTER, 12% ELASTANE GUSSET LINING: 91% COTTON, 9% ELASTANE EXCLUSIVE OF DECORATION"

I want to split them so that each word followed by a colon starts a new list item, keeping that word and its colon. For a similar string, the result should look like this:

["BODY: 77% RECYCLED POLYESTER, 23% ELASTANE", "MESH: 84% POLYAMIDE, 16% ELASTANE EXCLUSIVE OF DECORATION"]

I came up with:

re.split("\s(\w+:.+)", p)

But this returns an empty string at the end, and I'm not sure why:

['BODY: 77% RECYCLED POLYESTER, 23% ELASTANE', 'MESH: 84% POLYAMIDE, 16% ELASTANE EXCLUSIVE OF DECORATION', '']
1 Answer

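You can use re.split(r"\s(?=\w+:)", s). The lookahead (?=...) makes the split happen only on a whitespace character that is followed by the \w+: pattern. Because a lookahead is zero-width, the word: label itself is not consumed, so it stays at the start of the next item.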
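A quick check of this pattern against the question's strings. The first string is copied from the question; the second is an assumption, reconstructed from the output the asker posted, since the input p itself is not shown.

```python
import re

# Split on a whitespace character only when a "word:" label follows it.
# The lookahead (?=...) checks for the label without consuming it,
# so the label stays at the start of the next piece.
LABEL_SPLIT = r"\s(?=\w+:)"

# Copied from the question.
first = ("BODY: 88% RECYCLED POLYESTER, 12% ELASTANE GUSSET LINING: "
         "91% COTTON, 9% ELASTANE EXCLUSIVE OF DECORATION")

# Assumed input, reconstructed from the asker's posted output.
second = ("BODY: 77% RECYCLED POLYESTER, 23% ELASTANE MESH: "
          "84% POLYAMIDE, 16% ELASTANE EXCLUSIVE OF DECORATION")

print(re.split(LABEL_SPLIT, first))
# ['BODY: 88% RECYCLED POLYESTER, 12% ELASTANE GUSSET', 'LINING: 91% COTTON, 9% ELASTANE EXCLUSIVE OF DECORATION']
print(re.split(LABEL_SPLIT, second))
# ['BODY: 77% RECYCLED POLYESTER, 23% ELASTANE', 'MESH: 84% POLYAMIDE, 16% ELASTANE EXCLUSIVE OF DECORATION']
```

Since \w+ matches a single word, the two-word label GUSSET LINING in the first string is split as LINING:, leaving GUSSET at the end of the previous item. That follows the question's "a word with a colon" rule literally; multi-word labels would need a different pattern.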

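The original attempt puts the label and everything after it in a capturing group. re.split includes captured text in its result, and because .+ is greedy, the first match runs all the way to the end of the string; the empty text after that match becomes the trailing ''. The same greediness swallows every later word: label into one item, so with more than one label you'll see bigger problems than the trailing empty string.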
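To see the trailing '' concretely, this sketch inspects the single match that the original pattern finds on a sample string.

```python
import re

s = "foo: bar bar baz: asdfa sdfasd quux: zzzz"
original = r"\s(\w+:.+)"

# re.search finds the leftmost match, the same one re.split uses first.
m = re.search(original, s)
print(m.end() == len(s))  # True: the greedy .+ runs to the end of s
print(m.group(1))         # 'baz: asdfa sdfasd quux: zzzz'

# re.split returns the text before the match, then the captured group,
# then the text after the match, which is s[m.end():], i.e. ''.
before, captured, after = re.split(original, s)
print(repr(before), repr(captured), repr(after))
```

Because that one match consumes the rest of the string, there is nothing left for a second match, which is also why quux: is never split off.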

Here's a comparison:

>>> import re
>>> s = "foo: bar bar baz: asdfa sdfasd quux: zzzz"
>>> #                ^                 ^
>>> # we want to split on the highlighted space characters above
>>>
>>> re.split(r"\s(\w+:.+)", s) # incorrect
['foo: bar bar', 'baz: asdfa sdfasd quux: zzzz', '']
>>> re.split(r"\s(?=\w+:)", s) # correct
['foo: bar bar', 'baz: asdfa sdfasd', 'quux: zzzz']

To also handle a run of several whitespace characters before a label, use r"\s+(?=\w+:)".
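A small comparison of the two variants on an input with two spaces before a label (the string is made up for illustration). With a single \s, only the space directly before the label is consumed, so the other one stays at the end of the previous item; \s+ consumes the whole run.

```python
import re

s = "foo: bar  baz: qux"  # note the two spaces before "baz:"

print(re.split(r"\s(?=\w+:)", s))   # ['foo: bar ', 'baz: qux']
print(re.split(r"\s+(?=\w+:)", s))  # ['foo: bar', 'baz: qux']
```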

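Note also that raw strings should be used for regex literals, so that backslashes reach the regex engine unchanged. The question's "\s" happens to work because Python leaves unrecognized escapes alone, but recent versions warn about it (SyntaxWarning since Python 3.12), and recognized escapes such as "\b" are silently converted into other characters.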
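One concrete case where a non-raw string silently changes the pattern: in a normal string literal, \b is the backspace character, not the regex word boundary.

```python
import re

text = "a cat sat"

# In a normal string, "\b" is the backspace character (\x08), so this
# pattern looks for literal backspaces around "cat" and finds nothing.
print(re.findall("\bcat\b", text))   # []

# In a raw string the backslash is kept, so the regex engine sees \b
# (a word boundary) and the match succeeds.
print(re.findall(r"\bcat\b", text))  # ['cat']

print(len("\b"), len(r"\b"))         # 1 2
```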
