
I want to split some text on these delimiters: ",", ";" and " y " (the surrounding whitespace is required).

It should also ignore any delimiters inside parentheses.

Here's what I've tried for the first two:

import re
re.split(r'[,;]+(?![^(]*\))', text_spam)

For example, 'foo, bar; baz spam y eggs guido' should split into ['foo', ' bar', ' baz spam', 'eggs guido'].

I can't figure out how to include a multi-character string inside the character set to get the last delimiter.
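For illustration, here is the attempt above run on the example string (a minimal sketch; `text_spam` is simply assigned the sample text). Because `[,;]` is a character set, it matches exactly one character at a time, so " y " is never treated as a delimiter:

```python
import re

text_spam = 'foo, bar; baz spam y eggs guido'
# [,;] can only match a single character, so the three-character " y " is ignored
attempt = re.split(r'[,;]+(?![^(]*\))', text_spam)
print(attempt)  # ['foo', ' bar', ' baz spam y eggs guido']
```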

TIA

  • r'(?:[,;]| y )+(?![^(]*\))'? Did you try alternation? Commented Nov 25, 2019 at 10:20
  • Can you give an example of a line you want to split, and your desired result? Commented Nov 25, 2019 at 10:26

1 Answer


You can use a non-capturing group with the alternation operator | so that a multi-character string becomes an alternative to the character set, and apply the + quantifier to the whole group:

r'(?:[,;]| y )+(?![^(]*\))'

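Applied to the example from the question, the pattern gives exactly the requested list. The sketch below (the name `DELIMS` is just an illustrative choice) also shows that the spaces around y leave words such as "mayo" and "yogurt" intact, that delimiters inside parentheses are skipped, and why the group is non-capturing: `re.split` returns the text of any capturing group as extra list items.

```python
import re

# [,;] matches one delimiter character; " y " is a separate alternative inside
# the non-capturing group (?:...), and + applies to the whole group
DELIMS = re.compile(r'(?:[,;]| y )+(?![^(]*\))')

parts = DELIMS.split('foo, bar; baz spam y eggs guido')
print(parts)  # ['foo', ' bar', ' baz spam', 'eggs guido']

# The spaces are part of the alternative, so a "y" inside a word is not a delimiter
mayo = DELIMS.split('mayo y yogurt')
print(mayo)  # ['mayo', 'yogurt']

# A delimiter followed by ")" before any "(" counts as inside parentheses
inside = DELIMS.split('spam (ham, eggs y bacon)')
print(inside)  # ['spam (ham, eggs y bacon)']

# With a capturing group instead, re.split also returns the captured delimiter
captured = re.split(r'([,;]| y )+', 'foo, bar')
print(captured)  # ['foo', ',', ' bar']
```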

To also consume the whitespace that follows "," and ";", strip the resulting items and drop any empty ones, you can use:

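import re
text = "foo, bar; baz spam y eggs guido (foo, bar; baz spam y eggs guido)"
# Split on "," or ";" (plus any whitespace after them) or " y ",
# skipping delimiters that are followed by ")" before any "("
results = re.split(r'(?:[,;]\s*| y )+(?![^(]*\))', text)
# Strip each item and drop empty strings
print(list(filter(None, [x.strip() for x in results])))
# => ['foo', 'bar', 'baz spam', 'eggs guido (foo, bar; baz spam y eggs guido)']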

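One caveat: the lookahead `(?![^(]*\))` only checks whether a `)` appears before the next `(`, so it handles a single level of parentheses. With nested parentheses, a delimiter inside the outer group that is followed by an inner `(` is split on. If nesting is possible, a small depth-counting scanner is one alternative. The sketch below is a hypothetical helper (`split_outside_parens` is not part of the answer or of the `re` module) and assumes the parentheses are balanced.

```python
import re

nested = 'a (b, (c)), d'
# The regex approach splits at the comma after "b" because "(" comes before ")"
regex_parts = re.split(r'(?:[,;]\s*| y )+(?![^(]*\))', nested)
print(regex_parts)  # ['a (b', '(c))', 'd']


def split_outside_parens(text, delimiters=(',', ';', ' y ')):
    """Hypothetical helper: split on `delimiters` only at parenthesis depth 0.

    Assumes balanced parentheses; stray ")" characters are clamped to depth 0.
    """
    parts, current, depth, i = [], [], 0, 0
    while i < len(text):
        ch = text[i]
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth = max(depth - 1, 0)
        elif depth == 0:
            # Check every delimiter at this position, multi-character ones included
            hit = next((d for d in delimiters if text.startswith(d, i)), None)
            if hit is not None:
                parts.append(''.join(current))
                current = []
                i += len(hit)
                continue
        current.append(ch)
        i += 1
    parts.append(''.join(current))
    # Same post-processing as the answer: strip items and drop empty ones
    return [p.strip() for p in parts if p.strip()]


print(split_outside_parens(nested))  # ['a (b, (c))', 'd']
```

On the answer's own example string, this helper returns the same list as the regex version.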
