18

I want to split a string on an irregularly repeating delimiter, the way str.split() does with whitespace:

>>> ' a b   c  de  '.split()
['a', 'b', 'c', 'de']

However, when I split with a regular expression, the result is different (empty strings sneak into the resulting list):

>>> import re
>>> re.split(r'\s+', ' a b   c  de  ')
['', 'a', 'b', 'c', 'de', '']
>>> re.split(r'\.+', '.a.b...c..de..')
['', 'a', 'b', 'c', 'de', '']

And what I want to see:

>>> some_smart_split_method('.a.b...c..de..')
['a', 'b', 'c', 'de']

2 Answers

26

The empty strings are an inevitable result of the regex split whenever the delimiter matches at the very start or end of the string (though there is good reasoning for why that behavior might be desirable). To get rid of them, call filter on the result.

results = re.split(...)
results = list(filter(None, results))

Note the list() transform is only necessary in Python 3 -- in Python 2 filter() returns a list, while in 3 it returns a filter object.
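
As a minimal sketch of the approach above, applied to the two inputs from the question. The helper name split_nonempty is hypothetical (it is not part of the re module); it only wraps re.split and filter:

```python
import re

def split_nonempty(pattern, text):
    """Split text on pattern and drop empty strings (hypothetical helper)."""
    # filter(None, ...) keeps only truthy items, so '' is discarded.
    return list(filter(None, re.split(pattern, text)))

print(split_nonempty(r'\s+', ' a b   c  de  '))   # ['a', 'b', 'c', 'de']
print(split_nonempty(r'\.+', '.a.b...c..de..'))   # ['a', 'b', 'c', 'de']
```

Note that this also drops empty strings that are not caused by leading or trailing delimiters, which only matters if the pattern can leave empty fields in the middle of the string.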


2 Comments

Is there a way, then, to split with a limited number of splits?
>>> split('.a.b...c', 1)
['a', 'b...c']
>>> split('a.b...c', 1)
['a', 'b...c']

Yes, almost exactly as you wrote it. For example, re.split(regex, string, maxsplit=1) will stop splitting after the first match. Note that a delimiter at the very start of the string counts as that first match, so re.split(r'\.+', '.a.b...c', maxsplit=1) returns ['', 'a.b...c']. You can see more in the Python re docs.
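
One possible way to get the output asked for in the comment above is to drop a delimiter run at the start of the string before splitting, since re.split would otherwise spend its one allowed split there. This is only a sketch; split_limited is a hypothetical helper name, not an existing function:

```python
import re

def split_limited(pattern, text, maxsplit):
    """Split at most maxsplit times, ignoring a leading delimiter (hypothetical helper)."""
    regex = re.compile(pattern)
    # Skip a delimiter run at the very start so it does not use up a split.
    leading = regex.match(text)
    if leading:
        text = text[leading.end():]
    # As with re.split, maxsplit=0 means "no limit".
    return regex.split(text, maxsplit=maxsplit)

print(split_limited(r'\.+', '.a.b...c', 1))  # ['a', 'b...c']
print(split_limited(r'\.+', 'a.b...c', 1))   # ['a', 'b...c']
```

Like str.split with a limit, this leaves the remainder of the string untouched, so trailing delimiters stay in the last element.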
21
>>> re.findall(r'\S+', ' a b   c  de  ')
['a', 'b', 'c', 'de']
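
The same idea, matching the tokens instead of the delimiters, should carry over to the dot-delimited string from the question. A small sketch, using a negated character class in place of \S:

```python
import re

# [^.]+ matches each run of characters that are not the delimiter '.'.
tokens = re.findall(r'[^.]+', '.a.b...c..de..')
print(tokens)  # ['a', 'b', 'c', 'de']
```

Because findall only returns the matches, leading, trailing, and repeated delimiters never produce empty strings.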

1 Comment

That's an awesome solution. A lot of posts suggest using groups instead, but findall does everything.
