
I need to split a string. I am using this:

import re

def ParseStringFile(string):
    p = re.compile(r'\W+')  # one or more non-word characters
    result = p.split(string)
    return result

But I have an error: my result has two empty strings (''), one before 'Лев'. How do I get rid of them?


  • No. It works correctly. The empty string is due to the extra newline at the beginning of the string. Commented Feb 21, 2014 at 21:52
  • @nhahtdh Do I need to delete the first and last empty ('') elements of the list, or do something before using split? Commented Feb 21, 2014 at 22:01

2 Answers


As nhahtdh pointed out, the empty strings are expected, since there's a \n at the start and end of the string. If they bother you, you can filter them out quickly and efficiently.

>>> filter(None, ['', 'text', 'more text', ''])
['text', 'more text']

filter usually takes a callable as its first argument and keeps only the elements for which function(element) is true. Here None is given, which triggers a special case: an element is kept only if bool(element) is true. Since bool('') is False, the empty strings are removed. The session above is Python 2, where filter returns a list; in Python 3 it returns an iterator, so wrap the call in list(...) if you need a list.
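As a small sketch of the None special case described above, assuming Python 3 (where filter returns an iterator rather than a list); the variable names are only illustrative:

```python
words = ['', 'text', 'more text', '']

# filter(None, ...) keeps only truthy elements; '' is falsy, so it is dropped.
# In Python 3 filter returns a lazy iterator, so list() materializes it.
cleaned = list(filter(None, words))
print(cleaned)  # ['text', 'more text']
```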

Also see the manual.


You could strip the leading and trailing newlines from the string before splitting it:

p.split(string.strip('\n'))
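A minimal sketch of the effect, assuming Python 3 and a made-up sample string (the question's real input is not shown):

```python
import re

p = re.compile(r'\W+')
text = '\nLev Tolstoy\nWar and Peace\n'  # hypothetical sample input

# Without stripping, the separators at both ends produce empty strings.
print(p.split(text))              # ['', 'Lev', 'Tolstoy', 'War', 'and', 'Peace', '']

# Stripping the newlines first removes those edge separators.
print(p.split(text.strip('\n')))  # ['Lev', 'Tolstoy', 'War', 'and', 'Peace']

# strip('\n') only handles newlines: other non-word characters at the
# edges, such as a trailing period, still leave an empty string behind.
print(p.split('Peace.'))          # ['Peace', '']
```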

Alternatively, split the string and then remove the first and last element:

result = p.split(string)[1:-1]

The [1:-1] slice makes a copy of the result that starts at index 1 (dropping the first element) and ends at index -2, the second-to-last element, because the end index of a slice is exclusive.

A longer and less elegant alternative would be to modify the list in-place:

result = p.split(string)
del result[-1]   # remove last element
del result[0]    # remove first element

Note that these two solutions assume the first and last elements are always empty strings. If the input sometimes lacks a separator at the beginning or end, they will silently drop real words. However, they are also the fastest solutions.
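To illustrate the caveat, here is a sketch with made-up inputs. The helper trim_empty_ends is hypothetical (it is not part of the original answer) and shows one possible way to drop the edge elements only when they really are empty:

```python
import re

p = re.compile(r'\W+')

padded = '\nalpha beta gamma\n'  # separator at both ends
bare = 'alpha beta gamma'        # no separator at either end

print(p.split(padded)[1:-1])  # ['alpha', 'beta', 'gamma']
print(p.split(bare)[1:-1])    # ['beta'], two real words were lost


def trim_empty_ends(parts):
    """Hypothetical helper: drop the first/last element only if it is ''."""
    start = 1 if parts and parts[0] == '' else 0
    end = -1 if len(parts) > start and parts[-1] == '' else None
    return parts[start:end]


print(trim_empty_ends(p.split(padded)))  # ['alpha', 'beta', 'gamma']
print(trim_empty_ends(p.split(bare)))    # ['alpha', 'beta', 'gamma']
```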

If you want to remove all empty strings from the result, even any that occur inside the list, you can use a list comprehension:

[word for word in p.split(string) if word]
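Put together, the question's function might look like the sketch below. It assumes Python 3, where \W is Unicode-aware for str patterns, so Cyrillic letters count as word characters; the name parse_string_file and the sample text are illustrative, not taken from the question:

```python
import re

WORD_SEPARATOR = re.compile(r'\W+')


def parse_string_file(text):
    """Split text on runs of non-word characters, dropping empty strings."""
    return [word for word in WORD_SEPARATOR.split(text) if word]


print(parse_string_file('\nЛев Толстой\nВойна и мир\n'))
# ['Лев', 'Толстой', 'Война', 'и', 'мир']
```

Unlike [1:-1], this gives the same result whether or not the input starts or ends with a separator.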

9 Comments

One of the few instances where filter actually beats out list comps. stackoverflow.com/questions/3845423/…
@SlaterTyranus I doubt speed matters in this case, but readability does, and I prefer the list comprehension. Also, in Python 3 filter doesn't produce a list, which might or might not be what the OP wants. And if speed matters, using [1:-1] is much faster because it avoids all the truth tests altogether.
Reasonable, just thought you might like to know. As someone who generally thinks things like filter and map should rarely be used, this is one case where I've really got to argue for the filter solution. 5x speed increase, and intuitively, you are filtering the list, but both are certainly accurate solutions. [1:-1] seems dangerously brittle to me.
@SlaterTyranus That answer is from 2010. On my machine I get quite different results, although filter with None is (obviously) still the fastest (by about 2.8x on Python 2 and about 85% on Python 3). It seems that over the last 4 years a pretty good job was done optimizing the interpreter.
What about the speed of the filter solution? I don't want to use the [1:-1] solution because the text doesn't have to have ' ' symbols; all the sites are different.