split python string without empty strings

Question

The following code:

str = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = str.split('\n')
print(chunks)

Correctly prints out:

['Welcome', 'to', 'PythonExamples', 'Welcome', 'to', 'PythonExamples']

I want to split the string into strings that start with 'Welcome\n' so I have tried the following:

str = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = str.split('Welcome\n')
print(chunks)

But this prints out:

['', 'to\nPythonExamples\n', 'to\nPythonExamples']

Notice how the first entry is empty. How can I split it up correctly so that the output is the following?

['to\nPythonExamples\n', 'to\nPythonExamples']

Just to clarify, is the only thing that bothers you the empty strings that might come up?
– barshopen, Commented Feb 5, 2021 at 18:07
I assumed that because the empty string is there, I am parsing it incorrectly. I wonder why the empty value is there.
– Harry Boy, Commented Feb 5, 2021 at 18:10

barshopen · Accepted Answer · 2021-02-06 02:11:55Z

4

If I understand correctly, you want to avoid empty strings. You can use a list comprehension:

chunks = [x for x in str.split('Welcome\n') if x]

This should solve your problem. Why?

First, the list comprehension ends with if x, which means only truthy values are included in the list (in other words, falsy values such as '' are omitted).

But why did you get '' in the first place? It is easiest to point you at CPython's source code for split:

while (maxcount-- > 0) {
    pos = FASTSEARCH(str+i, str_len-i, sep, sep_len, -1, FAST_SEARCH);
    if (pos < 0)
        break;
    j = i + pos;
    SPLIT_ADD(str, i, j);
    i = j + sep_len;
}

Basically, split looks for the next occurrence of sep and takes the substring from the end of the previous occurrence (i) up to the start of the new match (j); it does this at most maxcount times. Since Welcome\n is found at position 0 and the "previous occurrence" index also starts at 0, it takes the substring from 0 to 0, which is an empty string.
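To make the loop easier to follow, here is a rough Python rendering of it. This is only a sketch: split_sketch is a hypothetical name, text.find stands in for FASTSEARCH, and the final append of the remainder is not part of the excerpt above; it is assumed here so that the result matches what str.split returns.

```python
def split_sketch(text, sep, maxcount=-1):
    """Simplified, illustrative rendering of the C loop (not the real implementation)."""
    if not sep:
        raise ValueError("empty separator")
    if maxcount < 0:
        maxcount = len(text)  # a non-empty sep cannot occur more often than this
    parts = []
    i = 0  # start of the piece currently being collected
    while maxcount > 0:
        maxcount -= 1
        j = text.find(sep, i)  # absolute index of the next separator, or -1
        if j < 0:
            break
        parts.append(text[i:j])  # empty string when sep sits exactly at index i
        i = j + len(sep)
    parts.append(text[i:])  # assumed final step: whatever follows the last separator
    return parts


text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
print(split_sketch(text, 'Welcome\n'))
# ['', 'to\nPythonExamples\n', 'to\nPythonExamples']
```

On the first pass both i and j are 0, so text[0:0] is appended, and that is the empty first entry.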

By the way, you would also get empty strings for a string like this:

'Welcome\nWelcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'

Result of your code, without my change:

['', '', 'to\nPythonExamples\n', 'to\nPythonExamples']

edited Feb 6, 2021 at 2:11

answered Feb 5, 2021 at 18:10

barshopen


3 Comments

Harry Boy Over a year ago

Do you know why the first entry is empty?

barshopen Over a year ago

Sure, let me add that to the answer

barshopen Over a year ago

@HarryBoy, I've added this part to my answer. Let me know if you have any more questions :) GL

Ryan Schaefer · Accepted Answer · 2021-02-05 18:41:26Z

2

You could filter out the empty entries. Also avoid using str as a variable name, since it shadows the built-in str type. Since '' is falsy, you don't even need a comparison.

inp = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = list(filter(None, inp.split('Welcome\n')))
print(chunks)
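As a quick check, and only as a sketch, the following compares filter(None, ...) with an explicit lambda x: x and with a list comprehension. The variable names with_none, with_lambda and with_comprehension are made up for this illustration.

```python
inp = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
pieces = inp.split('Welcome\n')

# Passing None as the function keeps only the truthy items.
with_none = list(filter(None, pieces))
# An identity lambda gives the same result, with an extra call per item.
with_lambda = list(filter(lambda x: x, pieces))
# The list comprehension spelling of the same filter.
with_comprehension = [x for x in pieces if x]

print(with_none)
print(with_none == with_lambda == with_comprehension)
```

All three drop the leading '' and keep the two non-empty pieces.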

edited Feb 5, 2021 at 18:41

answered Feb 5, 2021 at 18:08

Ryan Schaefer


3 Comments

alec_djinn Over a year ago

Using lambda in this case is really unnecessary. You can use filter(None, chunks) directly.

Ryan Schaefer Over a year ago

Hmm, it might be the case that when the function is None, it implicitly uses lambda x: x, though.

Ryan Schaefer Over a year ago

Interesting. It would be better to use None, because that is what the source code handles directly: github.com/python/cpython/blob/… So I am updating the answer with that improvement.

alec_djinn · Accepted Answer · 2021-02-05 19:30:39Z

2

One very clean and Pythonic way would be to use filter() with None. As a side note, str is the name of a built-in type in Python, so you should not use it as a variable name.

text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = text.split('Welcome\n')
chunks = filter(None, chunks)
print(list(chunks))
#['to\nPythonExamples\n', 'to\nPythonExamples']
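To illustrate the side note about the name str, here is a small sketch. The function demo_shadowing and the variable shadowed_result are hypothetical names used only for this demonstration.

```python
import builtins
import keyword

# str is not a reserved keyword, which is why assigning to it is allowed at all.
print(keyword.iskeyword('str'))  # False
print(keyword.iskeyword('for'))  # True


def demo_shadowing():
    str = 'Welcome\nto\nPythonExamples'  # this local name now hides the built-in type
    try:
        return str(42)  # tries to call a string object, not the type
    except TypeError as exc:
        return f'TypeError: {exc}'


shadowed_result = demo_shadowing()
print(shadowed_result)
print(builtins.str(42))  # the real type is still reachable through builtins
```

The assignment itself works fine; the problem only shows up later, when some code in the same scope needs the real str.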

edited Feb 5, 2021 at 19:30

answered Feb 5, 2021 at 18:16

alec_djinn


1 Comment

aneroid Over a year ago

Use chunks = list(filter(None, chunks)); otherwise chunks is printed as a filter object.
