
The following code:

str = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = str.split('\n')
print(chunks)

Correctly prints out:

['Welcome', 'to', 'PythonExamples', 'Welcome', 'to', 'PythonExamples']

I want to split the string into strings that start with 'Welcome\n' so I have tried the following:

str = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = str.split('Welcome\n')
print(chunks)

But this prints out:

['', 'to\nPythonExamples\n', 'to\nPythonExamples']

Notice how the first entry is empty. How can I split the string so that the output is the following?

['to\nPythonExamples\n', 'to\nPythonExamples']
  • Remove the empty string via a comprehension? Commented Feb 5, 2021 at 18:07
  • Just to clarify, the only thing that bothers you is the empty strings that might come up? Commented Feb 5, 2021 at 18:07
  • I assumed that because the empty string is there, I am parsing it incorrectly. I wonder why the empty value is there. Commented Feb 5, 2021 at 18:10

3 Answers

4

If I understand correctly, you want to avoid empty strings. You can use a list comprehension:

chunks = [x for x in str.split('Welcome\n') if x]

This should solve your problem. Why?

First of all, the list comprehension ends with if x, which means the list will include only truthy values (that is, it will omit falsy values such as '').

But why did you get '' in the first place? It is easiest to point you to the source code for split:

while (maxcount-- > 0) {
    pos = FASTSEARCH(str+i, str_len-i, sep, sep_len, -1, FAST_SEARCH);
    if (pos < 0)
        break;
    j = i + pos;
    SPLIT_ADD(str, i, j);
    i = j + sep_len;
}

Basically, split looks for the next occurrence of sep, starting at index i, and adds the substring from i up to that occurrence (j). It then moves i just past the separator and repeats, at most maxcount times. Since Welcome\n occurs at position 0 and i also starts at 0, the first substring runs from 0 to 0, which is an empty string.
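To make that loop easier to follow, here is a rough Python sketch of the same idea. The name split_sketch is made up for illustration, and it uses str.find in place of FASTSEARCH. It is not CPython's actual implementation, just an approximation of the logic described above.

```python
def split_sketch(text, sep, maxcount=-1):
    """Illustrative stand-in for str.split(sep); not CPython's real code."""
    if not sep:
        raise ValueError("empty separator")
    parts = []
    i = 0  # start of the piece currently being collected
    while maxcount != 0:
        pos = text.find(sep, i)  # index of the next separator, or -1
        if pos < 0:
            break
        parts.append(text[i:pos])  # empty when the separator sits exactly at i
        i = pos + len(sep)  # continue just past the separator
        maxcount -= 1
    parts.append(text[i:])  # whatever follows the last separator
    return parts


sample = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
print(split_sketch(sample, 'Welcome\n'))
# ['', 'to\nPythonExamples\n', 'to\nPythonExamples']
```

On the first pass the separator is found at index 0 while i is also 0, so text[0:0] is appended, and that is the leading ''.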

By the way, you would also get empty strings for a string like this:

'Welcome\nWelcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'

The result of your code, without my change:

['', '', 'to\nPythonExamples\n', 'to\nPythonExamples']


3 Comments

Do you know why the first entry is empty?
Sure, let me add that to the answer
@HarryBoy, I've added this part to my answer. Let me know if you have any more questions :) GL
2

You could filter out the empty entries. Also, avoid using str as a variable name, since it shadows the built-in str type. Since '' is falsy, you don't even need a comparison.

inp = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = list(filter(None, inp.split('Welcome\n')))
print(chunks)

3 Comments

Using lambda in this case is really unnecessary. You can use filter(None, chunks) directly.
Hmm, it might be the case that if it is None it implicitly does lambda x: x as the function though.
Interesting. It would be better to use None, because that is what the source code does (github.com/python/cpython/blob/…), so I am updating the answer with that improvement.
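For anyone following the comments above, here is a small check (a sketch, with variable names chosen just for this example) showing that filter(None, ...), filter(lambda x: x, ...), and a comprehension with if x give the same result for this input:

```python
inp = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
pieces = inp.split('Welcome\n')

# Three ways to drop falsy items (here, the empty string)
with_none = list(filter(None, pieces))
with_lambda = list(filter(lambda x: x, pieces))
with_comprehension = [x for x in pieces if x]

print(with_none)
# ['to\nPythonExamples\n', 'to\nPythonExamples']
print(with_none == with_lambda == with_comprehension)
# True
```

Keep in mind that all three drop every falsy item, not only ''. For a list of strings that makes no difference.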
2

One very clean and Pythonic way would be to use filter() with None. As an aside, str is the name of a built-in type in Python (not a keyword, but still a name worth leaving alone), so you should not use it as a variable name.

text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'
chunks = text.split('Welcome\n')
chunks = filter(None, chunks)
print(list(chunks))
# ['to\nPythonExamples\n', 'to\nPythonExamples']

1 Comment

Use chunks = list(filter(None, chunks)); otherwise chunks is printed as a filter object.
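A short illustration of that point, assuming Python 3, where filter() returns a lazy iterator rather than a list (the variable names are only for this example):

```python
text = 'Welcome\nto\nPythonExamples\nWelcome\nto\nPythonExamples'

lazy = filter(None, text.split('Welcome\n'))
print(type(lazy).__name__)
# filter

first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # nothing left to yield

print(first_pass)
# ['to\nPythonExamples\n', 'to\nPythonExamples']
print(second_pass)
# []
```

Converting to a list once and keeping that list avoids both the odd printout and the surprise of an already exhausted iterator.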
