a = "1)2"
b = ")"
a = a.split(")")
b = b.split(")")
print(a, len(a), b, len(b))

returns

['1', '2'] 2 ['', ''] 2

This behaviour seems really strange to me. Why are blanks returned only for b and not a?

  • When you split b on the ")", there is nothing to the left or right of the ")", so you get empty strings. Commented Aug 19, 2021 at 15:02
  • For a, you have a 1 and a 2 around the ")". For b, it is surrounded by nothing, so you get an empty string on each side. Commented Aug 19, 2021 at 15:02
  • What do you think the result should be instead, and why? What do you think .split does, and how do you think it should handle the case where the delimiter appears at the beginning or end of the string? Also, did you try reading the documentation? What did it tell you about this? Commented Aug 19, 2021 at 15:05
  • You can see that another way: your_separator.join([the split parts]) will always give you the original string. In b, your separator joins two empty strings, before and after itself. In a, it joins "1" and "2". Commented Aug 19, 2021 at 15:05
  • Questions asking "why" are not a good fit for Stack Overflow. We can try to give reasons why the decision that was made might make certain programming tasks easier or harder, but it's still very subjective. Ultimately, the reasoning is in the minds of the inventors, in this case of the Python language. For this reason I am voting to close the question as opinion-based. Commented Aug 19, 2021 at 15:06

2 Answers


As others pointed out, the documented behavior of str.split explains your results. Since you pass ')' as sep, split returns the substrings on either side of each occurrence of it. In ')' there is nothing on either side, so you get exactly 2 empty strings (not blanks). In '1)2', the two sides are the non-empty strings '1' and '2'. The same rule covers the other cases below: when sep is given, split returns an empty string wherever sep occurs consecutively, or at the beginning or the end of the string.

lst = ['1', ')', '1)', ')2', '1)2', '1)2)', '))', ')1)2)']

for s in lst:
    s_split = s.split(')')
    print(f'"{s}" is split into\t{len(s_split)} element(s):\t', s_split)

Prints:

"1" is split into       1 element(s):    ['1']
")" is split into       2 element(s):    ['', '']
"1)" is split into      2 element(s):    ['1', '']
")2" is split into      2 element(s):    ['', '2']
"1)2" is split into     2 element(s):    ['1', '2']
"1)2)" is split into    3 element(s):    ['1', '2', '']
"))" is split into      3 element(s):    ['', '', '']
")1)2)" is split into   4 element(s):    ['', '1', '2', '']
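
The table above follows two simple rules, which can be checked with a minimal sketch: with an explicit sep, the number of pieces is always one more than the number of separators, and joining the pieces with sep gives back the original string. The helper name check_split_invariants is made up for illustration and is not part of Python.

```python
# Sketch: verify two invariants of str.split when sep is given explicitly.
samples = ['1', ')', '1)', ')2', '1)2', '1)2)', '))', ')1)2)']
sep = ')'

def check_split_invariants(s, sep):
    """Hypothetical helper: True if both invariants hold for s."""
    parts = s.split(sep)
    # One more piece than there are separators (empty pieces included).
    count_ok = len(parts) == s.count(sep) + 1
    # Joining the pieces with the separator rebuilds the original string.
    join_ok = sep.join(parts) == s
    return count_ok and join_ok

results = {s: check_split_invariants(s, sep) for s in samples}
print(all(results.values()))  # True
```

Seen this way, the two empty strings for ")" are what make the round trip work: dropping them would leave nothing for join to put the separator between.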

That's because in the first case ) is found at index 1, so split returns [a[0:1], a[2:]], which is ['1', '2'] (a and b here mean the original strings, before they are reassigned).

Whereas in the second case ) is found at index 0, so split returns [b[0:0], b[1:]], and both of those slices are empty strings.

If you are still confused, consider the string s = "(12(3(" split on "(".

Here ( is found at 3 indices, 0, 3 and 5, so split returns [s[0:0], s[0+1:3], s[3+1:5], s[5+1:]], which is ['', '12', '3', ''].

Note: The first and last elements are always of the form s[0:i] and s[j+1:], where i and j are the indices of the first and last separator.
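
The slicing description above can be written out as a rough sketch. The function manual_split is a hypothetical name, and this only illustrates the idea; it is not how CPython implements str.split.

```python
def manual_split(s, sep):
    """Hypothetical helper: mimic s.split(sep) using find() and slices."""
    if not sep:
        raise ValueError("empty separator")
    parts = []
    start = 0
    while True:
        idx = s.find(sep, start)
        if idx == -1:
            # No more separators: the rest of the string is the last piece.
            parts.append(s[start:])
            break
        # Piece between the previous separator and this one (may be empty).
        parts.append(s[start:idx])
        start = idx + len(sep)
    return parts

for text, sep in [("1)2", ")"), (")", ")"), ("(12(3(", "(")]:
    print(repr(text), manual_split(text, sep))
```

For ")" the loop appends s[0:0] and then s[1:], both empty, which is exactly the ['', ''] from the question.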
