Python split function behaviour

Question

a = "1)2"
b = ")"
a = a.split(")")
b = b.split(")")
print(a, len(a), b, len(b))

returns

['1', '2'] 2 ['', ''] 2

This behaviour seems really strange to me. Why are blanks returned only for b and not a?

When you split b on the ")", there is nothing to the left or right of the ")", so you get empty strings.
– not_speshal, Commented Aug 19, 2021 at 15:02
For a, you have a 1 and a 2 around the ")". For b, it is surrounded by nothing, so you get an empty string on each side.
– Metapod, Commented Aug 19, 2021 at 15:02
What do you think the result should be instead, and why? What do you think .split does, and how do you think it should handle the case where the delimiter appears at the beginning or end of the string? Also, did you try reading the documentation? What did it tell you about this?
– Karl Knechtel, Commented Aug 19, 2021 at 15:05
You can see it another way: your_separator.join([the split parts]) will always give you the original string. In b, your separator joins two empty strings, one before and one after itself. In a, it joins "1" and "2".
– Thierry Lathuille, Commented Aug 19, 2021 at 15:05
Questions asking "why" are not a good fit for Stack Overflow. We can try to give reasons why the decision that was made might make certain programming tasks easier or harder, but it's still very subjective. Ultimately, the reasoning is in the minds of the inventors, in this case those of the Python language. For this reason I am voting to close the question as opinion-based.
– Karl Knechtel, Commented Aug 19, 2021 at 15:06
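
The round-trip property mentioned in the comments above can be checked directly. This is only a small sketch: the helper name round_trips and the sample strings are made up for illustration and are not part of the original question.

```python
def round_trips(s, sep):
    """Return True if joining the split parts with sep rebuilds s.

    Hypothetical helper, written only to illustrate the comment above.
    """
    return sep.join(s.split(sep)) == s


# Sample inputs chosen for illustration, including the two from the question.
for s in ["1)2", ")", "1)2)", "))", ""]:
    print(repr(s), s.split(")"), round_trips(s, ")"))
```

If ")" were split into an empty list, or into a single empty string, joining the parts with ")" could not give back the original ")", which is one way to see why two empty strings are needed.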

Timur Shtatland · Accepted Answer · 2021-08-19 15:33:08Z

As others pointed out, the documented behavior of str.split explains your results. Because you specify sep as ')', split returns the substrings on either side of each occurrence of the separator. In the case of ')', it finds exactly 2 empty strings (not blanks). In the case of '1)2', it finds 2 non-empty strings ('1' and '2'). The same rule extends to other similar cases, see below. As you can see, split, when provided with sep, returns empty strings wherever sep occurs consecutively, or at the beginning or the end of the string.

lst = ['1', ')', '1)', ')2', '1)2', '1)2)', '))', ')1)2)']

for s in lst:
    s_split = s.split(')')
    print(f'"{s}" is split into\t{len(s_split)} element(s):\t', s_split)

Prints:

"1" is split into       1 element(s):    ['1']
")" is split into       2 element(s):    ['', '']
"1)" is split into      2 element(s):    ['1', '']
")2" is split into      2 element(s):    ['', '2']
"1)2" is split into     2 element(s):    ['1', '2']
"1)2)" is split into    3 element(s):    ['1', '2', '']
"))" is split into      3 element(s):    ['', '', '']
")1)2)" is split into   4 element(s):    ['', '1', '2', '']
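
One way to read the table above is that, with an explicit sep, the number of returned elements is the number of separators plus one. The sketch below checks that relation and shows one possible way to drop the empty strings if they are not wanted. The helper split_nonempty is a hypothetical name introduced here, not something from the standard library.

```python
def split_nonempty(s, sep):
    """Split s on sep and discard empty strings (hypothetical helper)."""
    return [part for part in s.split(sep) if part]


lst = ['1', ')', '1)', ')2', '1)2', '1)2)', '))', ')1)2)']

for s in lst:
    parts = s.split(')')
    # With an explicit separator, n separators always produce n + 1 parts.
    assert len(parts) == s.count(')') + 1
    print(repr(s), parts, split_nonempty(s, ')'))
```

Whether filtering is appropriate depends on the data: it also discards empty fields that may be meaningful, for example in delimited records.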

Naga Deepak Kaza · Accepted Answer · 2021-08-19 15:21:34Z

That's because in the first case ")" is encountered at index 1, so the result of the split method will be [a[0:1], a[2:]]

Whereas in the second case ")" is encountered at index 0, so split will return [b[0:0], b[1:]]

If you are still confused, consider a string s = "(12(3(" split on "("

Here "(" is encountered at 3 indices, 0, 3 and 5, so the split method returns [s[0:0], s[0+1:3], s[3+1:5], s[5+1:]]

Note: The first and last elements will be something like s[0:i] and s[j+1:] respectively, where i is the index of the first separator and j the index of the last one
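
The slicing description above can be turned into a short sketch. The function manual_split is a hypothetical name, and this is only an illustration of the index arithmetic built on str.find, not a claim about how CPython implements str.split internally.

```python
def manual_split(s, sep):
    """Split s on sep by slicing between separator positions.

    Illustrative only; assumes a non-empty separator, as str.split does.
    """
    if not sep:
        raise ValueError("empty separator")
    parts = []
    start = 0
    while True:
        idx = s.find(sep, start)      # index of the next separator, or -1
        if idx == -1:
            parts.append(s[start:])   # last element: everything after the final separator
            break
        parts.append(s[start:idx])    # may be '' when separators are adjacent or at index 0
        start = idx + len(sep)        # resume just past the separator
    return parts


for s, sep in [("1)2", ")"), (")", ")"), ("(12(3(", "(")]:
    print(repr(s), manual_split(s, sep), manual_split(s, sep) == s.split(sep))
```

A separator at index 0 makes the first slice s[0:0], and a separator at the last index makes the final slice start at len(s); both slices are empty strings.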
