Why does split give an extra empty string when splitting in Python?

Question

Here are two examples that illustrate my question:

>>> example = "$2000"
>>> example.split("$")
['', '2000']

But if I do:

>>> example2 = "2000$3000"
>>> example2.split("$")
['2000', '3000']

Why is there no extra empty string in this example?
How does split work behind the scenes?

It's not relevant to your question, but it's hard to believe your first example. Shouldn't you be getting ['', '2000'], not ['', ' 2000']? Please always copy and paste transcripts exactly.
– DSM, Commented Feb 16, 2015 at 5:29
I think it makes sense that when you split a string containing a single delimiter, you get two pieces.
– Martin Konecny, Commented Feb 16, 2015 at 5:37

paxdiablo · Accepted Answer · 2015-02-16 06:16:37Z

3

Because you split on a separator. If you split the string $2000 with the $ separator, there is an empty string on the left and 2000 on the right:

            $2000
nothing____/ \____2000

With the second case, 2000$3000, there is still only one separator, so it still produces two values in the list. It's just that the value to the left of the separator is 2000 rather than an empty string:

     2000$3000
2000____/ \____3000

Provided you pass an explicit separator and don't limit the split with maxsplit (the maximum number of splits allowed), the resulting list always has exactly one more element than the number of non-overlapping occurrences of the separator.
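A quick way to check this rule in the interpreter is to compare the length of the result with str.count, which, like split, counts non-overlapping occurrences. The sample strings are made up for illustration; the last two lines show how maxsplit caps the number of pieces, at which point the rule no longer applies.

```python
# With an explicit separator and no maxsplit:
# len(s.split(sep)) == s.count(sep) + 1
samples = ["$2000", "2000$3000", "$$$1000$$2000$3000$$$", "no separator", ""]

for s in samples:
    parts = s.split("$")
    print(repr(s), parts, len(parts) == s.count("$") + 1)  # last column is always True

# maxsplit limits how many cuts are made, counting from the left (split)
# or from the right (rsplit)
print("a$b$c$d".split("$", 1))   # ['a', 'b$c$d']
print("a$b$c$d".rsplit("$", 1))  # ['a$b$c', 'd']
```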

If you want to remove all empty strings from the resulting list, you can do it with a list comprehension:

>>> s = '$$$1000$$2000$3000$$$'           # test data

>>> [x for x in s.split('$') if x != '']  # remove all empty strings
['1000', '2000', '3000']
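A couple of equivalent idioms, sketched on the same test data: empty strings are falsy, so `if x` or `filter(None, ...)` drops them. The last lines try the whitespace trick from smac89's comment (turn runs of $ into a space, then call split() with no argument); it works here, but only under the assumption that the fields themselves never contain whitespace.

```python
import re

s = '$$$1000$$2000$3000$$$'

# Empty strings are falsy, so "if x" keeps only non-empty pieces
print([x for x in s.split('$') if x])       # ['1000', '2000', '3000']

# filter(None, ...) keeps only truthy items
print(list(filter(None, s.split('$'))))     # ['1000', '2000', '3000']

# Collapse runs of '$' into one space, then split on whitespace.
# Caveat: a field containing a space gets broken apart.
print(re.sub(r'\$+', ' ', s).split())             # ['1000', '2000', '3000']
print(re.sub(r'\$+', ' ', '$a b$c$').split())     # ['a', 'b', 'c'], field 'a b' was split
```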

There are also ways to get rid of blanks only at the ends, removing either one separator or all of them from each end:

>>> import re
>>> s='$$$1000$$2000$3000$$$'

>>> re.sub(r'^\$|\$$', '', s).split('$')     # just one at each end
['', '', '1000', '', '2000', '3000', '', '']

>>> re.sub(r'^\$*|\$*$', '', s).split('$')   # all at the ends
['1000', '', '2000', '3000']
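The same end-trimming can be done without regular expressions. str.strip('$') removes every leading and trailing $, while str.removeprefix and str.removesuffix remove at most one occurrence each. This sketch assumes Python 3.9 or later, where removeprefix/removesuffix were added.

```python
s = '$$$1000$$2000$3000$$$'

# Remove all '$' at both ends, then split (same result as the second regex)
print(s.strip('$').split('$'))
# ['1000', '', '2000', '3000']

# Remove at most one '$' at each end, then split (same result as the first regex)
print(s.removeprefix('$').removesuffix('$').split('$'))
# ['', '', '1000', '', '2000', '3000', '', '']
```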

edited Feb 16, 2015 at 6:16

answered Feb 16, 2015 at 5:38

paxdiablo


2 Comments

smac89 Over a year ago

What about re.sub(r'\$+', " ", s).split()?

G.T. Over a year ago

I think this answer should be accepted.

Kacy · Accepted Answer · 2015-02-16 05:39:31Z

2

From the docs: https://docs.python.org/2/library/string.html

It (the argument to the function) specifies a string to be used as the word separator. The returned list will then have one more item than the number of non-overlapping occurrences of the separator in the string.
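The quoted rule holds when a separator is passed explicitly. When split is called with no argument (or sep=None), Python uses a different algorithm: runs of whitespace count as a single separator, and no empty strings are produced at the start or end. The sample string below is made up to show the difference.

```python
s = "  2000  3000 "

# Explicit separator: every single space is a cut point, so empty strings appear
print(s.split(" "))
# ['', '', '2000', '', '3000', '']

# Default (sep=None): runs of whitespace act as one separator and the ends are trimmed
print(s.split())
# ['2000', '3000']
```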

The number of occurrences of the separator in your first example is 1, so split returns 2 elements. The first element must be the empty string, since there is nothing before the separator.

Your second example also contains one separator, but there is no empty string in the result because 2000 comes before the separator $.

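You can think of the split function as cutting the string at every occurrence of the separator: the pieces between the cuts, including the piece before the first separator and the piece after the last one, become the list elements, even when those pieces are empty.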
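To make the "cutting" picture concrete, here is a minimal pure-Python sketch of split with an explicit separator. The function name manual_split is hypothetical, and this is not how CPython implements str.split (that code is written in C); it only mimics the observable behaviour for a non-empty separator without maxsplit.

```python
from typing import List


def manual_split(s: str, sep: str) -> List[str]:
    """Illustrative re-implementation of s.split(sep) for a non-empty sep."""
    if not sep:
        raise ValueError("empty separator")  # str.split also raises ValueError here
    parts = []
    start = 0
    while True:
        idx = s.find(sep, start)  # next non-overlapping occurrence of sep
        if idx == -1:
            break
        parts.append(s[start:idx])  # text between the previous cut and this one (may be '')
        start = idx + len(sep)      # continue searching just past the separator
    parts.append(s[start:])  # text after the last separator (may be '')
    return parts


print(manual_split("$2000", "$"))      # ['', '2000']
print(manual_split("2000$3000", "$"))  # ['2000', '3000']
```

Because a piece is appended before every separator and once more at the end, the result always has one more element than the number of separators found, which is exactly why "$2000" yields a leading empty string.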

edited Feb 16, 2015 at 5:39

answered Feb 16, 2015 at 5:34

Kacy

