Why does "split" on an empty string return a non-empty array?

Question

Splitting an empty string returns an array of size 1:

scala> "".split(',')
res1: Array[String] = Array("")

By contrast, this returns an empty array:

scala> ",,,,".split(',')
res2: Array[String] = Array()

Please explain :)

Additionally, it seems inconsistent with the behavior observed when the string contains only one instance of the separator. In this case the result is effectively an empty array: ",".split(",").length == 0
– LD., Commented Feb 8, 2013 at 23:14
It's a nasty edge case. Google's ErrorProne bug prevention suite bans String.split(s) for exactly this reason. I have written more in my answer below.
– Rok Kralj, Commented Jun 11 at 10:44
Works as specified: "If the expression does not match any part of the input then the resulting array has just one element, namely this string. ... trailing empty strings will be discarded". As for why, that must be asked of the specification's author.
– user85421, Commented Jun 11 at 12:34

Sam Stainsby · Accepted Answer · 2011-02-11 04:27:00Z

83

If you split an orange zero times, you have exactly one piece - the orange.

answered Feb 11, 2011 at 4:27

Sam Stainsby


5 Comments

Nick Rolando Over a year ago

But the orange isn't empty (idk if that's what oluies meant), it's an orange. Maybe it's like splitting an orange that should be there but is not, so you get back a single value: an empty space xD

user195488 Over a year ago

This is a deep conversation.

Matchu Over a year ago

This metaphor makes sense for "orange".split(','), but isn't obviously relevant for splitting empty strings. If I split my lack of orange zero times, I still have no orange; do we represent that as an empty list of no-oranges, a list of exactly one no-orange, a list of twelve no-oranges, or what? It's not a question of what we end up with, but how we represent it.

SMUsamaShah Over a year ago

But if you split a non-existent book by its pages, you will get nothing.

Ardent Coder Over a year ago

Hm... what is 0/0?

Rok Kralj · Accepted Answer · 2025-06-11 10:42:01Z

72

The Java and Scala split methods operate in two steps like this:

First, split the string by the delimiter. The natural consequence is that if the string does not contain the delimiter, the result is an array of size 1 containing the original string.
Second, remove all the rightmost empty strings. This is the reason ",,,".split(",") returns an empty array.
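
The two steps above can be sketched in plain Java to see where the real method departs from them. This is only an illustrative model, not the JDK's implementation; the class TwoStepSplit and the helper twoStep are hypothetical names introduced here.

```java
import java.util.Arrays;

final class TwoStepSplit {
    // Hypothetical model of the two steps described above.
    // Step 1: split on every delimiter, keeping all empty strings (limit -1).
    // Step 2: drop the trailing (rightmost) empty strings.
    static String[] twoStep(String input, String regex) {
        String[] parts = input.split(regex, -1);
        int end = parts.length;
        while (end > 0 && parts[end - 1].isEmpty()) {
            end--;
        }
        return Arrays.copyOf(parts, end);
    }

    public static void main(String[] args) {
        // Compare the model with the real split for a few inputs.
        for (String s : new String[] {"a,b", ",,,", ",", ""}) {
            System.out.println("\"" + s + "\": model=" + twoStep(s, ",").length
                    + " actual=" + s.split(",").length);
        }
    }
}
```

The model and the real method agree on every input except the empty string, where the model yields 0 elements and the real split yields 1.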

According to this, the result of "".split(",") should be an empty array because of the second step, right?

It should be. Unfortunately, this is a nasty edge case. This is not just my personal opinion: Google's ErrorProne suite bans this usage because of it. Moreover, they tried to fix the behavior in JDK bug 6559590, but couldn't, because too much broken code depends on the old behavior.

The old Android documentation in java.util.regex.Pattern used to be very explicit about it:

For n == 0, the result is as for n < 0, except trailing empty strings will not be returned. (Note that the case where the input is itself an empty string is special, as described above, and the limit parameter does not apply there.)

Unfortunately, the OpenJDK docs are very unclear, and should be improved in my opinion.

Solution 1: Always pass -1 as the second parameter

So I advise you to always pass -1 as the second parameter (this skips step two above), unless you specifically know what you want to achieve or you are sure that your program will never receive an empty string as input.
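
As a rough illustration of why a limit of -1 is easier to reason about: for a single-character delimiter such as a comma, the number of pieces is always the number of delimiters plus one, the empty string included. The class NegativeLimitDemo and its helpers are hypothetical names used only for this sketch.

```java
final class NegativeLimitDemo {
    // Number of pieces produced when trailing empty strings are kept.
    static int parts(String input) {
        return input.split(",", -1).length;
    }

    // Number of comma delimiters in the input.
    static int commas(String input) {
        return (int) input.chars().filter(c -> c == ',').count();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", ",", ",,,", "a,b,,"}) {
            System.out.println("\"" + s + "\": commas=" + commas(s)
                    + " parts=" + parts(s));
        }
    }
}
```

Note that "".split(",", -1) still returns one element (the empty string), so code that wants an empty result for empty input has to handle that case itself.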

Solution 2: Use Guava Splitter class

If you are already using Guava in your project, you can try the Splitter (documentation) class. It has a very rich API, and makes your code very easy to understand.

Splitter.on(",").omitEmptyStrings().split("") // correctly empty (no elements)
Splitter.on(".").split(".a.b.c.") // "", "a", "b", "c", ""
Splitter.on(",").omitEmptyStrings().split("a,,b,,c") // "a", "b", "c"
Splitter.on(CharMatcher.anyOf(",.")).split("a,b.c") // "a", "b", "c"
Splitter.onPattern("=>?").split("a=b=>c") // "a", "b", "c"
Splitter.on(",").limit(2).split("a,b,c") // "a", "b,c"
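
If Guava is not available, something close to omitEmptyStrings() can be approximated with the standard library alone. This is a minimal sketch, not part of Guava and not a drop-in replacement for Splitter; the class NonEmptyTokens and the method nonEmptyTokens are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class NonEmptyTokens {
    // Split on the regex, keep every piece (limit -1), then drop the empty ones.
    static List<String> nonEmptyTokens(String input, String regex) {
        return Arrays.stream(input.split(regex, -1))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(nonEmptyTokens("", ","));        // no tokens
        System.out.println(nonEmptyTokens("a,,b,,c", ","));  // a, b and c
    }
}
```

Because the empty pieces are filtered after splitting, the empty string and a string made only of delimiters both give an empty list.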

edited Jun 11 at 10:42

answered Jun 13, 2016 at 18:13

Rok Kralj


7 Comments

Yogu Over a year ago

+1, this is the only answer that actually cites the documentation and points out that it is inconsistent. However, I did not find the highlighted part of the comment in my JavaDoc.

Rok Kralj Over a year ago

I have found it in java.util.regex.Pattern, but it seems to mostly be gone. At the time of writing, it definitely was present in the official OpenJDK source tree as a javadoc. android.googlesource.com/platform/libcore/+/… Maybe we should report a bug?

Yogu Over a year ago

Would be a good idea to report a bug - the behaviour will definitely not be changed, but it should at least be documented.

lxgr Over a year ago

@RokKralj Android did not use the OpenJDK library, but was instead based on Apache Harmony, so maybe you are looking in the wrong place?

simon.watts Over a year ago

"".split (",", n) generates a one element array for n in (-1, 0, 1) with Oracle JDK 8. Would be nice to get a list of non-empty tokens only -- guess a full regex may be necessary (something like "[^,\\s]+[^,]*[^,\\s]*").


Matt Fenwick · Accepted Answer · 2012-04-18 19:16:07Z

39

Splitting an empty string returns the empty string as the first element. If no delimiter is found in the target string, you will get an array of size 1 that is holding the original string, even if it is empty.

edited Apr 18, 2012 at 19:16

Matt Fenwick


answered Feb 11, 2011 at 0:55

Nick Rolando



Daniel C. Sobral · Accepted Answer · 2011-02-11 19:33:22Z

36

For the same reason that

",test" split ','

and

",test," split ','

will return an array of size 2. Everything before the first match is returned as the first element.
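
The same asymmetry can be checked in plain Java: the empty string before the first delimiter is kept, while empty strings after the last one are dropped. The class LeadingVsTrailing is a hypothetical name used only for this sketch.

```java
import java.util.Arrays;
import java.util.List;

final class LeadingVsTrailing {
    // Split on commas with the default limit (0), as in the examples above.
    static List<String> parts(String input) {
        return Arrays.asList(input.split(","));
    }

    public static void main(String[] args) {
        for (String s : new String[] {",test", ",test,", "test,"}) {
            System.out.println("\"" + s + "\" -> " + parts(s).size() + " element(s)");
        }
    }
}
```

Here ",test" and ",test," both produce the two elements "" and "test", while "test," produces only "test".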

edited Feb 11, 2011 at 19:33

answered Feb 11, 2011 at 1:52

Daniel C. Sobral


13 Comments

Daniel C. Sobral Over a year ago

@Nicklamort It seems self-evident to me, but you can look at the Javadocs for String's split if you need more information.

Austin Over a year ago

@Raphael Or in an Oracle database

Andrey Mikhaylov - lolmaus Over a year ago

@Raphael, in any other programming language "".split("wtf").length returns 0. Only in JS it's 1. :/

Joan Over a year ago

@DanielC.Sobral OK, so why does "," split "," return an array of size 0?

Didier A. Over a year ago

Why isn't everything after the last match returned too?


Saad · Accepted Answer · 2015-12-01 19:44:45Z

24

"a".split(",") -> "a" therefore "".split(",") -> ""

edited Dec 1, 2015 at 19:44

Saad


answered Apr 15, 2013 at 11:06

weberjn


1 Comment

Rok Kralj Over a year ago

Wrong. Split removes all the rightmost empty strings, therefore the result should be an empty array. See my answer. ",".split(",") returns empty array.

brent777 · Accepted Answer · 2011-02-11 00:57:50Z

5

In all programming languages I know, a blank string is still a valid String. So doing a split using any delimiter will always return a single-element array where that element is the blank String. If it were a null (not blank) String, then that would be a different issue.

answered Feb 11, 2011 at 0:57

brent777


1 Comment

oluies Over a year ago

I think this is a library function and not a part of the language. For example, in Google Guava you can omit empty strings: Iterable<String> pieces = com.google.common.base.Splitter.on(',').omitEmptyStrings().split("");

Andy Hayden · Accepted Answer · 2017-10-20 04:47:55Z

2

This split behavior is inherited from Java, for better or worse...
Scala does not override the definition from Java's String class.

Note that you can use the limit argument to modify the behavior:

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

That is, you can set limit = -1 to get the behavior of (all?) other languages:

@ ",a,,b,,".split(",")
res1: Array[String] = Array("", "a", "", "b")

@ ",a,,b,,".split(",", -1)  // limit=-1
res2: Array[String] = Array("", "a", "", "b", "", "")
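
For comparison, here is a small Java sketch of the three regimes of the limit parameter (zero, negative, positive) on the same input. The class LimitDemo and its split helper are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class LimitDemo {
    // Thin wrapper around String.split(regex, limit) that returns a List.
    static List<String> split(String input, int limit) {
        return Arrays.asList(input.split(",", limit));
    }

    public static void main(String[] args) {
        String input = ",a,,b,,";
        // 0: trailing empty strings dropped; -1: everything kept;
        // 2: the pattern is applied at most once, the rest stays in the last entry.
        for (int limit : new int[] {0, -1, 2}) {
            List<String> parts = split(input, limit);
            System.out.println("limit=" + limit + " -> " + parts.size() + " parts");
        }
    }
}
```

With this input, limit 0 gives 4 parts, limit -1 gives 6 parts, and limit 2 gives the 2 parts "" and "a,,b,,".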

It seems to be well known that the Java behavior is quite confusing, but:

The behavior above can be observed from at least Java 5 to Java 8.

There was an attempt to change the behavior to return an empty array when splitting an empty string in JDK-6559590. However, it was soon reverted in JDK-8028321 when it caused regressions in various places. The change never made it into the initial Java 8 release.

Note: The split method wasn't in Java from the beginning (it's not in 1.0.2) but actually is there from at least 1.4 (e.g. see JSR51 circa 2002). I am still investigating...

What's unclear is why Java chose this in the first place (my suspicion is that it was originally an oversight/bug in an "edge case"), but it is now irrevocably baked into the language, and so it remains.

answered Oct 20, 2017 at 4:47

Andy Hayden


2 Comments

DaveyDaveDave Over a year ago

I'm not sure that this answers the question - while it may be true for the example given here, it doesn't help with the case of the empty string - "".split(",") still returns a single element array like [""].

Andy Hayden Over a year ago

@DaveyDaveDave that's expected behavior of every other language. The ",,,," is the bizarre/different behavior in Scala, and disparate to the "" case.

Hanan Oanunu · Accepted Answer · 2018-10-04 10:19:51Z

0

An empty string has no special status when splitting a string. You may use:

Some(str)
  .filter(_ != "")
  .map(_.split(","))
  .getOrElse(Array())

answered Oct 4, 2018 at 10:19

Hanan Oanunu



Burak Senel · Accepted Answer · 2022-02-11 14:58:04Z

0

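Use this function:

// Requires: import java.util.ArrayList; import java.util.Arrays; import java.util.Optional;
public static ArrayList<String> split(String body) {
    // A null or empty input is replaced by ",", and ",".split(",") yields an empty array.
    return new ArrayList<>(Arrays.asList(Optional.ofNullable(body).filter(a -> !a.isEmpty()).orElse(",").split(",")));
}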

answered Feb 11, 2022 at 14:58

Burak Senel
