4

I'm really confused by these two methods. Even after reading the API doc for split(String, int), I'm still not sure when empty strings will appear in the result.

For example

String s = "boo:and:foo";
s.split("o", -2);
// the result, according to the doc, is { "b", "", ":and:f", "", "" }
// why is there only one "" in the front, but two "" at the back?

My thought after some testing is that whenever there are consecutive matches, in this case oo, there will be an additional "". The two trailing "" come from a match right at the end of the String combined with a consecutive match. (A match right at the start of the String would likewise result in a leading "".)

Some confirmation needed.

4
  • Because if you pass a negative value (or one greater than the number of elements it would split into) to the split(String,int) method, it will also include the trailing empty strings, which is why foo creates two empty strings. Commented Nov 23, 2015 at 7:41
  • Why two trailing "" then, any elaboration? I understand why there should be some; what I want to confirm is why there are two "" at the end. Commented Nov 23, 2015 at 7:45
  • " The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression OR is terminated by the end of the string." (added caps for emphasis). Commented Nov 23, 2015 at 7:47
  • Try splitting food logically by the parameter. It would be "f", "", "d". Now let's go to foo. This would, like food, create three elements "f", "", "" with two empty strings. Since in your case foo is the last element, the normal split(String) method would exclude the trailing empty string elements. By providing a negative integer you just make sure that these get included. Commented Nov 23, 2015 at 7:49

2 Answers

2

First off, let's start by checking what the logical result of a normal split operation would be. Let's take one:two:three:four as an example. If you split it by :, you would expect to get the following result.

one
two
three
four

Let's add another delimiter to the end of the String, so that it looks like one:two:three:four:. If we split this now, we get the same result. This is due to the definition of the split(String) method, which leaves out trailing empty strings. The same happens if you add more delimiters to the end of the string, as in one:two:three:four::::. The result of split(String) is the same again, since, as in the previous example, it excludes trailing empty string elements.
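A minimal sketch of that behaviour, assuming nothing beyond java.lang.String.split; the class name TrailingDelimiters and the helper splitByColon are made up for illustration:

```java
import java.util.Arrays;

public class TrailingDelimiters {
    // Hypothetical helper: split(String) behaves like split(String, 0),
    // so trailing empty strings are dropped from the result.
    public static String[] splitByColon(String input) {
        return input.split(":");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "one:two:three:four",
            "one:two:three:four:",
            "one:two:three:four::::"
        };
        for (String input : inputs) {
            // All three inputs should yield the same four elements.
            System.out.println(input + " -> " + Arrays.toString(splitByColon(input)));
        }
    }
}
```

All three inputs should print the same four-element array, because the extra delimiters only ever produce trailing empty strings.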

If you would like to include these elements, you can use the split(String,int) method, which includes trailing empty strings if you provide a negative int. From the documentation of the method, for the parameter limit (the int):

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

By passing a negative limit we get, for the string one:two:three:four::, the following result:

one
two
three
four
""
""

The same is happening in your example. The word foo produces two empty trailing String elements at the end of the array, and the negative limit includes them.
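As a rough sketch of the difference, here is the one:two:three:four:: example with and without a negative limit. The class name KeepTrailingEmpties and the helper splitKeepingEmpties are illustrative names, not part of any API:

```java
import java.util.Arrays;

public class KeepTrailingEmpties {
    // Hypothetical helper: a negative limit keeps trailing empty strings.
    public static String[] splitKeepingEmpties(String input) {
        return input.split(":", -1);
    }

    public static void main(String[] args) {
        String input = "one:two:three:four::";

        // Default split: the two trailing empty strings are discarded.
        String[] trimmed = input.split(":");
        System.out.println(trimmed.length + " " + Arrays.toString(trimmed));

        // Negative limit: the two trailing empty strings are kept.
        String[] full = splitKeepingEmpties(input);
        System.out.println(full.length + " " + Arrays.toString(full));
    }
}
```

The first call should report 4 elements and the second 6, the last two being empty strings.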


0

I will answer your question in terms of the int limit parameter; let's call it the limit n. First, though, you need to understand how to split a String manually, as I did in this table.

[image: split table]

I use "o" as the regular expression (regex) to split the String. Now look at cells D2 and E2, and cells L2 and M2. There is an empty String, "", between each pair of adjacent "o" characters.

Because the regular expression also matches the last character of the String (cell M2), a single "" is added after it. Likewise, if the regular expression matched the first character of the String, a single "" would be added at the front.

So the String array will be:

string[0] ---> contains ---> "b"
string[1] ---> contains ---> ""
string[2] ---> contains ---> ":and:f"
string[3] ---> contains ---> ""
string[4] ---> contains ---> ""

Let's call this array splitted_string.
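The manual procedure can be sketched in code. This is only an approximation under stated assumptions: it treats the delimiter as a literal, non-empty string rather than a regex, and the names ManualSplit and manualSplit are hypothetical. It is not how String.split is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualSplit {
    // Hypothetical helper: cuts the input at every occurrence of a literal
    // delimiter and keeps every piece, including the empty ones.
    public static List<String> manualSplit(String input, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> pieces = new ArrayList<>();
        int start = 0;
        int match;
        while ((match = input.indexOf(delimiter, start)) >= 0) {
            // Text between the previous match and this one; it is ""
            // when two matches are adjacent, as in "oo".
            pieces.add(input.substring(start, match));
            start = match + delimiter.length();
        }
        // Whatever follows the last match; it is "" when the input
        // ends with the delimiter.
        pieces.add(input.substring(start));
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(manualSplit("boo:and:foo", "o"));
    }
}
```

For "boo:and:foo" this yields the same five pieces as the table: the first "" sits between the two "o" characters of boo, the next between the two "o" characters of foo, and the last between the final "o" and the end of the String.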


Zero limit

split(regex, 0) is equivalent to split(String regex) and doesn't include the trailing empty Strings. If you look at the table, you can see that cells L4 and N4 are empty Strings, so they will be ignored. For example:

String str = "boo:and:foo";
String[] a = str.split("o", 0);

for (String x : a)
    System.out.println("split(\"o\", 0)\t = " + x);

// Gives the same result as:
String[] b = str.split("o");
for (String x : b)
    System.out.println("split(\"o\")\t= " + x);

OUTPUT

split("o", 0)    = b
split("o", 0)    = 
split("o", 0)    = :and:f
split("o")  = b
split("o")  = 
split("o")  = :and:f

Positive limit

From the docs:

If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.

Example 1:

String str = "boo:and:foo";

String[] a = str.split("o", 2);
System.out.println("Length = " + a.length);
for (String x : a)
    System.out.println("split(\"o\", 2)\t = " + x);

OUTPUT

Length = 2
split("o", 2)   = b
split("o", 2)   = o:and:foo

The pattern is applied n - 1 times, i.e. 2 - 1 = 1 time. This means that only the first "o" is removed.

Example 2:

String[] b = str.split("o", 7);
System.out.println("Length = " + b.length);
for (String x : b)
    System.out.println("split(\"o\", 7)\t = " + x);

OUTPUT

Length = 5
split("o", 7)   = b
split("o", 7)   = 
split("o", 7)   = :and:f
split("o", 7)   = 
split("o", 7)   = 

There are only four "o" characters to remove, but the pattern may be applied up to n - 1 times, i.e. 7 - 1 = 6 times. Because this exceeds the total number of "o" characters, all of them are removed.

Unlike split(regex, 0) and split(regex), the result may end with empty Strings, i.e. "".
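To see how the result grows with the limit, a small sweep over positive limits may help. The class name LimitSweep and the helper pieceCount are illustrative assumptions:

```java
import java.util.Arrays;

public class LimitSweep {
    // Hypothetical helper: number of elements produced for a given limit.
    public static int pieceCount(String input, String regex, int limit) {
        return input.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String str = "boo:and:foo";
        for (int limit = 1; limit <= 7; limit++) {
            // The pattern is applied at most limit - 1 times; the last
            // element holds everything after the last applied match.
            String[] parts = str.split("o", limit);
            System.out.println("limit " + limit + ": length " + parts.length
                    + " " + Arrays.toString(parts));
        }
    }
}
```

The length should rise by one per step until all four "o" characters are consumed at limit 5, and stay at 5 for larger limits.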


Negative limit

From the docs:

If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.

If you pass any negative integer as the limit, the pattern is applied as in split(regex, 0) or split(regex), but all trailing empty Strings are included. For example:

String str = "boo:and:foo";
String[] a = str.split("o", -1);
System.out.println("Length = " + a.length);
for (String x : a)
    System.out.println("split(\"o\", -1)\t = " + x);

OUTPUT

Length = 5
split("o", -1)  = b
split("o", -1)  = 
split("o", -1)  = :and:f
split("o", -1)  = 
split("o", -1)  = 

If you are still confused, look at the table and splitted_string above again.
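One more sketch, for the case the question mentions of a match at the very start of the String. The input "oboo" is a made-up example, and LeadingAndTrailing and splitOnO are hypothetical names:

```java
import java.util.Arrays;

public class LeadingAndTrailing {
    // Hypothetical helper: split on "o" with the given limit.
    public static String[] splitOnO(String input, int limit) {
        return input.split("o", limit);
    }

    public static void main(String[] args) {
        String input = "oboo";

        // Negative limit: the leading "" and both trailing "" are kept.
        System.out.println(Arrays.toString(splitOnO(input, -1)));

        // Zero limit: only the trailing "" are discarded; the leading ""
        // stays because only trailing empty strings are removed.
        System.out.println(Arrays.toString(splitOnO(input, 0)));
    }
}
```

With a negative limit this should give four elements ("", "b", "", ""), and with a zero limit two ("", "b").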
