157

How can I split the string "Thequickbrownfoxjumps" into substrings of equal size in Java? For example, splitting "Thequickbrownfoxjumps" into chunks of size 4 should give the output:

["Theq","uick","brow","nfox","jump","s"]

Similar Question:

Split string into equal-length substrings in Scala

4
  • 4
    What did you try? Why did that not work? Commented Sep 21, 2010 at 12:18
  • 2
    Do you need to use a regex for this? Just asking because of the regex tag... Commented Sep 21, 2010 at 12:19
  • @Thilo link he posted is for Scala, he is asking about same in Java Commented Sep 21, 2010 at 12:20
  • @Thilo: I was asking how to do it in Java, like the answer given for Scala. Commented Sep 21, 2010 at 12:27

24 Answers

276

Here's the regex one-liner version:

System.out.println(Arrays.toString(
    "Thequickbrownfoxjumps".split("(?<=\\G.{4})")
));

\G is a zero-width assertion that matches the position where the previous match ended. If there was no previous match, it matches the beginning of the input, the same as \A. The enclosing lookbehind matches the position that's four characters along from the end of the last match.
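
As a rough illustration of the explanation above, the chunk size can be made a parameter by building the pattern string at run time. The class and method names (RegexChunks, splitEvery) are hypothetical and not part of the answer, and the sketch assumes the input contains no line terminators, since `.` does not match them by default.

```java
import java.util.Arrays;

class RegexChunks {
    // Hypothetical helper: splits after every `size` chars using the \G lookbehind trick.
    static String[] splitEvery(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        // (?<=\G.{size}) matches the empty position `size` chars past the previous match.
        return text.split("(?<=\\G.{" + size + "})");
    }

    public static void main(String[] args) {
        // Prints [Theq, uick, brow, nfox, jump, s]
        System.out.println(Arrays.toString(splitEvery("Thequickbrownfoxjumps", 4)));
    }
}
```

Rebuilding the pattern on every call is the cost of the parameter; if the size is fixed, compiling the Pattern once and reusing it would likely be preferable.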

Both lookbehind and \G are advanced regex features, not supported by all flavors. Furthermore, \G is not implemented consistently across the flavors that do support it. This trick will work (for example) in Java, Perl, .NET and JGSoft, but not in PHP (PCRE), Ruby 1.9+ or TextMate (both Oniguruma). JavaScript's /y (sticky flag) isn't as flexible as \G, and couldn't be used this way even if JS did support lookbehind.

I should mention that I don't necessarily recommend this solution if you have other options. The non-regex solutions in the other answers may be longer, but they're also self-documenting; this one's just about the opposite of that. ;)

Also, this doesn't work in Android, which doesn't support the use of \G in lookbehinds.


16 Comments

In PHP 5.2.4 works following code: return preg_split('/(?<=\G.{'.$len.'})/u', $str,-1,PREG_SPLIT_NO_EMPTY);
For the record, using String.substring() instead of a regex, while requiring a few extra lines of code, will run somewhere on the order of 5x faster...
In Java this does not work for a string with newlines. It will only check up to the first newline, and if that newline happens to be before the split-size, then the string will not be split. Or have I missed something?
For the sake of completeness: splitting text over multilines needs a prefixed (?s) in the regex: (?s)(?<=\\G.{4}).
@JeffreyBlattman I doubt that you got the exception at compile time
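
The newline behaviour discussed in the comments above can be checked with a small sketch. It compiles the same pattern with and without Pattern.DOTALL (the flag form of the inline `(?s)`); the class and method names are made up for illustration, and the input is the multi-line sample used in a later answer.

```java
import java.util.regex.Pattern;

class DotAllSplit {
    static final String REGEX = "(?<=\\G.{4})";

    // Without DOTALL, '.' does not match line terminators, so the split can stall.
    static String[] splitDefault(String text) {
        return Pattern.compile(REGEX).split(text);
    }

    // With DOTALL, '.' also matches line terminators, so counting continues across lines.
    static String[] splitDotAll(String text) {
        return Pattern.compile(REGEX, Pattern.DOTALL).split(text);
    }

    public static void main(String[] args) {
        String input = "The\nquick\nbrown\r\nfox\rjumps";
        System.out.println(splitDefault(input).length); // number of pieces without DOTALL
        System.out.println(splitDotAll(input).length);  // number of pieces with DOTALL
    }
}
```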
161

Well, it's fairly easy to do this with simple arithmetic and string operations:

public static List<String> splitEqually(String text, int size) {
    // Give the list the right capacity to start with. You could use an array
    // instead if you wanted.
    List<String> ret = new ArrayList<String>((text.length() + size - 1) / size);

    for (int start = 0; start < text.length(); start += size) {
        ret.add(text.substring(start, Math.min(text.length(), start + size)));
    }
    return ret;
}

Note: this assumes a 1:1 mapping of UTF-16 code unit (char, effectively) with "character". That assumption breaks down for characters outside the Basic Multilingual Plane, such as emoji, and (depending on how you want to count things) combining characters.
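
To make the note above concrete, here is a small sketch of what happens when a char-based cut lands inside a surrogate pair. The sample string and class name are illustrative only; the emoji is written with escapes so the source file encoding does not matter.

```java
class SurrogateDemo {
    // "The" + U+1F637 (FACE WITH MEDICAL MASK) + "uick"; the emoji occupies two chars.
    static final String TEXT = "The\uD83D\uDE37uick";

    public static void main(String[] args) {
        System.out.println(TEXT.length());                         // UTF-16 code units: 9
        System.out.println(TEXT.codePointCount(0, TEXT.length())); // code points: 8

        // A char-based cut at index 4 falls between the two halves of the emoji,
        // so the first chunk ends with an unpaired high surrogate.
        String firstChunk = TEXT.substring(0, 4);
        System.out.println(Character.isHighSurrogate(firstChunk.charAt(3))); // true
    }
}
```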

I don't think it's really worth using a regex for this.

EDIT: My reasoning for not using a regex:

  • This doesn't use any of the real pattern matching of regexes. It's just counting.
  • I suspect the above will be more efficient, although in most cases it won't matter
  • If you need to use variable sizes in different places, you've either got repetition or a helper function to build the regex itself based on a parameter - ick.
  • The regex provided in another answer firstly didn't compile (invalid escaping), and then didn't work. My code worked first time. That's more a testament to the usability of regexes vs plain code, IMO.

16 Comments

@Jon Skeet : Thanks for clearing it but i didn't get your point. "I don't think it's really worth using a regex for this"
@Emil: If you want a one-liner for splitting the string, I'd recommend Guava's Splitter.fixedLength(4) as suggested by seanizer.
@Jay: Come on, you need not be that sarcastic. I'm sure it can be done using a regex in just one line. A fixed-length substring is also a pattern. What do you say about this answer? stackoverflow.com/questions/3760152/…
@Emil: I didn't intend that to be rude, just whimsical. The serious part of my point was that while yes, I'm sure you could come up with a Regex to do this -- I see Alan Moore has one that he claims works -- it is cryptic and therefore difficult for a later programmer to understand and maintain. A substring solution can be intuitive and readable. See Jon Skeet's 4th bullet: I agree with that 100%.
@JonSkeet I wouldn’t call this solution “brute force”, as it doesn’t do anything worse than other solutions. In fact, it calculates the substring boundaries directly, whereas the regex solution will actually iterate over the characters, to find a “match” at the intended positions. So if any of the posted solutions is “brute force”, it’s the regex variant.
85

This is very easy with Google Guava:

for(final String token :
    Splitter
        .fixedLength(4)
        .split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

Output:

Theq
uick
brow
nfox
jump
s

Or if you need the result as an array, you can use this code:

String[] tokens =
    Iterables.toArray(
        Splitter
            .fixedLength(4)
            .split("Thequickbrownfoxjumps"),
        String.class
    );

Note: Splitter construction is shown inline above, but since Splitters are immutable and reusable, it's a good practice to store them in constants:

private static final Splitter FOUR_LETTERS = Splitter.fixedLength(4);

// more code

for(final String token : FOUR_LETTERS.split("Thequickbrownfoxjumps")){
    System.out.println(token);
}

6 Comments

Thanks for the post (for making me aware of the Guava library method). But I'll have to accept the regex answer stackoverflow.com/questions/3760152/… since it doesn't require any third-party library and is a one-liner.
Including hundreds of KB of library code just to perform this simple task is almost certainly not the right thing.
@JeffreyBlattman including Guava just for this is probably overkill, true. But I use it as a general-purpose library in all my Java code anyway, so why not use this one additional piece of functionality
any way to join back with a separator?
@AquariusPower String.join(separator, arrayOrCollection)
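
A minimal sketch of the join suggested in the last comment, using only the JDK. The chunks helper and class name are hypothetical stand-ins for the splitter; String.join accepts any Iterable of CharSequence, so the same call should also work on the Iterable that Splitter.split returns.

```java
import java.util.ArrayList;
import java.util.List;

class JoinChunks {
    // Plain-JDK stand-in for the splitter: fixed-size chunks via substring.
    static List<String> chunks(String text, int size) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size) {
            parts.add(text.substring(start, Math.min(text.length(), start + size)));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> parts = chunks("Thequickbrownfoxjumps", 4);
        // Prints Theq-uick-brow-nfox-jump-s
        System.out.println(String.join("-", parts));
    }
}
```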
15

If you're using Google's guava general-purpose libraries (and quite honestly, any new Java project probably should be), this is insanely trivial with the Splitter class:

for (String substring : Splitter.fixedLength(4).split(inputString)) {
    doSomethingWith(substring);
}

and that's it. Easy as!

1 Comment

Does not work with all Unicode characters. Try Guava 30.1.1 with an input where we replace the q with FACE WITH MEDICAL MASK: "The😷uickbrownfoxjumps" yielding: The? ?uic kbro …
8
public static String[] split(String src, int len) {
    String[] result = new String[(int)Math.ceil((double)src.length()/(double)len)];
    for (int i=0; i<result.length; i++)
        result[i] = src.substring(i*len, Math.min(src.length(), (i+1)*len));
    return result;
}

2 Comments

Since src.length() and len are both ints, your call ceiling isn't accomplishing what you want--check out how some of the other responses are doing it: (src.length() + len - 1) / len
@Michael: Good point. I didn't test it with strings of non-multiple lengths. It's fixed now.
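
The rounding pitfall raised in the first comment can be shown in a few lines. This is only a sketch with made-up names; it compares truncating int division, a ceiling applied after truncation, a ceiling over a double division, and the integer-only formula quoted in the comment.

```java
class CeilDivDemo {
    // Integer-only ceiling division; assumes dividend >= 0 and divisor > 0.
    static int ceilDiv(int dividend, int divisor) {
        return (dividend + divisor - 1) / divisor;
    }

    public static void main(String[] args) {
        int length = 21, size = 4;
        System.out.println(length / size);                           // 5: int division truncates
        System.out.println((int) Math.ceil(length / size));          // 5: ceil is applied too late
        System.out.println((int) Math.ceil((double) length / size)); // 6: division done in double
        System.out.println(ceilDiv(length, size));                   // 6: no floating point needed
    }
}
```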
6
public String[] splitInParts(String s, int partLength)
{
    int len = s.length();

    // Number of parts
    int nparts = (len + partLength - 1) / partLength;
    String parts[] = new String[nparts];

    // Break into parts
    int offset= 0;
    int i = 0;
    while (i < nparts)
    {
        parts[i] = s.substring(offset, Math.min(offset + partLength, len));
        offset += partLength;
        i++;
    }

    return parts;
}

2 Comments

Out of interest, do you have something against for loops?
A for loop is indeed a more 'natural' choice for this :-) Thanks for pointing this out.
5

Here's a one-liner version which uses Java 8 IntStream to determine the indexes of the slice beginnings:

String x = "Thequickbrownfoxjumps";

String[] result = IntStream
                    .iterate(0, i -> i + 4)
                    .limit((int) Math.ceil(x.length() / 4.0))
                    .mapToObj(i ->
                        x.substring(i, Math.min(i + 4, x.length()))
                    )
                    .toArray(String[]::new);

Comments

3

I'd prefer this simple solution:

String content = "Thequickbrownfoxjumps";
while(content.length() > 4) {
    System.out.println(content.substring(0, 4));
    content = content.substring(4);
}
System.out.println(content);

5 Comments

Don't do this! String is immutable so your code needs to copy the whole remaining string every 4 characters. Your snippet therefore takes quadratic rather than linear time in the size of the String.
@Tobias: Even if String were mutable, this snippet would still make the redundant copy mentioned, unless the compiler performed some complex optimization. The only reason for using this snippet is code simplicity.
Did you change your code since you first posted it? The latest version doesn't actually make copies - substring() runs efficiently (constant time, at least on old versions of Java); it keeps a reference to the entire string's char[] (at least on old versions of Java), but that's fine in this case since you're keeping all the characters. So the latest code that you have here is actually okay (modulo that your code prints an empty line if content starts as the empty string, which may not be what one intends).
@Tobias: I don't remember any change.
@Tobias the substring implementation changed with Java 7, update 6 in the middle of 2012, when the offset and count fields were removed from the String class. So the complexity of substring turned to linear long before this answer was made. But for a small string like the example, it still runs fast enough and for longer strings…well this task rarely occurs in practice.
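
If the repeated copying discussed in these comments is a concern, one possible variation keeps the same loop shape but advances an index instead of reassigning the remaining string. This is only a sketch with hypothetical names; it collects the chunks in a list rather than printing them, and it skips the empty trailing chunk for empty input.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetChunks {
    // Advances an index instead of re-slicing the remainder, so each character
    // is copied into exactly one chunk.
    static List<String> chunks(String content, int size) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        while (content.length() - start > size) {
            parts.add(content.substring(start, start + size));
            start += size;
        }
        if (start < content.length()) { // nothing is added for an empty input
            parts.add(content.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        // Prints [Theq, uick, brow, nfox, jump, s]
        System.out.println(chunks("Thequickbrownfoxjumps", 4));
    }
}
```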
3

A StringBuilder version:

public static List<String> getChunks(String s, int chunkSize)
{
    List<String> chunks = new ArrayList<>();
    StringBuilder sb = new StringBuilder(s);

    while (sb.length() > 0)
    {
        // Clamp the end index so a short final chunk does not throw.
        int end = Math.min(chunkSize, sb.length());
        chunks.add(sb.substring(0, end));
        sb.delete(0, end);
    }
    return chunks;
}

1 Comment

Might be a late response, but this approach will throw StringIndexOutOfBoundsException, if 's' is less than chunkSize, StringBuilder.substring requires end to be less or equal to String size
2

I use the following Java 8 solution:

public static List<String> splitString(final String string, final int chunkSize) {
  final int numberOfChunks = (string.length() + chunkSize - 1) / chunkSize;
  return IntStream.range(0, numberOfChunks)
                  .mapToObj(index -> string.substring(index * chunkSize, Math.min((index + 1) * chunkSize, string.length())))
                  .collect(toList());
}

Comments

2

Use code points to handle all characters

Here is a solution:

  • Works with all 143,859 Unicode characters
  • Allows you to examine or manipulate each resulting string, if you have further logic to process.

To work with all Unicode characters, avoid the obsolete char type. And avoid char-based utilities. Instead, use code point integer numbers.

Call String#codePoints to get an IntStream object, a stream of int values. In the code below, we collect those int values into an array. Then we loop over the array; for each integer, we append the character assigned to that number to our StringBuilder object. After every nth character, we add a string to our master list and empty the StringBuilder.

String input = "Thequickbrownfoxjumps";

int chunkSize = 4 ;
int[] codePoints = input.codePoints().toArray();  // `String#codePoints` returns an `IntStream`. Collect the elements of that stream into an array.
int initialCapacity = ( ( codePoints.length / chunkSize ) + 1 );
List < String > strings = new ArrayList <>( initialCapacity );

StringBuilder sb = new StringBuilder();
for ( int i = 0 ; i < codePoints.length ; i++ )
{
    sb.appendCodePoint( codePoints[ i ] );
    if ( 0 == ( ( i + 1 ) % chunkSize ) ) // Every nth code point.
    {
        strings.add( sb.toString() ); // Remember this iteration's value.
        sb.setLength( 0 ); // Clear the contents of the `StringBuilder` object.
    }
}
if ( sb.length() > 0 ) // If partial string leftover, save it too. Or not… just delete this `if` block.
{
    strings.add( sb.toString() ); // Remember last iteration's value.
}

System.out.println( "strings = " + strings );

strings = [Theq, uick, brow, nfox, jump, s]

This works with non-Latin characters. Here we replace q with FACE WITH MEDICAL MASK.

String text = "The😷uickbrownfoxjumps";

strings = [The😷, uick, brow, nfox, jump, s]
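
As a possible variation on the approach above, the same code-point-aware split can be sketched without the intermediate int array by asking the String for chunk boundaries with offsetByCodePoints. The class and method names are hypothetical, and like the answer this counts code points, not user-perceived characters, so combining sequences may still be separated.

```java
import java.util.ArrayList;
import java.util.List;

class CodePointChunks {
    // Splits into chunks of `size` code points; the last chunk may be shorter.
    static List<String> split(String text, int size) {
        List<String> parts = new ArrayList<>();
        int remaining = text.codePointCount(0, text.length());
        int start = 0;
        while (remaining > 0) {
            int take = Math.min(size, remaining);
            // Char index that lies `take` code points after `start`.
            int end = text.offsetByCodePoints(start, take);
            parts.add(text.substring(start, end));
            start = end;
            remaining -= take;
        }
        return parts;
    }

    public static void main(String[] args) {
        // U+1F637 written as a surrogate pair in place of the "q".
        System.out.println(split("The\uD83D\uDE37uickbrownfoxjumps", 4));
    }
}
```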

Comments

1

You can use substring from the String class (handling exceptions yourself) or from Apache Commons Lang (it handles exceptions for you)

static String   substring(String str, int start, int end) 

Put it inside a loop and you are good to go.

3 Comments

What's wrong with the substring method in the standard String class?
The commons version avoids exceptions (out of bounds and such)
I see; I would say I prefer to 'avoid exceptions' by controlling the parameters in the calling code instead.
1

In case you want to split the string equally backwards, i.e. from right to left, for example, to split 1010001111 to [10, 1000, 1111], here's the code:

/**
 * @param s         the string to be split
 * @param subLen    length of the equal-length substrings.
 * @param backwards true if the splitting is from right to left, false otherwise
 * @return an array of equal-length substrings
 * @throws ArithmeticException: / by zero when subLen == 0
 */
public static String[] split(String s, int subLen, boolean backwards) {
    assert s != null;
    int groups = s.length() % subLen == 0 ? s.length() / subLen : s.length() / subLen + 1;
    String[] strs = new String[groups];
    if (backwards) {
        for (int i = 0; i < groups; i++) {
            int beginIndex = s.length() - subLen * (i + 1);
            int endIndex = beginIndex + subLen;
            if (beginIndex < 0)
                beginIndex = 0;
            strs[groups - i - 1] = s.substring(beginIndex, endIndex);
        }
    } else {
        for (int i = 0; i < groups; i++) {
            int beginIndex = subLen * i;
            int endIndex = beginIndex + subLen;
            if (endIndex > s.length())
                endIndex = s.length();
            strs[i] = s.substring(beginIndex, endIndex);
        }
    }
    return strs;
}

Comments

1

Here is a one-liner implementation using Java 8 streams:

String input = "Thequickbrownfoxjumps";
final AtomicInteger atomicInteger = new AtomicInteger(0);
Collection<String> result = input.chars()
                                    .mapToObj(c -> String.valueOf((char)c) )
                                    .collect(Collectors.groupingBy(c -> atomicInteger.getAndIncrement() / 4
                                                                ,Collectors.joining()))
                                    .values();

It gives the following output:

[Theq, uick, brow, nfox, jump, s]

1 Comment

That’s a horrible solution, fighting the intent of the API, using stateful functions and being significantly more complicated than an ordinary loop, not to speak of the boxing and string concatenation overhead. If you want a Stream solution, use something like String[] result = IntStream.range(0, (input.length()+3)/4) .mapToObj(i -> input.substring(i *= 4, Math.min(i + 4, input.length()))) .toArray(String[]::new);
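
The stateless stream suggested in the comment could be wrapped up roughly as follows. This is a sketch, not the commenter's exact code: the chunk size is a parameter, the lambda parameter is left unmodified, and the class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class StreamChunks {
    // Maps each chunk index to its substring; no shared mutable state is needed.
    static String[] chunks(String input, int size) {
        int count = (input.length() + size - 1) / size; // number of chunks, rounded up
        return IntStream.range(0, count)
                .mapToObj(i -> input.substring(i * size,
                        Math.min((i + 1) * size, input.length())))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Prints [Theq, uick, brow, nfox, jump, s]
        System.out.println(Arrays.toString(chunks("Thequickbrownfoxjumps", 4)));
    }
}
```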
1

Java 8 solution (like this but a bit simpler):

public static List<String> partition(String string, int partSize) {
  List<String> parts = IntStream.range(0, string.length() / partSize)
    .mapToObj(i -> string.substring(i * partSize, (i + 1) * partSize))
    .collect(toList());
  if ((string.length() % partSize) != 0)
    parts.add(string.substring(string.length() / partSize * partSize));
  return parts;
}

Comments

0

Here is my version based on regex and Java 8 streams. It's worth mentioning that the Matcher.results() method has only been available since Java 9.

Test included.

public static List<String> splitString(String input, int splitSize) {
    Matcher matcher = Pattern.compile("(?:(.{" + splitSize + "}))+?").matcher(input);
    return matcher.results().map(MatchResult::group).collect(Collectors.toList());
}

@Test
public void shouldSplitStringToEqualLengthParts() {
    String anyValidString = "Split me equally!";
    String[] expectedTokens2 = {"Sp", "li", "t ", "me", " e", "qu", "al", "ly"};
    String[] expectedTokens3 = {"Spl", "it ", "me ", "equ", "all"};

    Assert.assertArrayEquals(expectedTokens2, splitString(anyValidString, 2).toArray());
    Assert.assertArrayEquals(expectedTokens3, splitString(anyValidString, 3).toArray());
}

Comments

0

The simplest solution is:

  /**
   * Slices a string by the passed-in slice length.
   * If the passed-in string is null or the slice length is less than 0, throws IllegalArgumentException.
   * @param toSlice string to slice
   * @param sliceLength slice length
   * @return List of slices
   */
  public static List<String> stringSlicer(String toSlice, int sliceLength) {
    if (toSlice == null) {
      throw new IllegalArgumentException("Passed-in string is null");
    }
    if (sliceLength < 0) {
      throw new IllegalArgumentException("Slice length cannot be less than 0");
    }
    if (toSlice.isEmpty() || toSlice.length() <= sliceLength) {
      return List.of(toSlice);
    }
    
   return Arrays.stream(toSlice.split(String.format("(?s)(?<=\\G.{%d})", sliceLength))).collect(Collectors.toList());
  }

Comments

0

My take on this:

String input = "Thequickbrownfoxjumps";
int SIZE = 4;
String result = IntStream.rangeClosed(0,input.length() / SIZE)
                    .map(i -> i*SIZE)
                    .boxed()
                    .map(idx->input.substring(idx,
                           Math.min(idx+SIZE,input.length())))
                    .collect(Collectors.joining(" "));

Comments

-1

I asked @Alan Moore in a comment to the accepted solution how strings with newlines could be handled. He suggested using DOTALL.

Using his suggestion I created a small sample of how that works:

public void regexDotAllExample() throws UnsupportedEncodingException {
    final String input = "The\nquick\nbrown\r\nfox\rjumps";
    final String regex = "(?<=\\G.{4})";

    Pattern splitByLengthPattern;
    String[] split;

    splitByLengthPattern = Pattern.compile(regex);
    split = splitByLengthPattern.split(input);
    System.out.println("---- Without DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is a single entry longer than the desired split size:
    ---- Without DOTALL ----
    [Idx: 0, length: 26] - [B@17cdc4a5
     */


    //DOTALL suggested in Alan Moores comment on SO: https://stackoverflow.com/a/3761521/1237974
    splitByLengthPattern = Pattern.compile(regex, Pattern.DOTALL);
    split = splitByLengthPattern.split(input);
    System.out.println("---- With DOTALL ----");
    for (int i = 0; i < split.length; i++) {
        byte[] s = split[i].getBytes("utf-8");
        System.out.println("[Idx: "+i+", length: "+s.length+"] - " + s);
    }
    /* Output is as desired 7 entries with each entry having a max length of 4:
    ---- With DOTALL ----
    [Idx: 0, length: 4] - [B@77b22abc
    [Idx: 1, length: 4] - [B@5213da08
    [Idx: 2, length: 4] - [B@154f6d51
    [Idx: 3, length: 4] - [B@1191ebc5
    [Idx: 4, length: 4] - [B@30ddb86
    [Idx: 5, length: 4] - [B@2c73bfb
    [Idx: 6, length: 2] - [B@6632dd29
     */

}

But I also like @Jon Skeet's solution in https://stackoverflow.com/a/3760193/1237974. For maintainability in larger projects, where not everyone is equally experienced with regular expressions, I would probably use Jon's solution.

Comments

-1

Another brute force solution could be,

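    String input = "thequickbrownfoxjumps";
    int n = (input.length() + 3) / 4;   // round up so the final short chunk is kept
    String[] num = new String[n];

    for (int i = 0, x = 0; i < n; i++, x += 4) {
        // Clamp the end index for the last chunk.
        num[i] = input.substring(x, Math.min(x + 4, input.length()));
        System.out.println(num[i]);
    }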

Where the code just steps through the string with substrings

Comments

-1
import java.util.Arrays;
import java.util.Scanner;

public class string123 {

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter String");
        String r = sc.nextLine();
        int len = r.length();
        System.out.println("Enter length Of Sub-string");
        int l = sc.nextInt();
        // Size the array to the number of chunks, rounding up.
        String[] s = new String[(len + l - 1) / l];
        int f = 0;
        for (int i = 0; i < s.length; i++) {
            int last = Math.min(f + l, len);
            s[i] = r.substring(f, last);
            f = last;
        }
        System.out.print(Arrays.toString(s));
    }
}

Result

 Enter String
 Thequickbrownfoxjumps
 Enter length Of Sub-string
 4

 [Theq, uick, brow, nfox, jump, s]

Comments

-1
@Test
public void regexSplit() {
    String source = "Thequickbrownfoxjumps";
    // define matcher, any char, min length 1, max length 4
    Matcher matcher = Pattern.compile(".{1,4}").matcher(source);
    List<String> result = new ArrayList<>();
    while (matcher.find()) {
        result.add(source.substring(matcher.start(), matcher.end()));
    }
    String[] expected = {"Theq", "uick", "brow", "nfox", "jump", "s"};
    assertArrayEquals(result.toArray(), expected);
}

Comments

-1
public static String[] split(String input, int length) throws IllegalArgumentException {

    if(length == 0 || input == null)
        return new String[0];

    int lengthD = length * 2;

    int size = input.length();
    if(size == 0)
        return new String[0];

    int rep = (int) Math.ceil(size * 1d / length);

    ByteArrayInputStream stream = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_16LE));

    String[] out = new String[rep];
    byte[]  buf = new byte[lengthD];

    int d = 0;
    for (int i = 0; i < rep; i++) {

        try {
            d = stream.read(buf);
        } catch (IOException e) {
            e.printStackTrace();
        }

        if(d != lengthD)
        {
            out[i] = new String(buf,0,d, StandardCharsets.UTF_16LE);
            continue;
        }

        out[i] = new String(buf, StandardCharsets.UTF_16LE);
    }
    return out;
}

Comments

-1
public static List<String> getSplittedString(String stringtoSplit,
            int length) {

        List<String> returnStringList = new ArrayList<String>(
                (stringtoSplit.length() + length - 1) / length);

        for (int start = 0; start < stringtoSplit.length(); start += length) {
            returnStringList.add(stringtoSplit.substring(start,
                    Math.min(stringtoSplit.length(), start + length)));
        }

        return returnStringList;
    }

Comments
