12

How to split a byte[] around a byte sequence in Java? Something like the byte[] version of String#split(regex).

Example

Let's take this byte array:
[11 11 FF FF 22 22 22 FF FF 33 33 33 33]

and let's choose the delimiter to be
[FF FF]

Then the split will result in these three parts:
[11 11]
[22 22 22]
[33 33 33 33]

EDIT:

Please note that you cannot convert the byte[] to a String, split it, and then convert the pieces back, because of encoding issues. When you do such a conversion on arbitrary byte arrays, the resulting byte[] can differ from the original. Please refer to this: Conversion of byte[] into a String and then back to a byte[]
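To make the encoding concern concrete, here is a minimal sketch of such a round trip through UTF-8. The class name RoundTripDemo and the helper roundTrip are made up for this illustration. The byte 0xFF is never valid in UTF-8, so decoding replaces it and the re-encoded array no longer matches the input.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class RoundTripDemo {
    // Decodes the bytes as UTF-8, then encodes the resulting String back to bytes.
    static byte[] roundTrip(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = {0x11, (byte) 0xFF, (byte) 0xFF, 0x22};
        byte[] result = roundTrip(original);
        // 0xFF is malformed UTF-8 input, so the decoder substitutes a replacement character.
        System.out.println(Arrays.equals(original, result)); // prints "false"
    }
}
```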

6
  • No, it's not. Please read more carefully. Commented Mar 19, 2014 at 22:30
  • Iterate over the array; compare the next delimiter.length bytes to the delimiter, and split as needed? What exactly are you having trouble with? Commented Mar 19, 2014 at 22:42
  • Yes, I can do this, but I'm looking for an existing solution, not reinventing the wheel. It's better practice to reuse existing, proven, tested code than to write your own. Commented Mar 19, 2014 at 22:44
  • Is encoding an issue because you're dealing with guaranteed non-textual data or is this an artificial constraint? If you know what the encoding is going to be, it stops being a problem. Commented Mar 19, 2014 at 23:02
  • 1
    Possible duplicate of stackoverflow.com/questions/1387027/… Commented Mar 19, 2014 at 23:03

7 Answers

11

Here is a straightforward solution.

Unlike avgvstvs's approach, it handles arbitrary-length delimiters. The top answer is also good, but its author hasn't fixed the issue pointed out by Eitan Perkal. That issue is avoided here using the approach Perkal suggests.

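// Requires java.util.Arrays, java.util.LinkedList and java.util.List.
public static List<byte[]> tokens(byte[] array, byte[] delimiter) {
    List<byte[]> byteArrays = new LinkedList<>();
    if (delimiter.length == 0) {
        return byteArrays;
    }
    int begin = 0;

    outer:
    for (int i = 0; i < array.length - delimiter.length + 1; i++) {
        for (int j = 0; j < delimiter.length; j++) {
            if (array[i + j] != delimiter[j]) {
                continue outer;
            }
        }
        // Delimiter found at i: emit the bytes since the previous delimiter.
        byteArrays.add(Arrays.copyOfRange(array, begin, i));
        begin = i + delimiter.length;
        // Resume after the delimiter; otherwise an overlapping match
        // (e.g. FF FF FF) leaves begin > i and copyOfRange throws.
        i = begin - 1;
    }
    // Emit whatever follows the last delimiter (possibly an empty array).
    byteArrays.add(Arrays.copyOfRange(array, begin, array.length));
    return byteArrays;
}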

8

Note that you can reliably convert from byte[] to String and back, with a one-to-one mapping of chars to bytes, if you use the encoding "iso8859-1".

However, it's still an ugly solution.
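As a rough sketch of the ISO-8859-1 round trip described above (the class name Latin1Split is invented for this example): every byte maps to exactly one char in that charset, so the pieces can be converted back without loss. Pattern.quote is used on the assumption that the delimiter should be matched literally rather than as a regex, and the limit of -1 keeps trailing empty pieces.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class Latin1Split {
    // Assumes a non-empty delimiter.
    static List<byte[]> split(byte[] input, byte[] delimiter) {
        // ISO-8859-1 maps each byte 0x00..0xFF to the char with the same value.
        String text = new String(input, StandardCharsets.ISO_8859_1);
        String separator = new String(delimiter, StandardCharsets.ISO_8859_1);
        List<byte[]> parts = new ArrayList<>();
        // Quote the separator so regex metacharacters in it are matched literally.
        for (String piece : text.split(Pattern.quote(separator), -1)) {
            parts.add(piece.getBytes(StandardCharsets.ISO_8859_1));
        }
        return parts;
    }
}
```

Called with the byte array and the [FF FF] delimiter from the question, this should return the three parts listed there.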

I think you'll need to roll your own.

I suggest solving it in two stages:

  1. Work out how to find the index of each occurrence of the separator. Google for "Knuth-Morris-Pratt" for an efficient algorithm, although a more naive algorithm will be fine for short delimiters.
  2. Each time you find an index, use Arrays.copyOfRange() to get the piece you need and add it to your output list.

Here it is using a naive pattern-finding algorithm. KMP becomes worthwhile if the delimiters are long, because it avoids backtracking yet still doesn't miss a delimiter that is embedded in a sequence that mismatches at the end.

public static boolean isMatch(byte[] pattern, byte[] input, int pos) {
    for(int i=0; i< pattern.length; i++) {
        if(pattern[i] != input[pos+i]) {
            return false;
        }
    }
    return true;
}

public static List<byte[]> split(byte[] pattern, byte[] input) {
    List<byte[]> l = new LinkedList<byte[]>();
    int blockStart = 0;
    for(int i=0; i<input.length; i++) {
       if(isMatch(pattern,input,i)) {
          l.add(Arrays.copyOfRange(input, blockStart, i));
          blockStart = i+pattern.length;
          i = blockStart;
       }
    }
    l.add(Arrays.copyOfRange(input, blockStart, input.length ));
    return l;
}
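For long delimiters, the Knuth-Morris-Pratt idea mentioned in this answer might be sketched as follows. The names KmpSplit and failureTable are illustrative only. The sketch keeps empty pieces between adjacent delimiters and returns the whole input for an empty pattern; both are choices made here, not something the answer specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of any library.
class KmpSplit {
    // failure[k] is the length of the longest proper prefix of pattern[0..k]
    // that is also a suffix of pattern[0..k].
    static int[] failureTable(byte[] pattern) {
        int[] failure = new int[pattern.length];
        int matched = 0;
        for (int k = 1; k < pattern.length; k++) {
            while (matched > 0 && pattern[k] != pattern[matched]) {
                matched = failure[matched - 1];
            }
            if (pattern[k] == pattern[matched]) {
                matched++;
            }
            failure[k] = matched;
        }
        return failure;
    }

    static List<byte[]> split(byte[] pattern, byte[] input) {
        List<byte[]> parts = new ArrayList<>();
        if (pattern.length == 0) {
            parts.add(input.clone());
            return parts;
        }
        int[] failure = failureTable(pattern);
        int blockStart = 0; // start of the piece currently being collected
        int matched = 0;    // number of pattern bytes matched so far
        for (int i = 0; i < input.length; i++) {
            // On a mismatch, fall back in the pattern instead of rewinding the input.
            while (matched > 0 && input[i] != pattern[matched]) {
                matched = failure[matched - 1];
            }
            if (input[i] == pattern[matched]) {
                matched++;
            }
            if (matched == pattern.length) {
                int matchStart = i - pattern.length + 1;
                parts.add(Arrays.copyOfRange(input, blockStart, matchStart));
                blockStart = i + 1;
                matched = 0; // start fresh so delimiters never overlap
            }
        }
        parts.add(Arrays.copyOfRange(input, blockStart, input.length));
        return parts;
    }
}
```

The input index only ever moves forward, so the scan is linear in the input length plus the cost of building the table.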

3 Comments

It's always good to read the book The C Programming Language, which has a ton of exercises that force you to come up with this kind of solution. Then you can move to Java with that toolset under your belt.
The above code will fail with java.lang.ArrayIndexOutOfBoundsException if the input ends with the start of the pattern, for example: byte[] pattern = { (byte) 0x43, (byte) 0x23 }; byte[] input = { (byte) 0x08, (byte) 0x01, (byte) 0x53, (byte) 0x43 }; One simple solution is to change the loop in the split method from for(int i=0; i<input.length; i++) { to for(int i=0; i<=input.length-pattern.length; i++) {
The line i = blockStart; is also incorrect, since i++ is executed afterwards and the byte at blockStart is skipped. The problem occurs when two delimiters are adjacent, for example with patterns of length 1.
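A sketch of the naive version with both problems from the comments addressed, under the assumption that empty pieces between adjacent delimiters should be kept. The class name NaiveSplit is made up for this example. The loop bound is i <= input.length - pattern.length, so a delimiter at the very end is still found, and the index is advanced by hand so no byte is skipped after a match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of any library.
class NaiveSplit {
    static boolean isMatch(byte[] pattern, byte[] input, int pos) {
        // Guard against reading past the end of the input.
        if (pos + pattern.length > input.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (pattern[i] != input[pos + i]) {
                return false;
            }
        }
        return true;
    }

    static List<byte[]> split(byte[] pattern, byte[] input) {
        List<byte[]> parts = new ArrayList<>();
        if (pattern.length == 0) {
            // Assumption: an empty pattern means "no split".
            parts.add(input.clone());
            return parts;
        }
        int blockStart = 0;
        int i = 0;
        while (i <= input.length - pattern.length) {
            if (isMatch(pattern, input, i)) {
                parts.add(Arrays.copyOfRange(input, blockStart, i));
                blockStart = i + pattern.length;
                i = blockStart; // continue right after the delimiter, no extra increment
            } else {
                i++;
            }
        }
        parts.add(Arrays.copyOfRange(input, blockStart, input.length));
        return parts;
    }
}
```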
4

I modified L. Blanc's answer to handle delimiters at the very beginning and at the very end. I also renamed it to 'split'.

private List<byte[]> split(byte[] array, byte[] delimiter)
{
   List<byte[]> byteArrays = new LinkedList<byte[]>();
   if (delimiter.length == 0)
   {
      return byteArrays;
   }
   int begin = 0;

   outer: for (int i = 0; i < array.length - delimiter.length + 1; i++)
   {
      for (int j = 0; j < delimiter.length; j++)
      {
         if (array[i + j] != delimiter[j])
         {
            continue outer;
         }
      }

      // If delimiter is at the beginning then there will not be any data.
      if (begin != i)
         byteArrays.add(Arrays.copyOfRange(array, begin, i));
      begin = i + delimiter.length;
   }

   // delimiter at the very end with no data following?
   if (begin != array.length)
      byteArrays.add(Arrays.copyOfRange(array, begin, array.length));

   return byteArrays;
}

1 Comment

Nice, although it throws if there are two delimiters next to each other.
0

Rolling your own is the only way to go here. The best idea I can offer if you're open to non-standard libraries is this class from Apache:

http://commons.apache.org/proper/commons-primitives/apidocs/org/apache/commons/collections/primitives/ArrayByteList.html

The Knuth-Morris-Pratt solution is probably the best, but I would treat the array as a stack and do something like this:

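// Assumes `stack` is a java.util.Stack<Byte> holding the input, first byte on top.
List<ArrayByteList> targetList = new ArrayList<ArrayByteList>();
ArrayByteList tmp = new ArrayByteList();
while (!stack.empty()) {
  byte top = stack.pop();
  // The cast matters: bytes are signed, so top == 0xff is never true.
  if (top == (byte) 0xff && !stack.empty() && stack.peek() == (byte) 0xff) {
    stack.pop();               // consume the second delimiter byte
    targetList.add(tmp);       // close the current piece
    tmp = new ArrayByteList();
  } else {
    tmp.add(top);              // ordinary data byte
  }
}
targetList.add(tmp);           // the piece after the last delimiter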

I'm aware that this is pretty quick and dirty but it should deliver O(n) in all cases.

1 Comment

Fine for a simple two-byte delimiter but doesn't address more complex patterns -- which might be OK for the OP.
0

This is an improvement on Roger's answer https://stackoverflow.com/a/44468124/1291605. Imagine that we have the array ||||aaa||bbb and the delimiter ||. In this case we get

java.lang.IllegalArgumentException: 2 > 1
    at java.util.Arrays.copyOfRange(Arrays.java:3519)

So the final improved solution:

public static List<byte[]> split(byte[] array, byte[] delimiter) {
    List<byte[]> byteArrays = new LinkedList<>();
    if (delimiter.length == 0) {
        return byteArrays;
    }
    int begin = 0;

    outer:
    for (int i = 0; i < array.length - delimiter.length + 1; i++) {
        for (int j = 0; j < delimiter.length; j++) {
            if (array[i + j] != delimiter[j]) {
                continue outer;
            }
        }

        // This condition was changed to 'less': an overlapping match can
        // leave begin > i, and copyOfRange would then throw.
        if (begin < i)
            byteArrays.add(Arrays.copyOfRange(array, begin, i));
        begin = i + delimiter.length;
    }

    // Also here we may change condition to 'less'
    if (begin < array.length)
        byteArrays.add(Arrays.copyOfRange(array, begin, array.length));

    return byteArrays;
}

-3

You can use Arrays.copyOfRange() for that.

1 Comment

Arrays.copyOfRange can copy, but it cannot split byte arrays. The question is about splitting, not copying.
-4

Refer to Java Doc for String

You can construct a String object from byte array. Guess you know the rest.

public static byte[][] splitByteArray(byte[] bytes, byte[] regex, Charset charset) {
    String str = new String(bytes, charset);
    String[] split = str.split(new String(regex, charset));
    byte[][] byteSplit = new byte[split.length][];
    for (int i = 0; i < split.length; i++) {
        byteSplit[i] = split[i].getBytes(charset);
    }
    return byteSplit;
}

public static void main(String[] args) {
    Charset charset = Charset.forName("UTF-8");
    byte[] bytes = {
        '1', '1', ' ', '1', '1',
        'F', 'F', ' ', 'F', 'F',
        '2', '2', ' ', '2', '2', ' ', '2', '2',
        'F', 'F', ' ', 'F', 'F',
        '3', '3', ' ', '3', '3', ' ', '3', '3', ' ', '3', '3'
    };
    byte[] regex = {'F', 'F', ' ', 'F', 'F'};
    byte[][] splitted = splitByteArray(bytes, regex, charset);
    for (byte[] arr : splitted) {
        System.out.print("[");
        for (byte b : arr) {
            System.out.print((char) b);
        }
        System.out.println("]");
    }
}

2 Comments

I would recommend writing a little sample code so that users with the same problem can find the answer more easily. Because they might not "know the rest". Thanks!
