2

I have an array of bytes and i want to determine if the contents of this array of bytes exists within another larger array as a continuous sequence. What is the simplest way to go about doing this?

3 Answers 3

3

The naive approach is:

public static bool IsSubsetOf(byte[] set, byte[] subset) {
    for(int i = 0; i < set.Length && i + subset.Length <= set.Length; ++i)
        if (set.Skip(i).Take(subset.Length).SequenceEqual(subset))
            return true;
    return false;
}

For more efficient approaches, you might consider more advanced string matching algorithms like KMP.

Sign up to request clarification or add additional context in comments.

4 Comments

@Michal: It's O(n*m) which might be considered inefficient, but if you're worrying about Skip and Take, don't. This is as efficient as a couple for loops. As I said, if performance is an issue, you should consider a more advanced algorithm.
@Mehrdad - yeah, but mn can get quite big for binary data... I know that Skip() and Take() is done by deferred execution - so it's just n component of O(mn) coming from SequenceEquals...
Performance will be a problem but for now i just want to get my code working. this ought to do nicely...
Skip() can't actually do an indexed jump. It must call MoveNext() on the iterator i times. See msmvps.com/blogs/jon_skeet/archive/2011/01/26/… for why. But with the "if you are worried about perf... do something else" caveat, this answer is sound.
3

Try to adapt some string search algorithm. One of the fastest is Boyer-Moore . It's quite easy as well. For binary data, Knuth-Morris-Pratt algorithm might work very efficiently as well.

Comments

0

This, which is a 1/1 port of this answer: Searching for a sequence of Bytes in a Binary File with Java

Is a very efficient way of doing so:

public static class KmpSearch {

    public static int IndexOf(byte[] data, byte[] pattern) {
        int[] failure = ComputeFailure(pattern);

        int j = 0;
        if (data.Length == 0) return -1;

        for (int i = 0; i < data.Length; i++) {
            while (j > 0 && pattern[j] != data[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == data[i]) { j++; }
            if (j == pattern.Length) {
                return i - pattern.Length + 1;
            }
        }
        return -1;
    }


    private static int[] ComputeFailure(byte[] pattern) {
        int[] failure = new int[pattern.Length];

        int j = 0;
        for (int i = 1; i < pattern.Length; i++) {
            while (j > 0 && pattern[j] != pattern[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == pattern[i]) {
                j++;
            }
            failure[i] = j;
        }

        return failure;
    }
}

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.