I have an array of bytes and i want to determine if the contents of this array of bytes exists within another larger array as a continuous sequence. What is the simplest way to go about doing this?
3 Answers
The naive approach is:
public static bool IsSubsetOf(byte[] set, byte[] subset) {
for(int i = 0; i < set.Length && i + subset.Length <= set.Length; ++i)
if (set.Skip(i).Take(subset.Length).SequenceEqual(subset))
return true;
return false;
}
For more efficient approaches, you might consider more advanced string matching algorithms like KMP.
4 Comments
Mehrdad Afshari
@Michal: It's O(n*m) which might be considered inefficient, but if you're worrying about Skip and Take, don't. This is as efficient as a couple for loops. As I said, if performance is an issue, you should consider a more advanced algorithm.
Michał Chaniewski
@Mehrdad - yeah, but mn can get quite big for binary data... I know that Skip() and Take() is done by deferred execution - so it's just n component of O(mn) coming from SequenceEquals...
RCIX
Performance will be a problem but for now i just want to get my code working. this ought to do nicely...
Merlyn Morgan-Graham
Skip() can't actually do an indexed jump. It must call MoveNext() on the iterator i times. See msmvps.com/blogs/jon_skeet/archive/2011/01/26/… for why. But with the "if you are worried about perf... do something else" caveat, this answer is sound.Try to adapt some string search algorithm. One of the fastest is Boyer-Moore . It's quite easy as well. For binary data, Knuth-Morris-Pratt algorithm might work very efficiently as well.
Comments
This, which is a 1/1 port of this answer: Searching for a sequence of Bytes in a Binary File with Java
Is a very efficient way of doing so:
public static class KmpSearch {
public static int IndexOf(byte[] data, byte[] pattern) {
int[] failure = ComputeFailure(pattern);
int j = 0;
if (data.Length == 0) return -1;
for (int i = 0; i < data.Length; i++) {
while (j > 0 && pattern[j] != data[i]) {
j = failure[j - 1];
}
if (pattern[j] == data[i]) { j++; }
if (j == pattern.Length) {
return i - pattern.Length + 1;
}
}
return -1;
}
private static int[] ComputeFailure(byte[] pattern) {
int[] failure = new int[pattern.Length];
int j = 0;
for (int i = 1; i < pattern.Length; i++) {
while (j > 0 && pattern[j] != pattern[i]) {
j = failure[j - 1];
}
if (pattern[j] == pattern[i]) {
j++;
}
failure[i] = j;
}
return failure;
}
}