
I need to search for a binary pattern in a binary file. How can I do it?

I tried using the strstr() function, converting the file and the pattern to strings, but it's not working.

(The pattern is also a binary file.) This is what I tried:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void isinfected(FILE *file, FILE *sign, char filename[], char filepath[])
{
    char *fil, *vir;
    long filelen, signlen;

    /* Measure both files by seeking to the end */
    fseek(file, 0, SEEK_END);
    fseek(sign, 0, SEEK_END);
    filelen = ftell(file);
    signlen = ftell(sign);

    fil = (char *)malloc(sizeof(char) * filelen);
    if (!fil)
    {
        printf("unsuccessful malloc!\n");
        return;
    }

    vir = (char *)malloc(sizeof(char) * signlen);
    if (!vir)
    {
        printf("unsuccessful malloc!\n");
        free(fil);
        return;
    }

    /* Go back to the start of both files before reading */
    fseek(file, 0, SEEK_SET);
    fseek(sign, 0, SEEK_SET);

    fread(fil, 1, filelen, file);
    fread(vir, 1, signlen, sign);

    /* Look for the signature (vir) inside the file contents (fil);
       log() is my own logging function (not shown) */
    if (strstr(fil, vir) != NULL)
        log(filename, "infected", filepath);
    else
        log(filename, "not infected", filepath);

    free(vir);
    free(fil);
}
  • Assuming the pattern fits in memory, store it in a buffer, and keep another (FIFO) buffer of the same size to check whether they are equal. Commented Jun 6, 2015 at 14:11
  • Please show what you have tried, and explain in what way(s) "it's not working": what did you expect, and what happens instead? Commented Jun 6, 2015 at 14:13
  • Note: "convert the file and the pattern to a string" is not what you did with the cast char *. Creating a pointer to a char does not by definition make it a valid C string. This would only work if both your fil and vir data were valid, zero-terminated C strings. Commented Jun 6, 2015 at 15:57
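The first comment's idea avoids loading the whole file into memory: keep a window holding the last signlen bytes read and compare it with the pattern after every new byte. Below is a minimal sketch of that idea, assuming the pattern itself is already in memory. stream_contains is a hypothetical name, and the window is implemented as a circular buffer rather than a literal FIFO, so no bytes are ever moved.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if pat[0..patlen) occurs in the stream,
   0 if it does not, and -1 if the window cannot be allocated. */
int stream_contains(FILE *f, const unsigned char *pat, size_t patlen)
{
    unsigned char *win;
    size_t head = 0;    /* next slot to write, i.e. the oldest byte once full */
    size_t filled = 0;  /* bytes read so far, capped at patlen */
    int c, found = 0;

    if (patlen == 0)
        return 1;
    win = malloc(patlen);   /* circular window with the last patlen bytes */
    if (win == NULL)
        return -1;

    while ((c = fgetc(f)) != EOF) {
        win[head] = (unsigned char)c;   /* overwrite the oldest byte */
        head = (head + 1) % patlen;
        if (filled < patlen)
            filled++;
        if (filled == patlen) {
            /* Window in order: win[head..patlen) followed by win[0..head) */
            size_t first = patlen - head;
            if (memcmp(win + head, pat, first) == 0 &&
                memcmp(win, pat + first, head) == 0) {
                found = 1;
                break;
            }
        }
    }
    free(win);
    return found;
}
```

Because fgetc returns every byte value, including 0, as an ordinary int, embedded zero bytes need no special handling here.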

2 Answers


For binary data you should never use any of the strXXX functions, because they work only on C-style, zero-terminated strings. Your code fails because the strXXX functions cannot look beyond the first 0 byte they encounter.

Your basic idea of using strstr is sound; it fails only because strstr works on zero-terminated strings. You can replace it with memmem, which does the same thing on arbitrary data with explicit lengths. Since memmem is a GNU C extension (see also "Is there a particular reason for memmem being a GNU extension?"), it may not be available on your system, in which case you need to write code that does the same thing.
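Where memmem is available, fixing the question's code amounts to swapping the strstr call for memmem and passing both lengths. A minimal sketch, assuming glibc, where memmem is declared in <string.h> only when _GNU_SOURCE is defined before the first include; signature_found is a hypothetical wrapper name.

```c
#define _GNU_SOURCE   /* glibc declares memmem only when this is defined */
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: returns nonzero if sig[0..siglen) occurs in buf[0..buflen).
   Unlike strstr, memmem takes explicit lengths, so 0 bytes are ordinary data. */
int signature_found(const void *buf, size_t buflen, const void *sig, size_t siglen)
{
    return memmem(buf, buflen, sig, siglen) != NULL;
}
```

In the question's function this would be called as signature_found(fil, filelen, vir, signlen), with the file contents as the haystack and the signature as the needle.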

For a very basic implementation of memmem, use memchr to scan for the first byte of the pattern, then memcmp to check whether the full pattern starts there:

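#include <string.h>

void *my_memmem(const void *big, size_t big_len, const void *little, size_t little_len)
{
    const unsigned char *haystack = big;
    const unsigned char *needle = little;
    const unsigned char *p = haystack;
    const unsigned char *last;

    if (little_len == 0)
        return (void *)big;      /* an empty pattern matches at the start */
    if (big_len < little_len)
        return NULL;

    /* last position where a complete match could still start */
    last = haystack + (big_len - little_len);

    while (p <= last)
    {
        /* jump to the next occurrence of the pattern's first byte */
        p = memchr(p, needle[0], (size_t)(last - p) + 1);
        if (p == NULL)
            return NULL;
        /* check whether the whole pattern starts here */
        if (memcmp(p, needle, little_len) == 0)
            return (void *)p;
        p++;
    }
    return NULL;
}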

Better implementations are possible, but unless memmem is performance-critical in your program, this will do the job just fine.
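my_memmem (like memmem) needs both files as in-memory byte buffers together with their lengths. Below is a minimal sketch of loading a whole binary file that way, assuming the file fits in memory and is seekable; read_file is a hypothetical helper name. The details that matter are opening the file in binary mode ("rb"), going back to the start with SEEK_SET after measuring the size, and keeping the length separately instead of relying on a terminating 0.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads all of f into a newly allocated buffer and
   stores its size in *len. Returns NULL on error; the caller must free() it. */
unsigned char *read_file(FILE *f, size_t *len)
{
    long size;
    unsigned char *buf;

    if (fseek(f, 0, SEEK_END) != 0)
        return NULL;
    size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0)   /* back to the start */
        return NULL;

    buf = malloc(size > 0 ? (size_t)size : 1);     /* avoid malloc(0) */
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        return NULL;
    }
    *len = (size_t)size;
    return buf;
}
```

With both files loaded this way, my_memmem(fil, filelen, vir, signlen) != NULL takes the place of the strstr test.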


3 Comments

The conclusion that strstr doesn't work because this is a binary file is probably correct. memmem, however, is a GNU C extension, not a standard library function, and it is not available in the Windows API.
You could always find the source code for memmem (hint: it is here: opensource.apple.com/source/Libc/Libc-825.40.1/string/FreeBSD/…) and add it to your project, with proper attribution. You may need to tweak it a bit to get it to compile on Windows, but judging from the code it should not be too difficult.
@thurizas: ... I wrote a test program using memmem to verify against, and then created the above function, with a footnote because, boy, this is ugly. There is definitely room for improvement. Only after that did I check your link. ... *ouch*! Note to self: if I ever need an efficient memmem, write it myself!

The basic idea is to check whether vir matches the beginning of fil. If it doesn't, check again starting at the second byte of fil, and so on, until you find a match or until fewer than signlen bytes of fil remain. (This is essentially what a simple implementation of strstr does, except that strstr treats 0 bytes as a special case.)

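#include <stdbool.h>
#include <string.h>

/* Returns true if the signature vir[0..signlen) occurs anywhere in fil[0..filelen) */
bool contains(const char *fil, long filelen, const char *vir, long signlen)
{
  for (long i = 0; i <= filelen - signlen; ++i) {   /* <= so a match at the very end is found */
    if (memcmp(vir, fil + i, signlen) == 0) {
      return true;   // vir found in fil
    }
  }
  return false;  // vir is not in fil
}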

This is the "brute force" approach. In the worst case it takes time proportional to filelen × signlen, so it can get very slow if your files are long. There are more advanced search algorithms that can make this much faster, but this is a good starting point.
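As one example of such an algorithm, here is a sketch of Boyer-Moore-Horspool, a swapped-in technique rather than something this answer prescribes. It precomputes, for every byte value, how far the window may shift when that byte sits under the window's last position, so on typical data it skips ahead by up to signlen bytes at a time instead of one. horspool_find is a hypothetical name, and the worst case is still proportional to filelen × signlen.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first occurrence of
   pat[0..m) in text[0..n), or -1 if it does not occur. */
long horspool_find(const unsigned char *text, size_t n,
                   const unsigned char *pat, size_t m)
{
    size_t shift[256];
    size_t i, pos;

    if (m == 0)
        return 0;
    if (n < m)
        return -1;

    /* Bytes absent from pat[0..m-2] let the window skip the full pattern length */
    for (i = 0; i < 256; i++)
        shift[i] = m;
    /* Bytes present in pat[0..m-2] shift so their last occurrence lines up */
    for (i = 0; i + 1 < m; i++)
        shift[pat[i]] = m - 1 - i;

    pos = 0;
    while (pos <= n - m) {
        if (memcmp(text + pos, pat, m) == 0)
            return (long)pos;
        /* shift according to the byte under the window's last position */
        pos += shift[text[pos + m - 1]];
    }
    return -1;
}
```

The table has one entry per possible byte value, which is why this works unchanged on binary data containing 0 bytes.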
