Code for searching for a string in a binary file

Question

I asked this question a few days ago:

How to look for an ANSI string in a binary file?

I got a really nice answer, which later turned into a much harder question: Can input iterators be used where forward iterators are expected? That one is now well beyond what I can understand.

I am still learning C++ and I am looking for an easy way to search for a string in a binary file.

Could someone show me simple code for a minimalistic C++ console program that looks for a string in a binary file and prints the locations to stdout?

If possible, can you show me:

a version where the file is copied into memory (supposing the binary file is small), and
another one that uses the proper way from the linked questions.

Sorry if it sounds like I'm asking for someone's code, but I am just learning C++, and I think others could benefit from this question if someone could post some high-quality code that is nice to learn from.

The byte offset of the first character, counted from the start of the file. I mean the value tellg() would report.
– hyperknot, Commented Jun 27, 2011 at 0:44
Boyer-Moore algorithm (also see portal.acm.org/citation.cfm?doid=360825.360855)
– Cheers and hth. - Alf, Commented Jun 27, 2011 at 1:19
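
For reference, a rough sketch of the simpler Horspool variant of Boyer-Moore, which uses only a bad-character shift table; it is not the full algorithm from the linked paper. The function name horspool_find_all is made up for illustration, and the data is assumed to be in memory already.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the offset of every occurrence of `needle`
// in `haystack`, using the Horspool simplification of Boyer-Moore.
std::vector<std::size_t> horspool_find_all(const std::string& haystack,
                                           const std::string& needle)
{
    std::vector<std::size_t> offsets;
    const std::size_t n = haystack.size();
    const std::size_t m = needle.size();
    if (m == 0 || m > n)
        return offsets;

    // shift[c]: how far the window may slide when its last byte is c.
    std::array<std::size_t, 256> shift;
    shift.fill(m);
    for (std::size_t i = 0; i + 1 < m; ++i)
        shift[static_cast<unsigned char>(needle[i])] = m - 1 - i;

    std::size_t pos = 0;
    while (pos + m <= n)
    {
        // compare() works on raw bytes, so embedded '\0' bytes are fine.
        if (haystack.compare(pos, m, needle) == 0)
            offsets.push_back(pos);

        const std::size_t step =
            shift[static_cast<unsigned char>(haystack[pos + m - 1])];
        assert(step >= 1);  // every table entry is at least 1, so we advance
        pos += step;
    }
    return offsets;
}
```

Written this way, the sketch also reports overlapping matches: searching for "121" in "12121" yields offsets 0 and 2.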

Tony Delroy · Accepted Answer · 2011-06-27 02:59:48Z

2

Your requirement specification is unclear. For example, where does "121" appear in "12121": just at the first character (after which searching continues at the 4th), or at the 3rd as well? The code below uses the former approach.

#include <iostream>
#include <fstream>
#include <string>
#include <string.h>

int main(int argc, const char* argv[])
{
    if (argc != 3)
    {
        std::cerr << "Usage: " << argv[0] << " filename search_term\n"
            "Prints offsets where search_term is found in file.\n";
        return 1;
    }

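    const char* filename = argv[1];
    const char* search_term = argv[2];
    size_t search_term_size = strlen(search_term);
    if (search_term_size == 0)
    {
        // An empty term matches at every offset and would never advance
        // the search loop below.
        std::cerr << "search_term must not be empty\n";
        return 1;
    }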

    std::ifstream file(filename, std::ios::binary);
    if (file)
    {
        file.seekg(0, std::ios::end);
        size_t file_size = file.tellg();
        file.seekg(0, std::ios::beg);
        std::string file_content;
        file_content.reserve(file_size);
        char buffer[16384];
        std::streamsize chars_read;

        while (file.read(buffer, sizeof buffer), chars_read = file.gcount())
            file_content.append(buffer, chars_read);

        if (file.eof())
        {
            for (std::string::size_type offset = 0, found_at;
                 file_size > offset &&
                 (found_at = file_content.find(search_term, offset)) !=
                                                            std::string::npos;
                 offset = found_at + search_term_size)
                std::cout << found_at << std::endl;
        }
    }
}
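
A minimal sketch of the two interpretations of "121" in "12121", assuming the file content is already in a std::string. The helper name find_all and its allow_overlap flag are hypothetical and not part of the code above; the only difference between the two modes is how far the search position advances after a hit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collect every offset at which `needle` occurs
// in `haystack`, with or without overlapping matches.
std::vector<std::size_t> find_all(const std::string& haystack,
                                  const std::string& needle,
                                  bool allow_overlap)
{
    std::vector<std::size_t> offsets;
    if (needle.empty())
        return offsets;  // an empty term would match at every position

    // Advance by one byte to allow overlaps, or by the whole term otherwise.
    const std::size_t step = allow_overlap ? 1 : needle.size();
    assert(step > 0);  // guarantees the loop below makes progress

    for (std::size_t pos = haystack.find(needle);
         pos != std::string::npos;
         pos = haystack.find(needle, pos + step))
        offsets.push_back(pos);

    return offsets;
}
```

With allow_overlap set to false, "121" is found in "12121" only at offset 0; with true, it is found at offsets 0 and 2. Passing the term as a std::string rather than a const char* also lets it contain '\0' bytes, which matters for binary data.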

edited Jun 27, 2011 at 2:59

answered Jun 27, 2011 at 1:40

Tony Delroy


4 Comments

Tony Delroy Over a year ago

@ildjarn: true (but hey, it still runs more than twice as fast as your non-boost solution in my benchmarks ;-P)

ildjarn Over a year ago

Fair enough, I benchmarked and verified your results; I didn't expect copying from an istreambuf_iterator pair to be so slow. :-[

hyperknot Over a year ago

@ildjarn: what happened to your code? Even if it's not the fastest solution, it might be a really good reference to have here! I was planning on learning from all 4 solutions.

Tony Delroy Over a year ago

@ildjarn: zsero's right... you have good solutions to list... it might be something simple like not using reserve on the deque - I didn't have time to investigate - but that's not the point anyway: it might run faster in someone else's/future library implementation etc....

Duck · Accepted Answer · 2011-06-27 01:56:51Z

2

This is one way to do part 1. I'm not sure I would describe it as high quality, but it may be on the minimalist side.

#include <iostream>
#include <fstream>
#include <iterator>   // std::istreambuf_iterator
#include <string>

using namespace std;

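int main(int argc, char *argv[])
{
    // Without both arguments, argv[1] or argv[2] would be a null pointer.
    if (argc != 3)
    {
        cerr << "Usage: " << argv[0] << " filename search_term\n";
        return 1;
    }

    std::ifstream ifs(argv[1], ios::binary);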

    std::string str((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());

    size_t pos = str.find(argv[2]);

    if (pos != string::npos)
        cout << "string found at position: " << pos << endl;
    else
        cout << "could not find string" << endl;

    return 0;
}

edited Jun 27, 2011 at 1:56

answered Jun 27, 2011 at 1:48

Duck


2 Comments

hyperknot Over a year ago

Thanks, it works perfectly and is really nice to read! But my problem is that std::string str(std::istreambuf_iterator, std::istreambuf_iterator) is extremely slow, while the actual search takes almost no time to find the result. Is there any way to make the string creation faster?

Duck Over a year ago

@zsero - The iterators are slow. Faster ways are to (1) read buffers of data and search as you go along, rather than reading the whole file into memory when not all of it may be needed; (2) drop down to more OS-specific things like memory-mapping or OS hints such as posix_fadvise. Simply using a good buffer size and fstream.read() will be faster than this.
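
Option (1) could look roughly like the sketch below. It is only an illustration: the function name find_in_stream and the 16384-byte default chunk size are assumptions, and it reports overlapping matches. The key detail is carrying the last needle.size() - 1 bytes over to the next chunk, so that a match straddling a chunk boundary is not missed.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: scan `in` chunk by chunk and return the offset of
// every (possibly overlapping) occurrence of `needle`.
std::vector<std::size_t> find_in_stream(std::istream& in,
                                        const std::string& needle,
                                        std::size_t chunk_size = 16384)
{
    assert(chunk_size > 0);
    std::vector<std::size_t> offsets;
    if (needle.empty())
        return offsets;

    std::vector<char> buffer(chunk_size);
    std::string window;            // carried-over tail + current chunk
    std::size_t window_start = 0;  // stream offset of window[0]

    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())),
           in.gcount() > 0)
    {
        window.append(buffer.data(), static_cast<std::size_t>(in.gcount()));

        for (std::size_t pos = window.find(needle);
             pos != std::string::npos;
             pos = window.find(needle, pos + 1))
            offsets.push_back(window_start + pos);

        // Keep the last needle.size() - 1 bytes: enough to complete a match
        // that straddles the boundary, too few to report a match twice.
        const std::size_t keep = needle.size() - 1;
        if (window.size() > keep)
        {
            window_start += window.size() - keep;
            window.erase(0, window.size() - keep);
        }
    }
    return offsets;
}
```

To search a file, open it as std::ifstream file(filename, std::ios::binary) and pass it as the first argument. Memory use stays near one chunk plus the length of the search term, regardless of file size.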
