I asked this question a few days ago:

How to look for an ANSI string in a binary file?

and I got a really nice answer, which later led to a much harder question: "Can input iterators be used where forward iterators are expected?" That one is now beyond what I can understand.

I am still learning C++ and I am looking for an easy way to search for a string in a binary file.

Could someone show me simple code for a minimalistic C++ console program that looks for a string in a binary file and prints the locations to stdout?

If possible, could you show me

  1. a version where the file is copied into memory (assuming the binary file is small)

  2. and another one that uses the proper approach from the linked questions

Sorry if it sounds like I'm asking for someone's code, but I am just learning C++, and I think others could benefit from this question if someone posted some high-quality code that is nice to learn from.

2 Answers

Your requirement specification is unclear. For example, where does "121" appear in "12121": only at the first character (after which the search resumes at the 4th), or at the 3rd as well? The code below uses the former approach.

#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

int main(int argc, const char* argv[])
{
    // An empty search term would match at every offset and never advance.
    if (argc != 3 || argv[2][0] == '\0')
    {
        std::cerr << "Usage: " << argv[0] << " filename search_term\n"
            "Prints offsets where search_term is found in file.\n";
        return 1;
    }

    const char* filename = argv[1];
    const char* search_term = argv[2];
    size_t search_term_size = std::strlen(search_term);

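    std::ifstream file(filename, std::ios::binary);
    if (file)
    {
        // Determine the file size so the string can be allocated once.
        file.seekg(0, std::ios::end);
        size_t file_size = file.tellg();
        file.seekg(0, std::ios::beg);
        std::string file_content;
        file_content.reserve(file_size);
        char buffer[16384];
        std::streamsize chars_read;

        // Read in 16 KiB blocks. The final block may be shorter, so the
        // loop stops on gcount() == 0 rather than on the stream state.
        while (file.read(buffer, sizeof buffer), chars_read = file.gcount())
            file_content.append(buffer, chars_read);

        // Only search if reading stopped because the end of file was reached.
        if (file.eof())
        {
            // Non-overlapping search: resume just past the end of each match.
            for (std::string::size_type offset = 0, found_at;
                 file_size > offset &&
                 (found_at = file_content.find(search_term, offset)) !=
                                                            std::string::npos;
                 offset = found_at + search_term_size)
                std::cout << found_at << std::endl;
        }
    }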
}
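
To make the two interpretations of "121" in "12121" concrete, here is a minimal sketch. The helper name `find_all` and its `allow_overlap` flag are made up for illustration and are not part of the program above; the only library call it relies on is `std::string::find`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns every offset at which `needle` occurs in
// `haystack`.
//   allow_overlap == false: resume just past the end of each match.
//   allow_overlap == true:  resume one character after the start of each match.
std::vector<std::size_t> find_all(const std::string& haystack,
                                  const std::string& needle,
                                  bool allow_overlap)
{
    std::vector<std::size_t> offsets;
    if (needle.empty())
        return offsets;  // an empty needle would match at every position

    const std::size_t step = allow_overlap ? 1 : needle.size();
    for (std::size_t pos = haystack.find(needle);
         pos != std::string::npos;
         pos = haystack.find(needle, pos + step))
        offsets.push_back(pos);
    return offsets;
}
```

With zero-based offsets, `find_all("12121", "121", false)` should yield only 0, while `find_all("12121", "121", true)` should yield 0 and 2. The only difference between the two modes is how far the search position advances after a hit.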

4 Comments

@ildjarn: true (but hey, it still runs more than twice as fast as your non-boost solution in my benchmarks ;-P)
Fair enough, I benchmarked and verified your results; I didn't expect copying from an istreambuf_iterator pair to be so slow. :-[
@ildjarn: what happened to your code? Even if it's not the fastest solution, it might be a really good reference to have here! I was planning on learning from all 4 solutions.
@ildjarn: zsero's right... you have good solutions to list... it might be something simple like not using reserve on the deque - I didn't have time to investigate - but that's not the point anyway: it might run faster in someone else's/future library implementation etc....

This is one way to do part 1. I'm not sure I would describe it as high quality, but it is on the minimalist side.

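#include <iostream>
#include <fstream>
#include <iterator>
#include <string>

using namespace std;

int main(int argc, char *argv[])
{
    if (argc != 3)
    {
        cerr << "Usage: " << argv[0] << " filename search_term" << endl;
        return 1;
    }

    // Open in binary mode so that no newline translation takes place.
    std::ifstream ifs(argv[1], ios::binary);

    // Read the whole file into a string. The extra parentheses around the
    // first argument keep this from being parsed as a function declaration.
    std::string str((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());

    size_t pos = str.find(argv[2]);

    if (pos != string::npos)
        cout << "string found at position: " << pos << endl;
    else
        cout << "could not find string" << endl;

    return 0;
}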

2 Comments

Thanks, it works perfectly and is really nice to read! My problem is that constructing the string with std::string str(std::istreambuf_iterator, std::istreambuf_iterator) is extremely slow, while the actual search takes almost no time. Is there any way to create the string faster?
@zsero - The iterators are slow. Faster ways are to (1) read buffers of data and search as you go, rather than reading the whole file into memory when not all of it may be needed; (2) drop down to more OS-specific facilities such as memory-mapping, or OS hints such as posix_fadvise. Simply using a good buffer size and fstream.read() will be faster than this.
