
This answer points out that C++ is not well suited to iterating over a binary file, but that is exactly what I need right now. In short, I need to operate on files in a "binary" way. Yes, all files are binary, even .txt ones, but I'm writing something that operates on image files, so I need to read well-structured files where the data is arranged in a specific way.

I would like to read the entire file into a data structure such as std::vector<T>, so I can close the file almost immediately and work with the content in memory without caring about disk I/O anymore.
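A minimal sketch of that idea, assuming the file fits in memory (the helper name `read_all` is my own, not a standard facility):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Slurp a whole file into memory in one bulk read, then let the
// stream close when it goes out of scope.
std::vector<char> read_all(const std::string& filename) {
    std::ifstream ifs(filename, std::ios::binary | std::ios::ate);
    std::vector<char> buf;
    if (!ifs) return buf;                        // open failed
    const std::streamsize size = ifs.tellg();    // opened at end: this is the size
    ifs.seekg(0, std::ios::beg);
    buf.resize(static_cast<std::size_t>(size));
    ifs.read(buf.data(), size);                  // one read() call
    return buf;
}
```

After this returns, all further iteration happens over the vector, never the disk.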

Right now, the best way I know to perform a complete iteration over a file using the standard library is something along the lines of

std::ifstream ifs(filename, std::ios::binary);
for (std::istreambuf_iterator<char> it(ifs.rdbuf());
     it != std::istreambuf_iterator<char>(); ++it) {
    // do something with *it;
}
ifs.close();

or use std::copy, but even then you are still going through istreambuf iterators (so, if I understand the C++ documentation correctly, the code above is basically reading one byte per call).

So the question is: how do I write a custom iterator? What should I inherit from?

I assume this also matters when writing a file to disk, and I assume I could use the same iterator class for writing; if I'm wrong, please feel free to correct me.

  • Is the size of the inbound data precluding you from just ifs.read-ing the data straight up into a std::vector<unsigned char> and iterating over that? Commented Nov 21, 2013 at 17:58
  • @WhozCraig For now I don't think the files are too big to be kept in memory (if that's what you are referring to). I'm fine with read or any other way; even the std::vector constructor accepts iterators, so I'm fine on that side. The "problem" is the iterators themselves: I would like to write one to try to browse the data differently. EDIT: I would like to avoid any C-ish way; I'll stick with the iterators. Commented Nov 21, 2013 at 18:07
  • you are basically reading 1 byte at each call -- from ifstream's in-memory buffer, not from the file itself. The actual read(2) calls are still for every 4k or 16k or whatever the default buffer size is for you. Commented Nov 21, 2013 at 18:11
  • @Cubbi Yes, I wasn't going to bring up the buffered/unbuffered behaviour because I want to keep the focus on the iterators, but you are right. Anyway, I'm also not interested in that because it's platform-specific, and I'm trying to adopt a solution that is as cross-platform as possible without introducing extra stuff. That's why I would like to write an iterator; it looks like the perfect mix of abstraction over the file and portability. Commented Nov 21, 2013 at 18:16

2 Answers


It is possible to optimize std::copy() using std::istreambuf_iterator<char>, but hardly any implementation does. Just deriving from something won't do the trick either, because that isn't how iterators work.

The most effective built-in approach is probably to simply dump the file into an std::ostringstream and then get a std::string from it:

std::ostringstream out;
out << file.rdbuf();
std::string content = out.str();

If you want to avoid travelling through a std::string, you can write a stream buffer that dumps the content directly into a memory area or a std::vector<unsigned char>, and use the same output expression as above.
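A hedged sketch of that idea, assuming a custom output stream buffer that appends everything written to it into a std::vector<unsigned char> (the class name `vector_sink` is my own, not a standard component):

```cpp
#include <ostream>
#include <streambuf>
#include <vector>

// A minimal output stream buffer whose "device" is a growable vector.
// With no put area installed, every write goes through overflow()/xsputn().
class vector_sink : public std::streambuf {
public:
    const std::vector<unsigned char>& data() const { return buf_; }
protected:
    // Per-character path (used by sputc() once the put area is exhausted).
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            buf_.push_back(static_cast<unsigned char>(ch));
        return traits_type::not_eof(ch);
    }
    // Bulk path (used by write() and operator<<(streambuf*)).
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        buf_.insert(buf_.end(), s, s + n);
        return n;
    }
private:
    std::vector<unsigned char> buf_;
};
```

With this in place, `vector_sink sink; std::ostream out(&sink); out << file.rdbuf();` fills the vector with the file's bytes without an intermediate std::string.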

The std::istreambuf_iterator<char>s could, in principle, have a backdoor to the stream buffer and bypass character-wise operations. Without that backdoor you won't be able to speed anything up using these iterators. You could create an iterator on top of stream buffers that uses the stream buffer's sgetn() to deal with a similar buffer. In that case you'd pretty much need a version of std::copy() that deals with segments (i.e., each fill of a buffer) efficiently. Short of either, I'd just read the file into a buffer using a stream buffer and iterate over that.


1 Comment

So you are suggesting to basically stick with my first implementation? What are the possible errors? What happens if the file is corrupted?

My suggestion is not to use a custom stream, stream-buffer or stream-iterator.

#include <fstream>

struct Data {
    short a;
    short b;
    int   c;
};

std::istream& operator >> (std::istream& stream, Data& data) {
    static_assert(sizeof(Data) == 2*sizeof(short) + sizeof(int), "Invalid Alignment");
    if(stream.read(reinterpret_cast<char*>(&data), sizeof(Data))) {
        // Consider endian
    }
    else {
        // Error
    }
    return stream;
}

int main(int argc, char* argv[])
{
    if (argc < 2)
        return 1;
    std::ifstream stream(argv[1], std::ios::binary); // open the input file
    Data data;
    while (stream >> data) {
        // Process
    }
    if (stream.fail() && !stream.eof()) {
        // Error (plain EOF is good)
    }
    return 0;
}

You could dare to make a stream buffer iterator that reads elements larger than the underlying char_type, but then:

  • What if the data has an invalid format ?
  • What if the data is incomplete and at EOF ?

The state of the stream is not maintained by the buffer or iterator.

2 Comments

Can I buffer the entire file?
@user2485710 That would depend on the underlying stream buffer (hence it is possible).
