
This answer points out that C++ is not well suited to iterating over a binary file, but that is exactly what I need right now. In short, I need to operate on files in a "binary" way. Yes, all files are binary, even .txt ones, but I'm writing something that operates on image files, so I need to read well-structured files where the data is arranged in a specific way.

I would like to read the entire file into a data structure such as std::vector<T>, so I can close the file almost immediately and work with the content in memory without caring about disk I/O anymore.
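A minimal sketch of that idea, assuming the file fits in memory (the helper name `read_all` is my own, not a standard facility):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Slurp a whole file into memory in one bulk read, then let the
// stream close when it goes out of scope.
std::vector<char> read_all(const std::string& filename) {
    std::ifstream ifs(filename, std::ios::binary | std::ios::ate);
    std::vector<char> buf;
    if (!ifs) return buf;                        // open failed
    const std::streamsize size = ifs.tellg();    // opened at end: this is the size
    ifs.seekg(0, std::ios::beg);
    buf.resize(static_cast<std::size_t>(size));
    ifs.read(buf.data(), size);                  // one read() call
    return buf;
}
```

After this returns, all further iteration happens over the vector, never the disk.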

Right now, the best way I know to perform a complete iteration over a file using the standard library is something along the lines of

std::ifstream ifs(filename, std::ios::binary);
for (std::istreambuf_iterator<char> it(ifs.rdbuf());
     it != std::istreambuf_iterator<char>(); ++it) {
    // do something with *it;
}
ifs.close();

or use std::copy, but even then you are still going through istreambuf iterators (so, if I understand the C++ documentation correctly, the code above is basically reading one byte per call).

So the question is: how do I write a custom iterator? What should I inherit from?

I assume this also matters when writing a file to disk, and I assume I could use the same iterator class for writing; if I'm wrong, please feel free to correct me.

  • Is the size of the inbound data precluding you from just ifs.read-ing the data straight up into a std::vector<unsigned char> and iterating over that? Commented Nov 21, 2013 at 17:58
  • @WhozCraig For now I don't think the files are too big to be kept in memory (if that's what you are referring to). I'm fine with read or any other way; even the std::vector constructor accepts iterators, so I'm fine on that side. The "problem" is the iterators themselves: I would like to write one to try to browse the data differently. EDIT: I would like to avoid any C-ish way; I'll stick with the iterators. Commented Nov 21, 2013 at 18:07
  • you are basically reading 1 byte at each call -- from ifstream's in-memory buffer, not from the file itself. The actual read(2) calls are still for every 4k or 16k or whatever the default buffer size is for you. Commented Nov 21, 2013 at 18:11
  • @Cubbi Yes, I wasn't going to bring up the buffered/unbuffered behaviour because I want to keep the focus on the iterators, but you are right. Anyway, I'm also not interested in that because it's platform-specific, and I'm trying to adopt a solution that is as cross-platform as possible without introducing extra stuff. That's why I would like to write an iterator; it looks like the perfect mix of abstraction over the file and portability. Commented Nov 21, 2013 at 18:16

2 Answers


It is possible to optimize std::copy() using std::istreambuf_iterator<char>, but hardly any implementation does. Just deriving from something won't do the trick either, because that isn't how iterators work.

The most effective built-in approach is probably to simply dump the file into an std::ostringstream and then get a std::string from it:

std::ostringstream out;
out << file.rdbuf();
std::string content = out.str();

If you want to avoid travelling through a std::string, you can write a stream buffer that dumps the content directly into a memory area or a std::vector<unsigned char>, and use the same output expression as above.
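A hedged sketch of that idea, assuming a custom output stream buffer that appends everything written to it into a std::vector<unsigned char> (the class name `vector_sink` is my own, not a standard component):

```cpp
#include <ostream>
#include <streambuf>
#include <vector>

// A minimal output stream buffer whose "device" is a growable vector.
// With no put area installed, every write goes through overflow()/xsputn().
class vector_sink : public std::streambuf {
public:
    const std::vector<unsigned char>& data() const { return buf_; }
protected:
    // Per-character path (used by sputc() once the put area is exhausted).
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof()))
            buf_.push_back(static_cast<unsigned char>(ch));
        return traits_type::not_eof(ch);
    }
    // Bulk path (used by write() and operator<<(streambuf*)).
    std::streamsize xsputn(const char* s, std::streamsize n) override {
        buf_.insert(buf_.end(), s, s + n);
        return n;
    }
private:
    std::vector<unsigned char> buf_;
};
```

With this in place, `vector_sink sink; std::ostream out(&sink); out << file.rdbuf();` fills the vector with the file's bytes without an intermediate std::string.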

The std::istreambuf_iterator<char>s could, in principle, have a backdoor to the stream buffer and bypass character-wise operations. Without that backdoor you won't be able to speed anything up using these iterators. You could create an iterator on top of stream buffers that uses the stream buffer's sgetn() to deal with a similar buffer. In that case you'd pretty much need a version of std::copy() that deals with segments (i.e., each fill of a buffer) efficiently. Short of either, I'd just read the file into a buffer using a stream buffer and iterate over that.


1 Comment

So you are suggesting to basically stick with my first implementation? What are the possible errors? What happens if the file is corrupted?

My suggestion is not to use a custom stream, stream-buffer or stream-iterator.

#include <fstream>

struct Data {
    short a;
    short b;
    int   c;
};

std::istream& operator >> (std::istream& stream, Data& data) {
    static_assert(sizeof(Data) == 2*sizeof(short) + sizeof(int), "Invalid Alignment");
    if(stream.read(reinterpret_cast<char*>(&data), sizeof(Data))) {
        // Consider endian
    }
    else {
        // Error
    }
    return stream;
}

int main(int argc, char* argv[])
{
    if (argc < 2)
        return 1;
    std::ifstream stream(argv[1], std::ios::binary); // open the input file
    Data data;
    while (stream >> data) {
        // Process
    }
    if (stream.fail() && !stream.eof()) {
        // Error (plain EOF is good)
    }
    return 0;
}

You could dare to make a stream buffer iterator that reads elements larger than the underlying char_type, but then:

  • What if the data has an invalid format ?
  • What if the data is incomplete and at EOF ?

The state of the stream is not maintained by the buffer or iterator.

2 Comments

Can I buffer the entire file?
@user2485710 That would depend on the underlying stream buffer (hence it is possible).
