
I need to read a large binary file (~1GB) into a std::vector&lt;double&gt;. I'm currently using infile.read to copy the whole thing into a char * buffer (shown below), and I plan to convert it to doubles with reinterpret_cast. Surely there must be a way to put the doubles straight into the vector?

I'm also not sure about the format of the binary file; the data was produced in Python, so it's probably all floats.

ifstream infile(filename, std::ifstream::binary);

infile.seekg(0, infile.end);     // N is the total number of bytes in the file
N = infile.tellg();
infile.seekg(0, infile.beg);

char * buffer = new char[N];

infile.read(buffer, N);
  • Is there a reason you want to use doubles? Binary data is normally represented as a char since, on most platforms, it occupies a single byte. Commented Feb 24, 2015 at 22:52
  • If you don't know the format of the file, how did you plan to convert it? Commented Feb 24, 2015 at 22:53
  • Ummmmm.... you want to read a binary file, not knowing the format, as something else than just a stream of bytes? Commented Feb 24, 2015 at 22:54
  • Map the file into memory and construct the vector from that (see the sketch after this list). Commented Feb 24, 2015 at 22:56
  • Nobody has said anything about endianness yet. Maybe portability doesn't matter in this case. Also, asking the OS for 1GB of contiguous data is not generally a great idea. Consider whether a container like std::deque would suit your requirements. Commented Feb 25, 2015 at 0:27
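
A minimal sketch of the memory-mapping suggestion above, assuming a POSIX system and a file of raw native-endian 8-byte doubles; the function name mmap_read_doubles is illustrative and not from the thread:

    // Illustrative sketch only: map the file, then copy the doubles into a vector.
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <vector>
    #include <stdexcept>

    std::vector<double> mmap_read_doubles(const char* filename)
    {
        int fd = open(filename, O_RDONLY);
        if (fd == -1) throw std::runtime_error("open failed");

        struct stat st;
        if (fstat(fd, &st) == -1) { close(fd); throw std::runtime_error("fstat failed"); }

        // Map the whole file read-only; the kernel pages it in on demand.
        void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);  // the mapping stays valid after closing the descriptor
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");

        const double* begin = static_cast<const double*>(p);
        const double* end   = begin + st.st_size / sizeof(double);

        // The vector's range constructor copies the doubles out of the mapping.
        std::vector<double> v(begin, end);

        munmap(p, st.st_size);
        return v;
    }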

1 Answer


This assumes the entire file consists of raw doubles; otherwise it won't work properly.

std::vector<double> buf(N / sizeof(double)); // allocate (and zero-initialise) N / sizeof(double) doubles
infile.read(reinterpret_cast<char*>(buf.data()), buf.size() * sizeof(double)); // or &buf[0] in C++98
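
For completeness, a hedged sketch combining the question's size computation with this read, plus basic error checking; read_doubles is an illustrative name, and it still assumes the file holds raw native-endian 8-byte doubles (NumPy's default float64), not 4-byte floats:

    #include <fstream>
    #include <vector>
    #include <stdexcept>
    #include <string>

    std::vector<double> read_doubles(const std::string& filename)
    {
        std::ifstream infile(filename, std::ifstream::binary);
        if (!infile) throw std::runtime_error("could not open " + filename);

        infile.seekg(0, infile.end);
        std::streamsize bytes = infile.tellg();   // total size in bytes
        infile.seekg(0, infile.beg);

        if (bytes % sizeof(double) != 0)
            throw std::runtime_error("file size is not a multiple of sizeof(double)");

        std::vector<double> buf(bytes / sizeof(double));
        infile.read(reinterpret_cast<char*>(buf.data()), bytes);
        if (infile.gcount() != bytes)
            throw std::runtime_error("short read");

        return buf;
    }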

13 Comments

  • The declaration zero-initialises, which seems a bit wasteful.
  • @AlanStokes On the other hand, the IO operation itself is probably the bottleneck. I would say this is fine until measurement proves it to be significant.
  • You certainly do not want to (and most certainly cannot) save 1GB of data in an std::array.
  • @BaummitAugen Curious, why isn't std::array suitable for large data? Is the storage on the stack?
  • @AlanStokes @TonyJiang Because std::array is (by design) a zero-overhead wrapper for C arrays (the int arr[100]; kind). Those end up on the stack (unless the array has static storage duration or is allocated dynamically, but don't do the latter in this case).
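
To illustrate that last point, a small sketch (the element count is illustrative, chosen so the data is ~1GiB): a local std::array stores its elements directly on the stack, while std::vector keeps only a small header there and allocates its elements on the heap.

    #include <array>
    #include <vector>

    void example()
    {
        // ~1GiB of doubles as a local std::array would live on the stack
        // and almost certainly overflow it (typical stack limits are a few MB):
        // std::array<double, 134217728> a;   // don't do this

        // std::vector keeps only pointer/size/capacity on the stack;
        // the ~1GiB of elements is allocated on the heap:
        std::vector<double> v(134217728);
    }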
