
I have this code to read 64MB of binary data into memory:


#include <cstdio>

#define SIZE 8192   /* SIZE*SIZE bytes = 64MB */

char* readFromFile(FILE* fp)
{
  char* memBlk = new char[SIZE*SIZE];
  fread(memBlk, 1, SIZE*SIZE, fp);
  return memBlk;
}

int main()
{
  FILE* fp = fopen("/some_path/file.bin", "rb+");
  char* read_data = readFromFile(fp);
  // do something on read data
  // EDIT: It is a matrix, so I would be reading row-wise.
  delete[] read_data;
  fclose(fp);
}

When I use this code independently, the runtime is less than 1 second. However, when I put the exact same code (just to benchmark it) into one of our applications, the runtime is 146 seconds. The application is quite a bulky one, with up to 5G memory usage.

Some of it can be explained by the current memory usage, cache misses, and other factors, but a difference by a factor of 146 sounds unreasonable to me.

Can someone explain this?

Memory mapping may improve performance. Any other suggestions are also welcome.

Thanks.

Machine info: Linux my_mach 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

EDIT:

Thanks for your answers. However, I missed the fact that the place where I inserted the code was itself being called 25 times, so it is not exactly a factor of 146.

Anyway, the answers were helpful. Thanks for your time.

  • Why do you use dynamically allocated memory to store a buffer of a fixed size? Commented Jan 28, 2011 at 13:16
  • @Blagovest: you are not suggesting storing 64MB of data on the stack, are you? Commented Jan 28, 2011 at 13:18
  • What does the // .. do something with it .. do? Commented Jan 28, 2011 at 13:19
  • @thkala: Of course not, a static variable would be much more efficient. Commented Jan 28, 2011 at 13:19
  • Is this on a 32-bit or 64-bit OS? Commented Jan 28, 2011 at 13:26

3 Answers


It looks like the additional memory your code needs induces thrashing in the application, which is probably already running at its limit.

If you want to "do something" with the file you can either:

  • Process the file blockwise (see the second sketch below), or

  • Use mmap() or some similar memory-mapping technique of your operating system to map the file into memory, if you need more complicated access.

    mmap'ing uses the buffer cache as backing store, paging the contents in from the file itself instead of from swap space. Using mmap is usually the fastest and easiest way to access a file, although it is not totally portable (it can be made portable within the UNIX-like family of OSes, e.g. all the BSDs, Linux, Solaris, and Mac OS X). A sketch follows directly below.
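
As an illustration of the mmap option, here is a minimal sketch for Linux; the read-only mapping and the file path are assumptions carried over from the question, not part of the original answer:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
  const char* path = "/some_path/file.bin";  // path taken from the question
  int fd = open(path, O_RDONLY);
  if (fd == -1) { perror("open"); return 1; }

  struct stat sb;
  if (fstat(fd, &sb) == -1) { perror("fstat"); close(fd); return 1; }

  // Map the whole file read-only; pages are faulted in on demand
  // straight from the buffer cache, with no extra copy into the heap.
  void* p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

  const char* data = static_cast<const char*>(p);
  (void)data;  // ... read the matrix row-wise through data here ...

  munmap(p, sb.st_size);
  close(fd);
  return 0;
}

For the read-only, row-wise access described in the question, PROT_READ with MAP_PRIVATE is enough; a mapping that writes back to the file would use PROT_READ|PROT_WRITE with MAP_SHARED instead.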
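And for the blockwise option, a minimal sketch; the 8192-byte row size and the trivial processRow function are hypothetical stand-ins for the question's matrix rows and its "do something" step:

#include <cstdio>
#include <vector>

// Hypothetical row geometry matching the question: 8192 rows of 8192 bytes.
const size_t ROW_BYTES = 8192;
const size_t NUM_ROWS  = 8192;

// Placeholder for the question's "do something"; here it just sums the bytes.
long processRow(const char* row, size_t len)
{
  long sum = 0;
  for (size_t i = 0; i < len; ++i) sum += row[i];
  return sum;
}

// Keep only one row resident at a time instead of the whole 64MB.
bool processBlockwise(FILE* fp)
{
  std::vector<char> row(ROW_BYTES);
  for (size_t i = 0; i < NUM_ROWS; ++i) {
    if (fread(&row[0], 1, ROW_BYTES, fp) != ROW_BYTES)
      return false;  // short read: EOF or I/O error
    processRow(&row[0], ROW_BYTES);
  }
  return true;
}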

You did not specify what access pattern "do something" will have, so it is hard to recommend a specific technique.


3 Comments

a 64K allocation will most probably use mmap under the hood anyway.
64*M* - apart from this detail, I agree with doron.
While malloc might use mmap under the hood, what it will do is: 1. mmap some empty pages, 2. copy the file's contents into these pages (using different pages as buffer cache), 3. copy the content back into the file if it is also to be written. mmap'ing the file itself, by contrast, just uses the buffer cache for paging the file's parts in and out. To make clear that I meant mmap'ing the file, I'll clarify this.

5G is a huge amount of memory; are you sure you have that much physical memory on board? If not, the factor-of-146 difference is probably due to swapping out to disk to try to free up memory.

You should also probably look at using a 64 bit OS on a 64 bit machine.

1 Comment

The revised question points out that the above code by itself runs in 1 second, so the slowdown does not originate in this code itself; swapping in the larger application is indeed a likely cause.

The process may not have 64MB of free store readily available in one contiguous block. Can you try splitting the 64MB buffer into a chain of smaller chunks, say 64K or 256K in size, and see if that helps improve performance?
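
To illustrate this suggestion, a minimal sketch of the chunked approach; the 256K chunk size is one of the sizes suggested above, and readChunked is a hypothetical helper, not code from the question:

#include <cstdio>
#include <vector>

const size_t CHUNK_BYTES = 256 * 1024;      // one of the sizes suggested above
const size_t TOTAL_BYTES = 8192u * 8192u;   // 64MB, as in the question

// Read the file into a chain of CHUNK_BYTES-sized buffers instead of
// demanding one contiguous 64MB allocation from the free store.
bool readChunked(FILE* fp, std::vector<std::vector<char> >& chunks)
{
  size_t remaining = TOTAL_BYTES;
  while (remaining > 0) {
    size_t n = remaining < CHUNK_BYTES ? remaining : CHUNK_BYTES;
    chunks.push_back(std::vector<char>(n));
    if (fread(&chunks.back()[0], 1, n, fp) != n)
      return false;  // short read: EOF or I/O error
    remaining -= n;
  }
  return true;
}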

5 Comments

This will be directly allocated from the OS (probably via mmap) and not from the process heap. Given this, the 64M block may be contiguous in virtual memory space but does not need to be contiguous in physical memory.
If it didn't, new would have thrown std::bad_alloc.
@doron: Are you confusing K and MB there? If this is a 32 bit OS, the process may not have a 64MB chunk of address space available, let alone free store. The OS might (OK, I admit I'm speculating here) then try to remap the available address space. It would help if we knew the OS.
@MSalters: I'm not sure if new throws that right away. I think (as above, I'm speculating) that some systems will try to re-arrange the address space to free up a large enough contiguous block. I'm sure this behavior would be OS-dependent.
@Dam: as MSalters says, if there was no place in the address space for the block of memory, new would have failed. The problem is one of performance, and I am pointing out that heap walking is not the issue here.
