
I have this code to read 64MB of binary data into memory:


#include <cstdio>

#define SIZE 8192   /* SIZE*SIZE bytes = 64MB */

char* readFromFile(FILE* fp)
{
  char* memBlk = new char[SIZE*SIZE];
  fread(memBlk, 1, SIZE*SIZE, fp);
  return memBlk;
}

int main()
{
  FILE* fp = fopen("/some_path/file.bin", "rb+");
  char* read_data = readFromFile(fp);
  // do something on read data
  // EDIT: It is a matrix, so I would be reading row-wise.
  delete[] read_data;
  fclose(fp);
}

When I use this code independently, the runtime is less than 1 second. However, when I put the exact same code (just to benchmark it) into one of our applications, the runtime is 146 seconds. The application is quite a bulky one, with up to 5G memory usage.

Some of it can be explained by the current memory usage, cache misses, and other factors, but a difference by a factor of 146 sounds unreasonable to me.

Can someone explain this?

Memory mapping may improve performance. Any other suggestions are also welcome.

Thanks.

Machine info: Linux my_mach 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

EDIT:

Thanks for your answers. However, I missed the fact that the place where I inserted the code was itself being called 25 times, so it is not exactly a factor of 146.

Anyway, the answers were helpful. Thanks for your time.

  • Why do you use dynamically allocated memory to store a buffer of a fixed size? Commented Jan 28, 2011 at 13:16
  • @Blagovest: you are not suggesting storing 64MB of data on the stack, are you? Commented Jan 28, 2011 at 13:18
  • What does the // .. do something with it .. do? Commented Jan 28, 2011 at 13:19
  • @thkala: Of course not, a static variable would be much more efficient. Commented Jan 28, 2011 at 13:19
  • Is this on a 32-bit or 64-bit OS? Commented Jan 28, 2011 at 13:26

3 Answers


It looks like the additional memory your code needs induces thrashing in the application, which is probably already running at its limit.

If you want to "do something" with the file you can either:

  • Process the file blockwise (see the second sketch below), or

  • Use mmap() or some similar memory-mapping technique of your operating system to map the file into memory, if you need more complicated access.

    mmap'ing uses the buffer cache as backing store, paging the contents in from the file itself instead of from swap space. Using mmap is usually the fastest and easiest way to access a file, although it is not totally portable (it can be made portable within the UNIX-like family of OSes, e.g. all the BSDs, Linux, Solaris, and Mac OS X). A sketch follows directly below.
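
As an illustration of the mmap option, here is a minimal sketch for Linux; the read-only mapping and the file path are assumptions carried over from the question, not part of the original answer:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main()
{
  const char* path = "/some_path/file.bin";  // path taken from the question
  int fd = open(path, O_RDONLY);
  if (fd == -1) { perror("open"); return 1; }

  struct stat sb;
  if (fstat(fd, &sb) == -1) { perror("fstat"); close(fd); return 1; }

  // Map the whole file read-only; pages are faulted in on demand
  // straight from the buffer cache, with no extra copy into the heap.
  void* p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

  const char* data = static_cast<const char*>(p);
  (void)data;  // ... read the matrix row-wise through data here ...

  munmap(p, sb.st_size);
  close(fd);
  return 0;
}

For the read-only, row-wise access described in the question, PROT_READ with MAP_PRIVATE is enough; a mapping that writes back to the file would use PROT_READ|PROT_WRITE with MAP_SHARED instead.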
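And for the blockwise option, a minimal sketch; the 8192-byte row size and the trivial processRow function are hypothetical stand-ins for the question's matrix rows and its "do something" step:

#include <cstdio>
#include <vector>

// Hypothetical row geometry matching the question: 8192 rows of 8192 bytes.
const size_t ROW_BYTES = 8192;
const size_t NUM_ROWS  = 8192;

// Placeholder for the question's "do something"; here it just sums the bytes.
long processRow(const char* row, size_t len)
{
  long sum = 0;
  for (size_t i = 0; i < len; ++i) sum += row[i];
  return sum;
}

// Keep only one row resident at a time instead of the whole 64MB.
bool processBlockwise(FILE* fp)
{
  std::vector<char> row(ROW_BYTES);
  for (size_t i = 0; i < NUM_ROWS; ++i) {
    if (fread(&row[0], 1, ROW_BYTES, fp) != ROW_BYTES)
      return false;  // short read: EOF or I/O error
    processRow(&row[0], ROW_BYTES);
  }
  return true;
}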

You did not specify what access pattern "do something" will have, so it is hard to recommend a specific technique.


3 Comments

a 64K allocation will most probably use mmap under the hood anyway.
64*M* - apart from this detail, I agree with doron.
While malloc might use mmap under the hood, what it will do is: 1. mmap some empty pages, 2. copy the file's contents into these pages (using different pages as buffer cache), 3. copy the content back into the file if it is also to be written. mmap'ing the file itself, by contrast, just uses the buffer cache for paging the file's parts in and out. To make clear that I meant mmap'ing the file, I'll clarify this.

5G is a huge amount of memory; are you sure you have that much physical memory on board? If not, the factor-of-146 difference is probably due to swapping out to disk to try to free up memory.

You should also probably look at using a 64 bit OS on a 64 bit machine.

1 Comment

The revised question points out that the above code by itself runs in 1 second, so the slowdown does not originate in this code itself; swapping in the larger application is indeed a likely cause.

The process may not have 64MB of free store readily available in one contiguous block. Can you try splitting the 64MB buffer into a chain of smaller chunks, say 64K or 256K in size, and see if that helps improve performance?
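
To illustrate this suggestion, a minimal sketch of the chunked approach; the 256K chunk size is one of the sizes suggested above, and readChunked is a hypothetical helper, not code from the question:

#include <cstdio>
#include <vector>

const size_t CHUNK_BYTES = 256 * 1024;      // one of the sizes suggested above
const size_t TOTAL_BYTES = 8192u * 8192u;   // 64MB, as in the question

// Read the file into a chain of CHUNK_BYTES-sized buffers instead of
// demanding one contiguous 64MB allocation from the free store.
bool readChunked(FILE* fp, std::vector<std::vector<char> >& chunks)
{
  size_t remaining = TOTAL_BYTES;
  while (remaining > 0) {
    size_t n = remaining < CHUNK_BYTES ? remaining : CHUNK_BYTES;
    chunks.push_back(std::vector<char>(n));
    if (fread(&chunks.back()[0], 1, n, fp) != n)
      return false;  // short read: EOF or I/O error
    remaining -= n;
  }
  return true;
}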

5 Comments

This will be directly allocated from the OS (probably via mmap) and not from the process heap. Given this, the 64M block may be contiguous in virtual memory space but does not need to be contiguous in physical memory.
If it didn't, new would have thrown std::bad_alloc.
@doron: Are you confusing K and MB there? If this is a 32 bit OS, the process may not have a 64MB chunk of address space available, let alone free store. The OS might (OK, I admit I'm speculating here) then try to remap the available address space. It would help if we knew the OS.
@MSalters: I'm not sure if new throws that right away. I think (as above, I'm speculating) that some systems will try to re-arrange the address space to free up a large enough contiguous block. I'm sure this behavior would be OS-dependent.
@Dam: as MSalters says, if there was no place in the address space for the block of memory, new would have failed. The problem is one of performance, and I am pointing out that heap walking is not the issue here.
