
Pure curiosity, not to be used in production, because obviously it could cause major problems.

In C++, when you allocate memory with new (char *ch = new char[x];), the size is essentially stored in ch[-1], according to the C++ specs.

The question is, is there a way to get that value? I've tried:

char* ptr = ch;
--ptr;
cout << *ptr;

// AND

cout << ch[-sizeof(char)];

So is there a way to hack this? Again, pure curiosity.

  • No, the size is not necessarily stored in ch[-1] (which would limit such strings to 255 bytes!), and I know of no spec claiming that. And there is no way to portably retrieve that information, even if of course most implementations have their own way to get it. Commented Jan 31, 2014 at 3:20
  • Actually it depends on the allocator used to allocate the object. The standard library has specific allocators for objects of "well-known" size (e.g. 2, 4, 8...) which save space when allocating data for these objects. And, no, you shouldn't rely on this specific behaviour of an abstract allocator. Commented Jan 31, 2014 at 3:21
  • @JonathonReinhart, how would I use size_t? Commented Jan 31, 2014 at 3:26
  • Would you care to tell us what "the specs" are? Commented Jan 31, 2014 at 3:26
  • @KerrekSB, the specs are curiosity. I'm not trying to use this for real code, just one of those, I wonder if it's possible. Commented Jan 31, 2014 at 3:28

3 Answers


Disclaimer: Never, ever count on this working. Consider this only "toy code" and never use it in "real" software!

Often, the new operator ends up calling straight through to malloc(), which is known to exhibit this behavior in many versions of libc.

The problem with your code is that your pointer is a char* but the data you're after is probably really a size_t (4 bytes on a 32-bit system).

The following code demonstrates roughly what you're after:

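#include <stddef.h>        // for size_t
#include <stdio.h>

void test(size_t size) {
    size_t result;
    char* p = new char[size];

    // Read the size_t-sized word just before the block (undefined behavior!)
    result = *((size_t*)p - 1);
    printf("Allocated:  %zu (0x%zX)  Preceding value: %zu (0x%zX)\n",
        size, size, result, result);

    delete[] p;   // memory from new[] must be released with delete[]
}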

int main() {
    test(1);
    test(40);
    test(100);
    test(0x100);
    test(6666);
    test(0xDEAD);
    return 0;
}

Note that I'm first casting p to a size_t*, and then subtracting 1 (which equates to sizeof(size_t) bytes).

Output:

$ ./a.exe
Allocated:  1 (0x1)  Preceding value: 19 (0x13)
Allocated:  40 (0x28)  Preceding value: 51 (0x33)
Allocated:  100 (0x64)  Preceding value: 107 (0x6B)
Allocated:  256 (0x100)  Preceding value: 267 (0x10B)
Allocated:  6666 (0x1A0A)  Preceding value: 6675 (0x1A13)
Allocated:  57005 (0xDEAD)  Preceding value: 57019 (0xDEBB)

So the preceding value is close to the requested size, but not equal to it.


Looking at malloc/malloc.c from glibc, we see the following comment:

  Alignment:                              2 * sizeof(size_t) (default)
       (i.e., 8 byte alignment with 4byte size_t). This suffices for
       nearly all current machines and C compilers. However, you can
       define MALLOC_ALIGNMENT to be wider than this if necessary.

  Minimum overhead per allocated chunk:   4 or 8 bytes
       Each malloced chunk has a hidden word of overhead holding size
       and status information

  Minimum allocated size: 4-byte ptrs:  16 bytes    (including 4 overhead)
              8-byte ptrs:  24/32 bytes (including, 4/8 overhead)

These are excellent clues. There are two things that are probably happening:

  1. Your requested allocation sizes are being aligned up to the next alignment size.
  2. The lowest bits (unused because of the above alignment) are used for this "status information".

So we add code that computes the values these rules predict, to see whether they "play along":

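// At file scope:
#define SIZE        sizeof(size_t)
#define MAX(x,y)    ((x)>(y) ? (x) : (y))
#define align(x)    (((x)+2*SIZE-1) & ~(2*SIZE-1))    // round up to a multiple of 2*SIZE
#define mask(x)     ((x) & ~(size_t)0x3)              // clear the two low "status" bits

// Inside test(), after result has been read:
printf("align(size): 0x%zX   mask(result): 0x%zX\n\n",
    align(MAX(size+SIZE, 16)), mask(result));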

The chunk size also includes one SIZE-byte header word, and must be at least 16 bytes. This value is then rounded up to the next multiple of 2*SIZE. The value we read out has its bottom 2 bits masked off, since those hold the "status information". The result:

$ ./a.exe
sizeof(size_t) = 4

size:  1 (0x1)  result: 19 (0x13)
align(size): 0x10   mask(result): 0x10

size:  40 (0x28)  result: 51 (0x33)
align(size): 0x30   mask(result): 0x30

size:  100 (0x64)  result: 107 (0x6B)
align(size): 0x68   mask(result): 0x68

size:  256 (0x100)  result: 267 (0x10B)
align(size): 0x108   mask(result): 0x108

size:  6666 (0x1A0A)  result: 6675 (0x1A13)
align(size): 0x1A10   mask(result): 0x1A10

size:  57005 (0xDEAD)  result: 57019 (0xDEBB)
align(size): 0xDEB8   mask(result): 0xDEB8

And there you have it!


Note that I'm using:

$ uname
CYGWIN_NT-6.1-WOW64

$ g++ --version
g++ (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)

Again, this is highly implementation-specific and should never be trusted. However, it is true that many allocators store the allocation size right before the actual block of memory.




5 Comments

@David There were a couple that were wrong. I've fixed it and updated my code.
The Itanium ABI says that for trivially destructible types (such as builtin types) the size isn't stored at all...
@KerrekSB Interesting. Would that mean that this version of glibc would be incompatible with Itanium? Or just out of spec? My real question is, what business does the ABI have specifying how a library call like malloc tracks its allocations?
@JonathonReinhart: You seem to be confusing malloc with C++. Everyone has to reinvent their own wheels. malloc must remember the size of the allocated memory, and C++ must know which objects need to be destroyed. (You're basically not answering the OP's question at all, but instead discuss how malloc is implemented.)
@KerrekSB As for "not answering the OP's question at all", I think the output of the program (which uses new) is surprisingly correct.

No. Standard C++ provides no mechanism that, given only a pointer to a dynamically allocated buffer, determines the size of that buffer. Internally these dynamically allocated regions are assumed to be tracked somehow, so that delete [] pblah can work, but the C++ standard imposes no requirements on how the implementation does it. Therefore, while what you are asking may theoretically be possible, it cannot be done without knowing the internals of your compiler, library, platform and implementation.



Others have already stated that this is implementation specific. With MS VS2012, I find that default allocations are aligned to a 16-byte granularity, with 16 bytes preceding the memory block; the number of requested bytes is stored in the first sizeof(size_t) bytes of that 16-byte header.

I adapted Jonathon's test:

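#include <cstddef>
#include <cstdlib>   // for std::system
#include <iostream>

void test(std::size_t const size)
{
  char * mem = new char[size];
  std::cout << "Number of allocated bytes at " << (std::size_t)mem;
  // MSVC-specific: read the size_t stored 16 bytes before the block (undefined behavior!)
  std::cout << " is: " << *(std::size_t*)(mem-16) << std::endl;
  delete [] mem;
}

int main()
{
  test(1U);
  test(40U);
  test(256);
  test(6666);
  test(57005);
  std::system("pause");   // keep the console window open (Windows only)
}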

Giving me

Number of allocated bytes at 101044204016 is: 1
Number of allocated bytes at 101044242832 is: 40
Number of allocated bytes at 101044242832 is: 256
Number of allocated bytes at 101044244064 is: 6666
Number of allocated bytes at 101044244064 is: 57005
Press any key . . .

Each address is a multiple of 16:

  • 101044204016 / 16 = 6315262751
  • 101044242832 / 16 = 6315265177
  • 101044244064 / 16 = 6315265254

Also see: C++ allocates abnormally large amout memory for variables
