11src/backend/utils/mmgr/README
22
3- Notes About Memory Allocation Redesign
4- ======================================
5-
6- Up through version 7.0, Postgres had serious problems with memory leakage
7- during large queries that process a lot of pass-by-reference data. There
8- was no provision for recycling memory until end of query. This needed to be
9- fixed, even more so with the advent of TOAST which allows very large chunks
10- of data to be passed around in the system. This document describes the new
11- memory management system implemented in 7.1.
12-
3+ Memory Context System Design Overview
4+ =====================================
135
146Background
157----------
@@ -38,10 +30,10 @@ to or get more memory from the same context the chunk was originally
3830allocated in.
3931
4032At all times there is a "current" context denoted by the
41- CurrentMemoryContext global variable. The backend macro palloc()
42- implicitly allocates space in that context. The MemoryContextSwitchTo()
43- operation selects a new current context (and returns the previous context,
44- so that the caller can restore the previous context before exiting).
33+ CurrentMemoryContext global variable. palloc() implicitly allocates space
34+ in that context. The MemoryContextSwitchTo() operation selects a new current
35+ context (and returns the previous context, so that the caller can restore the
36+ previous context before exiting).
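The save-and-restore idiom described above can be sketched with a toy model. Everything named Toy or toy_ below is hypothetical and only counts bytes; it is not the backend API, just an illustration of why the switch operation returns the previous context.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory context; it only counts bytes. */
typedef struct ToyContext
{
    const char *name;
    size_t      bytes_allocated;
} ToyContext;

static ToyContext toy_top = {"top", 0};
static ToyContext *ToyCurrentContext = &toy_top;

/* Select a new current context and hand back the previous one. */
static ToyContext *
toy_switch_to(ToyContext *newcxt)
{
    ToyContext *old = ToyCurrentContext;

    ToyCurrentContext = newcxt;
    return old;
}

/* "Allocate" in whatever context is current (bookkeeping only). */
static void
toy_alloc(size_t size)
{
    ToyCurrentContext->bytes_allocated += size;
}

/* A routine whose result must live in a context chosen by the caller. */
static void
build_result_in(ToyContext *target)
{
    ToyContext *oldcxt = toy_switch_to(target);

    toy_alloc(64);              /* charged to target */
    toy_switch_to(oldcxt);      /* restore before returning */
}

/* Returns 1 if the caller's current context was left untouched. */
static int
toy_switch_demo(void)
{
    ToyContext  longlived = {"longlived", 0};
    ToyContext *before = ToyCurrentContext;

    build_result_in(&longlived);
    toy_alloc(16);              /* charged to the restored context */

    assert(longlived.bytes_allocated == 64);
    return ToyCurrentContext == before;
}
```

Because the switch returns the old context, a routine can restore it without knowing what its caller had selected.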
4537
4638The main advantage of memory contexts over plain use of malloc/free is
4739that the entire contents of a memory context can be freed easily, without
@@ -60,8 +52,10 @@ The behavior of palloc and friends is similar to the standard C library's
6052malloc and friends, but there are some deliberate differences too. Here
6153are some notes to clarify the behavior.
6254
63- * If out of memory, palloc and repalloc exit via elog(ERROR). They never
64- return NULL, and it is not necessary or useful to test for such a result.
55+ * If out of memory, palloc and repalloc exit via elog(ERROR). They
56+ never return NULL, and it is not necessary or useful to test for such
57+ a result. With palloc_extended() that behavior can be overridden
58+ using the MCXT_ALLOC_NO_OOM flag.
6559
6660* palloc(0) is explicitly a valid operation. It does not return a NULL
6761pointer, but a valid chunk of which no bytes may be used. However, the
@@ -71,28 +65,18 @@ error. Similarly, repalloc allows realloc'ing to zero size.
7165* pfree and repalloc do not accept a NULL pointer. This is intentional.
7266
7367
74- pfree/repalloc No Longer Depend On CurrentMemoryContext
75- -------------------------------------------------------
76-
77- Since Postgres 7.1, pfree() and repalloc() can be applied to any chunk
78- whether it belongs to CurrentMemoryContext or not --- the chunk's owning
79- context will be invoked to handle the operation, regardless. This is a
80- change from the old requirement that CurrentMemoryContext must be set
81- to the same context the memory was allocated from before one can use
82- pfree() or repalloc().
83-
84- There was some consideration of getting rid of CurrentMemoryContext entirely,
85- instead requiring the target memory context for allocation to be specified
86- explicitly. But we decided that would be too much notational overhead ---
87- we'd have to pass an appropriate memory context to called routines in
88- many places. For example, the copyObject routines would need to be passed
89- a context, as would function execution routines that return a
90- pass-by-reference datatype. And what of routines that temporarily
91- allocate space internally, but don't return it to their caller? We
92- certainly don't want to clutter every call in the system with "here is
93- a context to use for any temporary memory allocation you might want to
94- do". So there'd still need to be a global variable specifying a suitable
95- temporary-allocation context. That might as well be CurrentMemoryContext.
68+ The Current Memory Context
69+ --------------------------
70+
71+ Because it would be too much notational overhead to always pass an
72+ appropriate memory context to called routines, there is always a
73+ "current" memory context, denoted by CurrentMemoryContext. Without it,
74+ for example, the copyObject routines would need to be passed a context, as
75+ would function execution routines that return a pass-by-reference
76+ datatype. The same goes for routines that temporarily allocate space
77+ internally but don't return it to their caller. We certainly don't
78+ want to clutter every call in the system with "here is a context to
79+ use for any temporary memory allocation you might want to do".
9680
9781The upshot of that reasoning, though, is that CurrentMemoryContext should
9882generally point at a short-lifespan context if at all possible. During
@@ -102,42 +86,83 @@ context having greater than transaction lifespan, since doing so risks
10286permanent memory leaks.
10387
10488
105- Additions to the Memory-Context Mechanism
106- -----------------------------------------
107-
108- Before 7.1 memory contexts were all independent, but it was too hard to
109- keep track of them; with lots of contexts there needs to be explicit
110- mechanism for that.
111-
112- We solved this by creating a tree of "parent" and "child" contexts. When
113- creating a memory context, the new context can be specified to be a child
114- of some existing context. A context can have many children, but only one
115- parent. In this way the contexts form a forest (not necessarily a single
116- tree, since there could be more than one top-level context; although in
117- current practice there is only one top context, TopMemoryContext).
118-
119- We then say that resetting or deleting any particular context resets or
120- deletes all its direct and indirect children as well. This feature allows
121- us to manage a lot of contexts without fear that some will be leaked; we
122- only need to keep track of one top-level context that we are going to
123- delete at transaction end, and make sure that any shorter-lived contexts
124- we create are descendants of that context. Since the tree can have
125- multiple levels, we can deal easily with nested lifetimes of storage,
126- such as per-transaction, per-statement, per-scan, per-tuple. Storage
127- lifetimes that only partially overlap can be handled by allocating
128- from different trees of the context forest (there are some examples
129- in the next section).
130-
131- Actually, it turns out that resetting a given context should almost
132- always imply deleting, not just resetting, any child contexts it has.
133- So MemoryContextReset() means that, and if you really do want a tree of
134- empty contexts you need to call MemoryContextResetOnly() plus
135- MemoryContextResetChildren().
89+ pfree/repalloc Do Not Depend On CurrentMemoryContext
90+ ----------------------------------------------------
91+
92+ pfree() and repalloc() can be applied to any chunk whether it belongs
93+ to CurrentMemoryContext or not --- the chunk's owning context will be
94+ invoked to handle the operation, regardless.
95+
96+
97+ "Parent" and "Child" Contexts
98+ -----------------------------
99+
100+ If all contexts were independent, it'd be hard to keep track of them,
101+ especially in error cases. This is solved by creating a tree of
102+ "parent" and "child" contexts. When creating a memory context, the
103+ new context can be specified to be a child of some existing context.
104+ A context can have many children, but only one parent. In this way
105+ the contexts form a forest (not necessarily a single tree, since there
106+ could be more than one top-level context; although in current practice
107+ there is only one top context, TopMemoryContext).
108+
109+ Deleting a context deletes all its direct and indirect children as
110+ well. When resetting a context, it is almost always more useful to
111+ delete its child contexts than to merely reset them, so that is what
112+ MemoryContextReset() does. If you really do want a tree of empty
113+ contexts, call MemoryContextResetOnly() plus MemoryContextResetChildren().
114+
115+ These features allow us to manage a lot of contexts without fear that
116+ some will be leaked; we only need to keep track of one top-level
117+ context that we are going to delete at transaction end, and make sure
118+ that any shorter-lived contexts we create are descendants of that
119+ context. Since the tree can have multiple levels, we can deal easily
120+ with nested lifetimes of storage, such as per-transaction,
121+ per-statement, per-scan, per-tuple. Storage lifetimes that only
122+ partially overlap can be handled by allocating from different trees of
123+ the context forest (there are some examples in the next section).
136124
137125For convenience we also provide operations like "reset/delete all children
138126of a given context, but don't reset or delete that context itself".
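As a rough illustration of the parent/child rule, the following toy model deletes a context together with all its descendants. The ToyContext type and the toy_ functions are assumptions made for this sketch and use plain malloc/free; the real implementation lives in mcxt.c.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyContext
{
    struct ToyContext *parent;      /* NULL for a top-level context */
    struct ToyContext *firstchild;  /* head of linked list of children */
    struct ToyContext *nextchild;   /* next child of same parent */
} ToyContext;

static int  toy_live_contexts = 0;

static void toy_delete(ToyContext *cxt);

/* Create a context, optionally as a child of an existing one. */
static ToyContext *
toy_create(ToyContext *parent)
{
    ToyContext *cxt = calloc(1, sizeof(ToyContext));

    assert(cxt != NULL);
    cxt->parent = parent;
    if (parent != NULL)
    {
        cxt->nextchild = parent->firstchild;
        parent->firstchild = cxt;
    }
    toy_live_contexts++;
    return cxt;
}

/* Delete all direct and indirect children, but keep cxt itself. */
static void
toy_delete_children(ToyContext *cxt)
{
    while (cxt->firstchild != NULL)
        toy_delete(cxt->firstchild);
}

/* Delete cxt and, implicitly, its whole subtree. */
static void
toy_delete(ToyContext *cxt)
{
    toy_delete_children(cxt);

    /* Unlink from the parent's child list, if there is a parent. */
    if (cxt->parent != NULL)
    {
        ToyContext **link = &cxt->parent->firstchild;

        while (*link != cxt)
            link = &(*link)->nextchild;
        *link = cxt->nextchild;
    }
    free(cxt);
    toy_live_contexts--;
}

/* Returns how many contexts created here survive deleting "xact". */
static int
toy_tree_demo(void)
{
    int         start = toy_live_contexts;
    int         remaining;
    ToyContext *top = toy_create(NULL);
    ToyContext *xact = toy_create(top);
    ToyContext *stmt = toy_create(xact);

    toy_create(stmt);           /* per-scan, child of stmt */
    toy_create(xact);           /* sibling of stmt */
    toy_delete(xact);           /* also removes stmt, scan and sibling */
    remaining = toy_live_contexts - start;
    toy_delete(top);
    return remaining;
}
```

Only the pointer to the per-transaction context has to be tracked here; the shorter-lived contexts below it are cleaned up with it.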
139127
140128
129+ Memory Context Reset/Delete Callbacks
130+ -------------------------------------
131+
132+ A feature introduced in Postgres 9.5 allows memory contexts to be used
133+ for managing more resources than just plain palloc'd memory. This is
134+ done by registering a "reset callback function" for a memory context.
135+ Such a function will be called, once, just before the context is next
136+ reset or deleted. It can be used to give up resources that are in some
137+ sense associated with an object allocated within the context. Possible
138+ use-cases include
139+ * closing open files associated with a tuplesort object;
140+ * releasing reference counts on long-lived cache objects that are held
141+ by some object within the context being reset;
142+ * freeing malloc-managed memory associated with some palloc'd object.
143+ That last case would just represent bad programming practice for pure
144+ Postgres code; better to have made all the allocations using palloc,
145+ in the target context or some child context. However, it could well
146+ come in handy for code that interfaces to non-Postgres libraries.
147+
148+ Any number of reset callbacks can be established for a memory context;
149+ they are called in reverse order of registration. Also, callbacks
150+ attached to child contexts are called before callbacks attached to
151+ parent contexts, if a tree of contexts is being reset or deleted.
152+
153+ The API for this requires the caller to provide a MemoryContextCallback
154+ memory chunk to hold the state for a callback. Typically this should be
155+ allocated in the same context it is logically attached to, so that it
156+ will be released automatically after use. The reason for asking the
157+ caller to provide this memory is that in most usage scenarios, the caller
158+ will be creating some larger struct within the target context, and the
159+ MemoryContextCallback struct can be made "for free" without a separate
160+ palloc() call by including it in this larger struct.
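The callback rules above (called once, in reverse order of registration, with caller-provided storage) can be modeled in a few lines. This is a toy sketch with hypothetical Toy names, not the backend API; in particular the state struct sits on the stack here, whereas backend code would normally allocate it in the target context.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ToyCallbackFunction) (void *arg);

typedef struct ToyCallback
{
    ToyCallbackFunction func;
    void       *arg;
    struct ToyCallback *next;
} ToyCallback;

typedef struct ToyContext
{
    ToyCallback *reset_cbs;     /* most recently registered first */
} ToyContext;

/* Push onto the list head, which yields reverse-order invocation. */
static void
toy_register_reset_callback(ToyContext *cxt, ToyCallback *cb)
{
    cb->next = cxt->reset_cbs;
    cxt->reset_cbs = cb;
}

static void
toy_reset(ToyContext *cxt)
{
    ToyCallback *cb;

    while ((cb = cxt->reset_cbs) != NULL)
    {
        cxt->reset_cbs = cb->next;  /* unlink first, so it runs only once */
        cb->func(cb->arg);
    }
    /* a real context would release its memory at this point */
}

/* A larger struct that embeds its callback, so no extra allocation. */
typedef struct ToySortState
{
    int         fd_open;
    ToyCallback cb;
} ToySortState;

static int  toy_call_log[4];
static int  toy_call_count = 0;

static void
close_sort_file(void *arg)
{
    ToySortState *state = arg;

    state->fd_open = 0;
    toy_call_log[toy_call_count++] = 1;
}

static void
release_refcount(void *arg)
{
    int        *refcount = arg;

    (*refcount)--;
    toy_call_log[toy_call_count++] = 2;
}

/* Returns 1 if both callbacks ran exactly once, last-registered first. */
static int
toy_callback_demo(void)
{
    ToyContext  cxt = {NULL};
    ToySortState state = {1, {NULL, NULL, NULL}};
    ToyCallback refcb = {NULL, NULL, NULL};
    int         refcount = 1;

    toy_call_count = 0;
    state.cb.func = close_sort_file;
    state.cb.arg = &state;
    toy_register_reset_callback(&cxt, &state.cb);
    refcb.func = release_refcount;
    refcb.arg = &refcount;
    toy_register_reset_callback(&cxt, &refcb);

    toy_reset(&cxt);
    toy_reset(&cxt);            /* second reset calls nothing */

    assert(state.fd_open == 0 && refcount == 0);
    return toy_call_count == 2 &&
        toy_call_log[0] == 2 && toy_call_log[1] == 1;
}
```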
161+
162+
163+ Memory Contexts in Practice
164+ ===========================
165+
141166Globally Known Contexts
142167-----------------------
143168
@@ -325,83 +350,64 @@ copy step.
325350Mechanisms to Allow Multiple Types of Contexts
326351----------------------------------------------
327352
328- We may want several different types of memory contexts with different
329- allocation policies but similar external behavior. To handle this,
330- memory allocation functions will be accessed via function pointers,
331- and we will require all context types to obey the conventions given here.
332- (As of 2015, there's actually still just one context type; but interest in
333- creating other types has never gone away entirely, so we retain this API.)
334-
335- A memory context is represented by an object like
336-
337- typedef struct MemoryContextData
338- {
339- NodeTag type; /* identifies exact kind of context */
340- MemoryContextMethods methods;
341- MemoryContextData *parent; /* NULL if no parent (toplevel context) */
342- MemoryContextData *firstchild; /* head of linked list of children */
343- MemoryContextData *nextchild; /* next child of same parent */
344- char *name; /* context name (just for debugging) */
345- } MemoryContextData, *MemoryContext;
346-
347- This is essentially an abstract superclass, and the "methods" pointer is
348- its virtual function table. Specific memory context types will use
353+ To efficiently allow for different allocation patterns, and for
354+ experimentation, we allow for different types of memory contexts with
355+ different allocation policies but similar external behavior. To
356+ handle this, memory allocation functions are accessed via function
357+ pointers, and we require all context types to obey the conventions
358+ given here.
359+
360+ A memory context is represented by struct MemoryContextData (see
361+ memnodes.h). This struct identifies the exact type of the context, and
362+ contains information common between the different types of
363+ MemoryContext like the parent and child contexts, and the name of the
364+ context.
365+
366+ This is essentially an abstract superclass, and its behavior is
367+ determined by the "methods" pointer, which is its virtual function table
368+ (struct MemoryContextMethods). Specific memory context types will use
349369derived structs having these fields as their first fields. All the
350- contexts of a specific type will have methods pointers that point to the
351- same static table of function pointers, which look like
352-
353- typedef struct MemoryContextMethodsData
354- {
355- Pointer (*alloc) (MemoryContext c, Size size);
356- void (*free_p) (Pointer chunk);
357- Pointer (*realloc) (Pointer chunk, Size newsize);
358- void (*reset) (MemoryContext c);
359- void (*delete) (MemoryContext c);
360- } MemoryContextMethodsData, *MemoryContextMethods;
361-
362- Alloc, reset, and delete requests will take a MemoryContext pointer
363- as parameter, so they'll have no trouble finding the method pointer
364- to call. Free and realloc are trickier. To make those work, we
365- require all memory context types to produce allocated chunks that
366- are immediately preceded by a standard chunk header, which has the
367- layout
368-
369- typedef struct StandardChunkHeader
370- {
371- MemoryContext mycontext; /* Link to owning context object */
372- Size size; /* Allocated size of chunk */
373- };
374-
375- It turns out that the pre-existing aset.c memory context type did this
376- already, and probably any other kind of context would need to have the
377- same data available to support realloc, so this is not really creating
378- any additional overhead. (Note that if a context type needs more per-
379- allocated-chunk information than this, it can make an additional
380- nonstandard header that precedes the standard header. So we're not
381- constraining context-type designers very much.)
382-
383- Given this, the pfree routine looks something like
384-
385- StandardChunkHeader * header =
386- (StandardChunkHeader *) ((char *) p - sizeof(StandardChunkHeader));
387-
388- (*header->mycontext->methods->free_p) (p);
370+ contexts of a specific type will have methods pointers that point to
371+ the same static table of function pointers.
372+
373+ While operations like allocating from and resetting a context take the
374+ relevant MemoryContext as a parameter, operations like free and
375+ realloc are trickier. To make those work, we require all memory
376+ context types to produce allocated chunks that are immediately,
377+ without any padding, preceded by a pointer to the corresponding
378+ MemoryContext.
379+
380+ If a type of allocator needs additional information about its chunks,
381+ such as the size of the allocation, that information can in turn
382+ precede the MemoryContext. This means the only overhead implied by
383+ the memory context mechanism is a pointer to its context, so we're not
384+ constraining context-type designers very much.
385+
386+ Given this, routines like pfree determine their corresponding context
387+ with an operation like (although that is usually encapsulated in
388+ GetMemoryChunkContext())
389+
390+ MemoryContext context = *(MemoryContext*) (((char *) pointer) - sizeof(void *));
391+
392+ and then invoke the corresponding method for the context
393+
394+ (*context->methods->free_p) (context, pointer);
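To make the chunk-header convention concrete, here is a self-contained toy allocator that stores a pointer to the owning context immediately before each chunk and dispatches free through a method table. All Toy names are hypothetical, the allocator simply wraps malloc, and the sketch ignores alignment stricter than that of a pointer.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyContext ToyContext;

/* Toy "virtual function table" shared by all contexts of one type. */
typedef struct ToyMethods
{
    void       *(*alloc) (ToyContext *context, size_t size);
    void        (*free_p) (ToyContext *context, void *pointer);
} ToyMethods;

struct ToyContext
{
    const ToyMethods *methods;
    int         live_chunks;
};

static void *
toy_malloc_alloc(ToyContext *context, size_t size)
{
    /* Reserve room for the owning-context pointer right before the chunk. */
    char       *raw = malloc(sizeof(void *) + size);

    if (raw == NULL)
        return NULL;
    *(ToyContext **) raw = context;
    context->live_chunks++;
    return raw + sizeof(void *);
}

static void
toy_malloc_free(ToyContext *context, void *pointer)
{
    context->live_chunks--;
    free((char *) pointer - sizeof(void *));
}

static const ToyMethods toy_malloc_methods = {
    toy_malloc_alloc,
    toy_malloc_free
};

/* Recover the owning context from the word preceding the chunk. */
static ToyContext *
toy_get_chunk_context(void *pointer)
{
    return *(ToyContext **) ((char *) pointer - sizeof(void *));
}

/* Free a chunk without being told which context it came from. */
static void
toy_pfree(void *pointer)
{
    ToyContext *context = toy_get_chunk_context(pointer);

    context->methods->free_p(context, pointer);
}

/* Returns 1 if each chunk was handed back to its own context. */
static int
toy_chunk_demo(void)
{
    ToyContext  a = {&toy_malloc_methods, 0};
    ToyContext  b = {&toy_malloc_methods, 0};
    void       *pa = a.methods->alloc(&a, 32);
    void       *pb = b.methods->alloc(&b, 8);

    assert(pa != NULL && pb != NULL);
    assert(toy_get_chunk_context(pa) == &a);
    assert(toy_get_chunk_context(pb) == &b);
    toy_pfree(pb);
    toy_pfree(pa);
    return a.live_chunks == 0 && b.live_chunks == 0;
}
```

The caller of toy_pfree never names a context, which is the property that lets pfree and repalloc work independently of the current context.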
389395
390396
391397More Control Over aset.c Behavior
392398---------------------------------
393399
394- Previously, aset.c always allocated an 8K block upon the first allocation
395- in a context, and doubled that size for each successive block request.
396- That's good behavior for a context that might hold *lots* of data, and
397- the overhead wasn't bad when we had only a few contexts in existence.
398- With dozens if not hundreds of smaller contexts in the system, we need
399- to be able to fine-tune things a little better.
400+ By default aset.c always allocates an 8K block upon the first
401+ allocation in a context, and doubles that size for each successive
402+ block request. That's good behavior for a context that might hold
403+ *lots* of data. But if there are dozens if not hundreds of smaller
404+ contexts in the system, we need to be able to fine-tune things a
405+ little better.
400406
401- The creator of a context is now able to specify an initial block size
402- and a maximum block size. Selecting smaller values can prevent wastage
403- of space in contexts that aren't expected to hold very much (an example is
404- the relcache's per-relation contexts).
407+ The creator of a context is able to specify an initial block size and
408+ a maximum block size. Selecting smaller values can prevent wastage of
409+ space in contexts that aren't expected to hold very much (an example
410+ is the relcache's per-relation contexts).
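The effect of the initial and maximum block sizes can be illustrated with a small calculation. The function below is a hypothetical helper, not part of aset.c, and it models only the doubling rule described above; the 1K initial and 8K maximum sizes in the usage note are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size of the n-th block requested by a context (n = 0 is the first block),
 * assuming each successive block doubles in size until max_size is reached.
 */
static size_t
toy_nth_block_size(size_t init_size, size_t max_size, int n)
{
    size_t      size = init_size;

    assert(init_size > 0 && init_size <= max_size);
    while (n-- > 0 && size < max_size)
    {
        size *= 2;
        if (size > max_size)
            size = max_size;
    }
    return size;
}

/* Total space grabbed after nblocks block requests. */
static size_t
toy_total_block_space(size_t init_size, size_t max_size, int nblocks)
{
    size_t      total = 0;
    int         i;

    for (i = 0; i < nblocks; i++)
        total += toy_nth_block_size(init_size, max_size, i);
    return total;
}
```

With an 8K initial size, a context that only ever needs a few hundred bytes still occupies 8192 bytes; with a 1K initial size and an 8K maximum, four block requests total 1024 + 2048 + 4096 + 8192 = 15360 bytes.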
405411
406412Also, it is possible to specify a minimum context size. If this
407413value is greater than zero then a block of that size will be grabbed
@@ -414,37 +420,3 @@ will not allocate very much space per tuple cycle. To make this usage
414420pattern cheap, the first block allocated in a context is not given
415421back to malloc() during reset, but just cleared. This avoids malloc
416422thrashing.
417-
418-
419- Memory Context Reset/Delete Callbacks
420- -------------------------------------
421-
422- A feature introduced in Postgres 9.5 allows memory contexts to be used
423- for managing more resources than just plain palloc'd memory. This is
424- done by registering a "reset callback function" for a memory context.
425- Such a function will be called, once, just before the context is next
426- reset or deleted. It can be used to give up resources that are in some
427- sense associated with an object allocated within the context. Possible
428- use-cases include
429- * closing open files associated with a tuplesort object;
430- * releasing reference counts on long-lived cache objects that are held
431- by some object within the context being reset;
432- * freeing malloc-managed memory associated with some palloc'd object.
433- That last case would just represent bad programming practice for pure
434- Postgres code; better to have made all the allocations using palloc,
435- in the target context or some child context. However, it could well
436- come in handy for code that interfaces to non-Postgres libraries.
437-
438- Any number of reset callbacks can be established for a memory context;
439- they are called in reverse order of registration. Also, callbacks
440- attached to child contexts are called before callbacks attached to
441- parent contexts, if a tree of contexts is being reset or deleted.
442-
443- The API for this requires the caller to provide a MemoryContextCallback
444- memory chunk to hold the state for a callback. Typically this should be
445- allocated in the same context it is logically attached to, so that it
446- will be released automatically after use. The reason for asking the
447- caller to provide this memory is that in most usage scenarios, the caller
448- will be creating some larger struct within the target context, and the
449- MemoryContextCallback struct can be made "for free" without a separate
450- palloc() call by including it in this larger struct.