So I'm making a linked list for one of my school assignments in C. I found this website that has some helpful code snippets, and I noticed that they use malloc() to allocate more memory every time they add a new node:

void push(node_t * head, int val) {
    node_t * current = head;
    while (current->next != NULL) {
        current = current->next;
    }

    /* now we can add a new variable */
    current->next = (node_t *) malloc(sizeof(node_t));
    current->next->val = val;
    current->next->next = NULL;
// https://www.learn-c.org/en/Linked_lists
}

Obviously, this is useful because it allows the list to dynamically grow and shrink at runtime while using minimal memory, but if we free the deleted nodes using free(), then won't this lead to some pretty bad memory fragmentation if your program uses the lists extensively? A solution to this could be to allocate a hundred or a thousand nodes at a time, and manage the different chunks, but this seems complicated. What would be the preferred method of dealing with this? Would most programs just allocate one chunk of memory for the list and just have this as the maximum number of nodes?

7 Replies

won't this lead to some pretty bad memory fragmentation if your program uses the lists extensively?

System memory management is fairly sophisticated and handles many scenarios well, including this common one.

You might write better memory management code than the dozens to hundreds of other programmers who have tuned the system allocator over the years, since you do have some application-specific insight, but it's not too likely.

Instead, I'd recommend applying your valuable time to higher-level coding goals.

Is premature optimization really the root of all evil?


Your code, as is, looks OK, except that it doesn't handle head == NULL or allocation failure.

Agreed. If memory fragmentation is to be avoided, then allocating 5,000 nodes one at a time is a valid concern, imo. If we allocate space in advance for 5,000 nodes, then we might as well use a simple array allocated in advance, and then we can enjoy its ease of use, quick sorting capability, etc.

In practical usage for file directory list-building, I now 1) get the file count (findfirst()/findnext()), 2) allocate my array once for that many entries, and then 3) call the findfirst()/findnext() pair again to populate the array with filenames and data. On newer systems this is a performance non-issue, especially because by the second call the directory in question is generally cached by the system, so the second pass is really, really fast...
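
Since findfirst()/findnext() are platform-specific directory APIs (old DOS/Borland and MSVC), here is a rough sketch of the same two-pass idea using POSIX opendir()/readdir()/rewinddir() instead; load_names() is a made-up name and the error handling is minimal:

#include <dirent.h>
#include <stdlib.h>
#include <string.h>

/* Two-pass directory read: count the entries first, allocate the array
 * once, then rewind and fill it. */
char **load_names(const char *path, size_t *count_out) {
    DIR *dir = opendir(path);
    if (dir == NULL)
        return NULL;

    /* Pass 1: count the entries. */
    size_t count = 0;
    while (readdir(dir) != NULL)
        count++;

    /* Allocate the whole array in one shot. */
    char **names = malloc(count * sizeof *names);
    if (names == NULL) {
        closedir(dir);
        return NULL;
    }

    /* Pass 2: rewind and populate. The directory data is now cached,
     * so this pass is typically very fast. */
    rewinddir(dir);
    size_t i = 0;
    struct dirent *ent;
    while (i < count && (ent = readdir(dir)) != NULL)
        names[i++] = strdup(ent->d_name);

    closedir(dir);
    *count_out = i;
    return names;
}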

I used to use a linked list for this, but maintenance of the code was a headache. Linked lists are great academically and every coder should know them, but the benefits are just not worth it for most of my coding.

Iterating over the entire list with while (current->next != NULL) for every insertion becomes really slow as the number of nodes increases. If you keep a pointer to the first (head) node and a pointer to the last (tail) node, you can do insertions without iterating.

There are a number of very good examples of creating a linked list using this approach on StackOverflow. Many create a list struct containing the head and tail pointers and use that to declare/define the list. Make sure you understand you can use both pointers, as well as a pointer-to-pointer, to traverse the list to help with node removal/deletion. That is also covered in the examples. And see Linus on Understanding Pointers.

Remember, it is up to you to track the memory you allocate so it can be freed. That is generally handled when deleting a specific node and when destroying the list once it is no longer needed (i.e., deleting all nodes).
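
To make that concrete, here is a minimal sketch of such a list struct; the names (list_t, list_push, list_destroy) are placeholders, and list_destroy() shows the cleanup described above:

#include <stdlib.h>

typedef struct node {
    int val;
    struct node *next;
} node_t;

typedef struct list {
    node_t *head;
    node_t *tail;
} list_t;

/* O(1) append: no traversal needed because we keep a tail pointer.
 * Returns 0 on success, -1 on allocation failure. */
int list_push(list_t *list, int val) {
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->val = val;
    n->next = NULL;
    if (list->tail == NULL)      /* empty list: new node is head and tail */
        list->head = n;
    else
        list->tail->next = n;
    list->tail = n;
    return 0;
}

/* Free every node; this is where the memory you allocated is returned. */
void list_destroy(list_t *list) {
    node_t *cur = list->head;
    while (cur != NULL) {
        node_t *next = cur->next;
        free(cur);
        cur = next;
    }
    list->head = list->tail = NULL;
}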

There is another way to tackle the problem: avoid linked lists. While theoretically nice, in practice they are rarely the best choice of data structure, even disregarding the fragmentation. They are good when you need to do a lot of splicing and joining. Otherwise try a simple array first, and only consider a linked list if you find it being a bottleneck.

This is one of the things that algorithm classes focus way too much on out of tradition. Yes, you need to know how a linked list works, but it is not something you should actually implement like this.

Linked lists with heap allocation of individual nodes were always a completely broken data type, unsuitable for nearly any use. They have slow add/removal, they are slow to iterate through, they indeed create fragmentation, and they are horribly cache-unfriendly.

There may have been a place for linked lists like this in the 1980s somewhere, but I doubt it. By the 1990s, cache memory was a thing, and brute-force data copying/iteration could be much faster than a linked list, making linked lists like these quite useless.

There are a few corner cases like "chained" hash tables or graphs, where individual node allocation might make sense, because then it is a limited amount of data and access times of the linked list part are probably negligible... maybe.

What might make more sense is a linked list which uses an underlying static array implementation and "next array index" integers instead of "next pointers". These are fast and cache-friendly.
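
As a rough illustration of the idea (a fixed-size pool, with recycling of removed slots omitted for brevity):

#define POOL_SIZE 1000   /* fixed capacity; an assumption for this sketch */
#define NIL (-1)         /* the "null" index */

/* All nodes live in one contiguous array, so traversal is cache-friendly
 * and there is no per-node heap allocation at all. */
static struct {
    int val;
    int next;            /* index of the next node, or NIL */
} pool[POOL_SIZE];

static int head = NIL;       /* index of the first node in the list */
static int next_free = 0;    /* next never-used slot in the pool */

/* Push to the front of the list; returns the new node's index, or NIL
 * when the pool is exhausted. */
int push_front(int val) {
    if (next_free >= POOL_SIZE)
        return NIL;
    int i = next_free++;
    pool[i].val = val;
    pool[i].next = head;
    head = i;
    return i;
}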

You need to decide how to represent empty lists. If you represent them by setting head to a null pointer, then your push(head, val) function will not work when pushing values to an empty list. One way around that is to have head point to a "control" (sentinel) node that is not considered to be one of the "data" nodes. Then an empty list would be represented by head->next being a null pointer.
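
A minimal sketch of that first option might look like this, reusing the node_t from the question; note that the question's push() then works unchanged, because head itself is never NULL:

#include <stdlib.h>

typedef struct node {
    int val;
    struct node *next;
} node_t;

/* Create the "control" node; it carries no data. An empty list is
 * simply head->next == NULL. */
node_t *list_create(void) {
    node_t *head = malloc(sizeof *head);
    if (head != NULL)
        head->next = NULL;
    return head;
}

/* The push() from the question now works even on an empty list. */
void push(node_t *head, int val) {
    node_t *current = head;
    while (current->next != NULL)
        current = current->next;
    current->next = malloc(sizeof *current->next);
    if (current->next == NULL)
        return; /* allocation failed; the list is unchanged */
    current->next->val = val;
    current->next->next = NULL;
}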

Another (I would say more common) option is to allow the list manipulation functions to change the head pointer, either by using an extra level of indirection (passing a pointer to the head pointer so that the function can modify it), or by returning an updated head pointer value that the caller is supposed to assign to its head pointer variable. The first version, with the extra level of indirection, is something like this:

void push(node_t **head, int val) {
    node_t *tmpnode;
    node_t *newnode;
 
    /* Allocate new node. */
    newnode = malloc(sizeof(*newnode));
    if (newnode == NULL)
        return; /* allocation failed; leave the list unchanged */
    newnode->val = val;
    newnode->next = NULL;

    /* Find end of list. */
    while ((tmpnode = *head) != NULL)
        head = &tmpnode->next;
    /* head is now pointing to the null pointer at the end of the list */
    *head = newnode; /* Append the new node to the end of the list. */
}

The second version that returns an updated head pointer is something like this:

node_t *push(node_t *head, int val) {
    node_t **ptmp;
    node_t *tmpnode;
    node_t *newnode;
 
    /* Allocate new node. */
    newnode = malloc(sizeof(*newnode));
    if (newnode == NULL)
        return head; /* allocation failed; leave the list unchanged */
    newnode->val = val;
    newnode->next = NULL;

    /* Find end of list. */
    ptmp = &head;
    while ((tmpnode = *ptmp) != NULL)
        ptmp = &tmpnode->next;
    /* ptmp is now pointing to the null pointer at the end of the list. */
    *ptmp = newnode; /* Append the new node to the end of the list. */
    /* head will have been updated if list was empty. Return updated head. */
    return head;
}

if we free the deleted nodes using free(), then won't this lead to some pretty bad memory fragmentation if your program uses the lists extensively?

Not necessarily. It depends on the usage pattern, the total number of nodes involved, and the details of the allocator involved, among other things. Consider:

  • Whenever you free a node, you open up a free chunk at least large enough to accommodate a node. If it's not reused for another purpose first then that chunk can be used to perfectly satisfy a future node allocation.

  • Although some allocators sometimes hand out chunks that are in fact larger than was requested, so that there is wasted space associated with many allocations, not all allocators work that way. Glibc's, for example, does not overallocate in this sense.

  • If there's not a lot of other memory allocation going on contemporaneously with node allocations, node allocations are likely to be close together, possibly even contiguous.

  • Long linked lists tend not to be very practical, and modern (virtual) address spaces are enormous. Real-world linked-list usage just can't occupy enough of the address space to produce major fragmentation problems. The exception is systems with highly constrained memory, where any amount of dynamic allocation is usually a non-starter to begin with.

A solution to this could be to allocate a hundred or a thousand nodes at a time, and manage the different chunks, but this seems complicated.

Yes, that can be and occasionally is done. Glibc's allocator does something sort of like that internally. And something along these lines is usually necessary if you want to put your linked list in shared memory, where it can be accessed by processes having separate address spaces.
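
For illustration only, a bare-bones sketch of such a chunked scheme might look like this (the chunk size and names are arbitrary, and individual nodes are never freed, only the whole set at once):

#include <stdlib.h>

typedef struct node {
    int val;
    struct node *next;
} node_t;

#define CHUNK_NODES 1024        /* nodes per chunk; an arbitrary choice */

/* Chunks are chained together so they can all be freed in one sweep. */
typedef struct chunk {
    struct chunk *next_chunk;
    size_t used;                /* how many nodes have been handed out */
    node_t nodes[CHUNK_NODES];
} chunk_t;

static chunk_t *chunks = NULL;  /* most recently allocated chunk first */

/* Hand out nodes from the current chunk; malloc() runs only once per
 * CHUNK_NODES node allocations. */
node_t *node_alloc(void) {
    if (chunks == NULL || chunks->used == CHUNK_NODES) {
        chunk_t *c = malloc(sizeof *c);
        if (c == NULL)
            return NULL;
        c->next_chunk = chunks;
        c->used = 0;
        chunks = c;
    }
    return &chunks->nodes[chunks->used++];
}

/* Release every chunk, and therefore every node, at once. */
void node_free_all(void) {
    while (chunks != NULL) {
        chunk_t *next = chunks->next_chunk;
        free(chunks);
        chunks = next;
    }
}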

What would be the preferred method of dealing with this?

If your problem is one that is suited for a linked list in the first place then memory fragmentation arising from that is probably not an issue that needs any special attention. There may be other reasons to avoid per-node allocation and deallocation, but as far as fragmentation goes, the preferred approach is probably to not worry about it once you've settled on using a linked list at all. In other words: don't overthink this.

Would most programs just allocate one chunk of memory for the list and just have this as the maximum number of nodes?

Some programs do take that approach. And some retain removed nodes for reuse instead of deallocating them, so that they need to allocate new ones only when they don't have any available to recycle. Those fit naturally together, and in combination they even allow you to allocate additional blocks of nodes. But all of that is usually more trouble than it's worth.
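
A rough sketch of the recycling part, with made-up names node_get()/node_put(), might look like this:

#include <stdlib.h>

typedef struct node {
    int val;
    struct node *next;
} node_t;

static node_t *free_nodes = NULL;   /* removed nodes parked for reuse */

/* Take a node from the recycle list if one is available, otherwise
 * fall back to malloc(). */
node_t *node_get(void) {
    if (free_nodes != NULL) {
        node_t *n = free_nodes;
        free_nodes = n->next;
        return n;
    }
    return malloc(sizeof(node_t));
}

/* Instead of calling free(), park a removed node for later reuse. */
void node_put(node_t *n) {
    n->next = free_nodes;
    free_nodes = n;
}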
