
I have quite a lot of data to store, read, and modify in memory while the application runs. The data can be compared to a tree, where each node is described by a limited number of strings and integers and has quite a lot of subelements. Currently the data is stored using classes/objects, like:

TRootElement = class
  fName, fDescription: string;
  fPos: integer;
  /// etc
end;

fDocs: TObjectList; //list of TVariable = class(TRootElement)
fClasses: TObjectList; // list of TClass=class(TRootElement)

Currently the memory consumed by the program is unacceptable, so I'm looking for a way to reduce it.

My question is: will consumption be significantly reduced if I replace the current class-based (OOP) architecture with one based on records? For example, a general record could contain:

TRootElement = record
  fType: TElemType; // enum: root, variable, class, etc ... 
  fName, fDesc: string; 
  // all the fields used by the root element and its descendants go here
end;

Should I replace TList with pointers to the next/previous elements? Since I never access the list elements by index (I always loop through the whole list), it shouldn't be really hard to do; however, I'd like to avoid it if it isn't necessary.
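
To make that concrete, the node layout I have in mind would be something like this (just a sketch, names invented):

PElemNode = ^TElemNode;
TElemNode = record
  Data: TRootElement; // the record from above
  Next: PElemNode;    // one pointer per node; a Prev link would add four more bytes
end;

// Looping without index access:
procedure VisitAll(Head: PElemNode);
var
  Node: PElemNode;
begin
  Node := Head;
  while Node <> nil do
  begin
    // process Node^.Data here
    Node := Node^.Next;
  end;
end;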

Thanks! m.

  • Which version of Delphi? That could help. Commented Dec 31, 2009 at 1:30
  • If you show us a portion of the data in your 1.8 megabyte base text file, then maybe we can give you some better answers. Commented Dec 31, 2009 at 17:44
  • Reading down, it seems this application contains a "code completion" parse tree. I would consider storing such things in a memory-mapped file, which can be quickly accessed (loaded) and dropped, freeing memory without having to parse again. I think a B+tree file stored on disk would be fast enough to replace your RAM-based system completely. Commented Dec 31, 2009 at 18:21
  • You won't really save much memory. However, an object's constructor has some work to execute, so from a CPU point of view you might save a few cycles (not many) if you use records instead of objects. Commented Apr 14, 2022 at 15:34

4 Answers


Changing a class into a record will reduce memory usage, but the significance of the savings decreases as the number of fields in the class or record increases. The size difference between a class and the corresponding record is exactly four bytes, which accounts for the VMT pointer that a class holds but which is absent from a record. That difference is usually negligible when you consider the trade-off: To save four bytes, you give up inheritance, polymorphism, data-hiding, and other object-oriented features. (Some of that might be mitigated with Delphi's new "records with methods," but if you only have Delphi 2005, you don't have that feature yet.)
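
If you want to see the difference for yourself, a quick check could look like this (a sketch; the exact numbers assume a 32-bit Delphi without the hidden TMonitor field that later versions added):

program SizeCheck;
{$APPTYPE CONSOLE}

type
  TRootClass = class
    fName, fDescription: string;
    fPos: Integer;
  end;

  TRootRecord = record
    fName, fDescription: string;
    fPos: Integer;
  end;

begin
  // InstanceSize counts the hidden VMT pointer; the record has no such field.
  Writeln('class instance: ', TRootClass.InstanceSize); // 16 bytes
  Writeln('record:         ', SizeOf(TRootRecord));     // 12 bytes
end.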

In fact, if those four bytes really make the difference for your program, then you probably have a bigger problem to solve. That four-byte savings is wiped away simply by adding another node to your tree. With a large enough dataset, it won't matter how small you make any one node since you won't be able to keep them all in memory anyway. You'd need to investigate some kind of caching scheme, so only some nodes are kept in memory and the rest are kept elsewhere, such as in a file or a database.

If you replace your current lists with doubly linked lists of nodes, you'll probably see your memory use increase because now each of your nodes is keeping track of its next and previous neighbors whereas before the TObjectList was managing all that itself.


3 Comments

I don't get it: how is it possible that an instance of TObjectList is smaller than n_records * 2 * sizeof(pointer)? What about all the fields TObjectList inherits from its parent class?
TObject is 4 bytes. TList adds 12 bytes, and TObjectList adds 4 more, for a total empty TObjectList size of 20 bytes. Internally, TList uses an array of pointers to its elements (an additional 4 bytes per item), so the formula for total memory usage is y = 4x + 20. With a doubly linked list, you use two pointers (8 bytes) for every node, so the formula is y = 8x. Although it starts out lower, its slope is twice that of the TList version, so it grows twice as fast. The break-even point is just 5 elements.
Considering how FastMM works (the smallest allocation unit), I am not even sure you can really save those 4 bytes :)

currently the memory consumed by the program is unacceptable

What does unacceptable mean? Have you measured it? What are the facts (number of objects, size of objects, memory used)?

Have you checked with FastMM whether your program has a memory leak? If not, that is the first thing you should do.

If your lists grow often, perhaps you have a problem with memory fragmentation. Use the Capacity property of the list (if possible). In this situation a linked list can help, but a linked list needs more memory than TList (if Capacity is used sensibly). See How to monitor or visualize memory fragmentation of a delphi application for more information on how to check this.
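
For example, reserving the capacity once up front avoids the repeated reallocate-and-copy cycles that fragment the heap (a sketch, with an arbitrary element count):

program CapDemo;
{$APPTYPE CONSOLE}
uses Classes;

var
  List: TList;
begin
  List := TList.Create;
  try
    List.Capacity := 50000; // hypothetical expected count; reserve once
    // ... Add items as usual; the list will not reallocate
    // until Capacity is exceeded.
  finally
    List.Free;
  end;
end.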

For Delphi <= 2005 it can be helpful to replace the Borland memory manager with FastMM; it is easy to do.

Finally, like Rob, I don't think that a change to records will solve your problems.

4 Comments

The application is a web IDE, and the module is PHP code completion. Unacceptable means that enabling the module and loading the data for PHP5 (functions, classes, consts with descriptions (documentation)) increases memory usage by 10 MB. Now imagine what happens if the user decides to load some framework definitions. I'm using FastMM in FullDebugMode; there are definitely no leaks.
If the data is fairly static, a common CGI trick is to put it in a service that exposes it over shared memory to all CGI instances, and post mutations back to the service over some IPC mechanism.
10 megabytes is nothing in a machine with 2 gigabytes of RAM.
10 megabytes is not a problem, but that is only for PHP5. If the user wants code completion for a framework or their own code, the usage will increase drastically.

Compared to the vast majority of IDEs, a memory increase of only 10 megabytes for loading all of the PHP5 metadata is actually pretty good.

If it is really worth the effort for you, I'd start with string literal merging.

Make all the strings go into a global table (or dictionary), and point into that table from all your structures.
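
A sketch of such a pool, using a sorted TStringList since Delphi 2005 has no generic dictionary (InternPool and Intern are invented names):

program InternDemo;
{$APPTYPE CONSOLE}
uses Classes;

var
  InternPool: TStringList;

function Intern(const S: string): string;
var
  I: Integer;
begin
  if not InternPool.Find(S, I) then // binary search; requires Sorted = True
    I := InternPool.Add(S);
  Result := InternPool[I]; // ref-counted assignment: callers share one copy
end;

var
  A, B: string;
begin
  InternPool := TStringList.Create;
  InternPool.Sorted := True;
  try
    A := Intern('strlen');
    B := Intern('strlen');
    Writeln(Pointer(A) = Pointer(B)); // TRUE: one heap block for both
  finally
    InternPool.Free;
  end;
end.

Every fName/fDesc assignment then goes through Intern, so a name that appears thousands of times across the tree is stored once.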

You can take this a step further, since the PHP 5 language and libraries are pretty static: convert your whole data structure from dynamic into static constants (using records), and make all the indexes enumeration types.
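
In a sketch (with two invented sample entries), the static form could look like:

type
  TPhpFunc = record
    Name, Desc: string;
  end;

const
  PhpFuncs: array[0..1] of TPhpFunc = (
    (Name: 'strlen'; Desc: 'Get string length'),
    (Name: 'substr'; Desc: 'Return part of a string')
  );

The table then lives in the executable's data segment instead of on the heap.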

What you might be able to do is make all your strings resourcestrings or string constants, and see if the Delphi compiler can do the literal merging for you.

I just noticed that you also load the documentation for all of the PHP5 stuff. That accounts for quite a bit of memory. You might want to load those into compressed streams.
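
A sketch of that idea with the ZLib unit that ships with Delphi (CompressText and DecompressText are invented names; the caller has to store the uncompressed length somewhere):

uses Classes, ZLib;

// Compress a description once, at load time.
function CompressText(const S: string): TMemoryStream;
var
  Z: TCompressionStream;
begin
  Result := TMemoryStream.Create;
  Z := TCompressionStream.Create(clDefault, Result);
  try
    if Length(S) > 0 then
      Z.Write(S[1], Length(S));
  finally
    Z.Free; // destroying the stream flushes the compressed tail into Result
  end;
end;

// Decompress on demand, only when the hint is actually shown.
function DecompressText(Src: TMemoryStream; OrigLen: Integer): string;
var
  Z: TDecompressionStream;
begin
  Src.Position := 0;
  SetLength(Result, OrigLen);
  Z := TDecompressionStream.Create(Src);
  try
    if OrigLen > 0 then
      Z.Read(Result[1], OrigLen);
  finally
    Z.Free;
  end;
end;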

1 Comment

Not the full documentation, in fact: just a brief description plus descriptions of the parameters. This data is retrieved quite often, on each code completion execution, so access time is also very important. However, its in-memory representation is too big compared to the plain file size (a text file with class/function names and descriptions), which is around 1.8 MB.

If you can set limits on string length, as Kornel says, it really matters. AnsiString has some internal overhead, plus the allocation overhead on top; a ShortString, however, is always fully allocated, even when not used.

If you are really tight on memory, doing your own allocation for the strings is more sensible, especially if the data is relatively immutable. Then simply allocate a big block and put all strings in there, prefixed with a 16-bit length or so.
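
A sketch of such an arena (all names invented; assumes single-byte strings under 64 KB that are built once and never freed individually):

program ArenaDemo;
{$APPTYPE CONSOLE}
uses SysUtils;

const
  ArenaSize = 4 * 1024 * 1024; // a single 4 MB block; chain more blocks as needed

var
  Arena: PAnsiChar;
  Used: Integer;

// Copy S into the arena behind a 16-bit length prefix; no per-string
// heap header, no allocator slack.
function ArenaAdd(const S: AnsiString): PAnsiChar;
var
  Len: Word; // 16-bit prefix: descriptions are assumed to fit in 64 KB
begin
  Len := Length(S);
  if Used + 2 + Len > ArenaSize then
    raise Exception.Create('arena full'); // real code would chain a new block
  Result := Arena + Used;
  PWord(Result)^ := Len;
  if Len > 0 then
    Move(S[1], (Result + 2)^, Len);
  Inc(Used, 2 + Len);
end;

// Rebuild a normal string on demand.
function ArenaGet(P: PAnsiChar): AnsiString;
begin
  SetString(Result, P + 2, PWord(P)^);
end;

var
  P: PAnsiChar;
begin
  GetMem(Arena, ArenaSize);
  Used := 0;
  P := ArenaAdd('strlen - Get string length');
  Writeln(ArenaGet(P));
  FreeMem(Arena);
end.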

Less low-level tricks, like simply deduplicating (some of) the strings, save a lot of storage space too.

Note that Rob's record-vs-class comparison only holds if every record still gets its own heap allocation, which you can avoid: with records you can use an array of record and allocate many nodes in one cheap block. A class instance is always a reference type, so it carries heap overhead and slack (FastMM rounds allocations up to a 16-byte granularity).

I would recommend against using TStringList/TList/TObjectList, because insertion and deletion in very large lists (millions of elements) are O(n): inserting in the middle means shifting half the data. This gets painful somewhere between 20-100k and 1M elements, depending on your access pattern.

Using a TList of TLists, and not letting any single TList get too big, is already a good workaround.
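
A sketch of that workaround (the bucket size is an arbitrary tuning constant):

uses Classes;

const
  MaxPerBucket = 10000; // keep each sublist small so Insert/Delete stays cheap

procedure AddItem(Buckets: TList; Item: Pointer);
var
  Last: TList;
begin
  // Buckets holds TList instances; append to the last one,
  // starting a fresh sublist whenever it fills up.
  if Buckets.Count = 0 then
    Buckets.Add(TList.Create);
  Last := TList(Buckets[Buckets.Count - 1]);
  if Last.Count >= MaxPerBucket then
  begin
    Last := TList.Create;
    Buckets.Add(Last);
  end;
  Last.Add(Item); // a middle insert now shifts at most MaxPerBucket - 1 pointers
end;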

When I did this (for an OLAP cluster, back when 2 GB of server memory still cost $2000), I at one point even used the alignment bits in pointers to store the size class of allocations. I wouldn't recommend that, though :-)

Of course, going 64-bit with FPC is also an option. I got the core server part of the above 32-bit solution working in 64-bit in under an hour.

2 Comments

Well, the memory I'm managing is definitely not that big. It is just unacceptable compared to what I'm actually storing.
I saw that in the comments meanwhile. Just make them records, and allocate the strings manually as described in the post above.
