
I have quite a lot of data to store, read, and modify in memory while the application runs. The data can be compared to a tree, where each node is described by a limited number of strings and integers and has quite a lot of subelements. Currently the data is stored using classes/objects, like:

TRootElement = class
  fName, fDescription: string;
  fPos: integer;
  /// etc
end;

fDocs: TObjectList; //list of TVariable = class(TRootElement)
fClasses: TObjectList; // list of TClass=class(TRootElement)

Currently the memory consumed by the program is unacceptable, so I'm looking for a way to reduce it.

My question is: will consumption be significantly reduced if I replace the current class-based (OOP) architecture with one based on records? For example, a general record could contain:

TRootElement = record
  fType: TElemType; // enum: root, variable, class, etc ... 
  fName, fDesc: string; 
  // all the fields used by the root element and its descendants go here
end;

Should I replace TList with pointers to the next/previous elements? Since I never access the list elements by index (I always loop through the whole list), it shouldn't be really hard to do; however, I'd like to avoid it if it isn't necessary.
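
To make that concrete, the node layout I have in mind would be something like this (just a sketch, names invented):

PElemNode = ^TElemNode;
TElemNode = record
  Data: TRootElement; // the record from above
  Next: PElemNode;    // one pointer per node; a Prev link would add four more bytes
end;

// Looping without index access:
procedure VisitAll(Head: PElemNode);
var
  Node: PElemNode;
begin
  Node := Head;
  while Node <> nil do
  begin
    // process Node^.Data here
    Node := Node^.Next;
  end;
end;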

Thanks! m.

  • Which version of Delphi? That could help. Commented Dec 31, 2009 at 1:30
  • If you show us a portion of the data in your 1.8 megabyte base text file, then maybe we can give you some better answers. Commented Dec 31, 2009 at 17:44
  • Reading down, it seems this application contains a "code completion" parse tree. I would consider storing such things in a memory-mapped file, which can be quickly accessed (loaded) and dropped, freeing memory without having to parse again. I think a B+tree file stored on disk would be fast enough to replace your RAM-based system completely. Commented Dec 31, 2009 at 18:21
  • You won't really save much memory. However, an object's constructor has some work to execute, so from a CPU point of view you might save a few cycles (not many) if you use records instead of objects. Commented Apr 14, 2022 at 15:34

4 Answers


Changing a class into a record will reduce memory usage, but the significance of the savings decreases as the number of fields in the class or record increases. The size difference between a class and the corresponding record is exactly four bytes, which accounts for the VMT pointer that a class holds but which is absent from a record. That difference is usually negligible when you consider the trade-off: To save four bytes, you give up inheritance, polymorphism, data-hiding, and other object-oriented features. (Some of that might be mitigated with Delphi's new "records with methods," but if you only have Delphi 2005, you don't have that feature yet.)
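
If you want to see the difference for yourself, a quick check could look like this (a sketch; the exact numbers assume a 32-bit Delphi without the hidden TMonitor field that later versions added):

program SizeCheck;
{$APPTYPE CONSOLE}

type
  TRootClass = class
    fName, fDescription: string;
    fPos: Integer;
  end;

  TRootRecord = record
    fName, fDescription: string;
    fPos: Integer;
  end;

begin
  // InstanceSize counts the hidden VMT pointer; the record has no such field.
  Writeln('class instance: ', TRootClass.InstanceSize); // 16 bytes
  Writeln('record:         ', SizeOf(TRootRecord));     // 12 bytes
end.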

In fact, if those four bytes really make the difference for your program, then you probably have a bigger problem to solve. That four-byte savings is wiped away simply by adding another node to your tree. With a large enough dataset, it won't matter how small you make any one node since you won't be able to keep them all in memory anyway. You'd need to investigate some kind of caching scheme, so only some nodes are kept in memory and the rest are kept elsewhere, such as in a file or a database.

If you replace your current lists with doubly linked lists of nodes, you'll probably see your memory use increase because now each of your nodes is keeping track of its next and previous neighbors whereas before the TObjectList was managing all that itself.


3 Comments

I don't get it: how is it possible that an instance of TObjectList is smaller than n_records * 2 * sizeof(pointer)? What about all the fields TObjectList inherits from its parent class?
TObject is 4 bytes. TList adds 12 bytes, and TObjectList adds 4 more, for a total empty TObjectList size of 20 bytes. Internally, TList uses an array of pointers to its elements (an additional 4 bytes per item), so the formula for total memory usage is y = 4x + 20. With a doubly linked list, you use two pointers (8 bytes) for every node, so the formula is y = 8x. Although it starts out lower, its slope is twice that of the TList version, so it grows twice as fast. The break-even point is just 5 elements.
Considering how FastMM works (the smallest allocation unit), I am not even sure you can really save those 4 bytes :)

currently the memory consumed by the program is unacceptable

What does unacceptable mean? Have you measured it? What are the facts (number of objects, size of objects, memory used)?

Have you checked with FastMM whether your program has a memory leak? If not, that is the first thing you should do.

If your lists grow often, perhaps you have a problem with memory fragmentation. Use the Capacity property of the list (if possible). In this situation a linked list can help, but a linked list needs more memory than TList (if Capacity is used sensibly). See How to monitor or visualize memory fragmentation of a delphi application for more information on how to check this.
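
For example, reserving the capacity once up front avoids the repeated reallocate-and-copy cycles that fragment the heap (a sketch, with an arbitrary element count):

program CapDemo;
{$APPTYPE CONSOLE}
uses Classes;

var
  List: TList;
begin
  List := TList.Create;
  try
    List.Capacity := 50000; // hypothetical expected count; reserve once
    // ... Add items as usual; the list will not reallocate
    // until Capacity is exceeded.
  finally
    List.Free;
  end;
end.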

For Delphi <= 2005 it can be helpful to replace the Borland memory manager with FastMM; it is easy to do.

Finally, like Rob, I don't think that a change to records will solve your problems.

4 Comments

The application is a web IDE, and the module is PHP code completion. Unacceptable means that enabling the module and loading the data for PHP5 (functions, classes, consts with descriptions (documentation)) increases memory usage by 10 MB. Now imagine what happens if the user decides to load some framework definitions. I'm using FastMM in FullDebugMode; there are definitely no leaks.
If the data is fairly static, a common CGI trick is to put it in a service that exposes it over shared memory to all CGI instances, and post mutations back to the service over some IPC mechanism.
10 megabytes is nothing in a machine with 2 gigabytes of RAM.
10 megabytes is not a problem, but that is only for PHP5. If the user wants code completion for a framework or their own code, the usage will increase drastically.

Compared to the vast majority of IDEs, a memory increase of only 10 megabytes for loading all of the PHP5 metadata is actually pretty good.

If it is really worth the effort for you, I'd start with string literal merging.

Make all the strings go into a global table (or dictionary), and point into that table from all your structures.
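
A sketch of such a pool, using a sorted TStringList since Delphi 2005 has no generic dictionary (InternPool and Intern are invented names):

program InternDemo;
{$APPTYPE CONSOLE}
uses Classes;

var
  InternPool: TStringList;

function Intern(const S: string): string;
var
  I: Integer;
begin
  if not InternPool.Find(S, I) then // binary search; requires Sorted = True
    I := InternPool.Add(S);
  Result := InternPool[I]; // ref-counted assignment: callers share one copy
end;

var
  A, B: string;
begin
  InternPool := TStringList.Create;
  InternPool.Sorted := True;
  try
    A := Intern('strlen');
    B := Intern('strlen');
    Writeln(Pointer(A) = Pointer(B)); // TRUE: one heap block for both
  finally
    InternPool.Free;
  end;
end.

Every fName/fDesc assignment then goes through Intern, so a name that appears thousands of times across the tree is stored once.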

You can take this a step further, since the PHP 5 language and libraries are pretty static: convert your whole data structure from dynamic into static constants (using records), and make all the indexes enumeration types.
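
In a sketch (with two invented sample entries), the static form could look like:

type
  TPhpFunc = record
    Name, Desc: string;
  end;

const
  PhpFuncs: array[0..1] of TPhpFunc = (
    (Name: 'strlen'; Desc: 'Get string length'),
    (Name: 'substr'; Desc: 'Return part of a string')
  );

The table then lives in the executable's data segment instead of on the heap.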

What you might be able to do is make all your strings resourcestrings or string constants, and see if the Delphi compiler can do the literal merging for you.

I just noticed that you also load the documentation for all of the PHP5 stuff. That accounts for quite a bit of memory. You might want to load those into compressed streams.
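
A sketch of that idea with the ZLib unit that ships with Delphi (CompressText and DecompressText are invented names; the caller has to store the uncompressed length somewhere):

uses Classes, ZLib;

// Compress a description once, at load time.
function CompressText(const S: string): TMemoryStream;
var
  Z: TCompressionStream;
begin
  Result := TMemoryStream.Create;
  Z := TCompressionStream.Create(clDefault, Result);
  try
    if Length(S) > 0 then
      Z.Write(S[1], Length(S));
  finally
    Z.Free; // destroying the stream flushes the compressed tail into Result
  end;
end;

// Decompress on demand, only when the hint is actually shown.
function DecompressText(Src: TMemoryStream; OrigLen: Integer): string;
var
  Z: TDecompressionStream;
begin
  Src.Position := 0;
  SetLength(Result, OrigLen);
  Z := TDecompressionStream.Create(Src);
  try
    if OrigLen > 0 then
      Z.Read(Result[1], OrigLen);
  finally
    Z.Free;
  end;
end;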

1 Comment

Not the full documentation, in fact: just a brief description plus descriptions of the parameters. This data is retrieved quite often, on each code completion execution, so access time is also very important. However, its in-memory representation is too big compared to the plain file size (a text file with class/function names and descriptions), which is around 1.8 MB.

If you can set limits on string length, as Kornel says, it really matters. AnsiString has some internal overhead, plus the allocation overhead on top; a ShortString, however, is always fully allocated, even when not used.

If you are really tight on memory, doing your own allocation for the strings is more sensible, especially if the data is relatively immutable. Then simply allocate a big block and put all strings in there, prefixed with a 16-bit length or so.
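
A sketch of such an arena (all names invented; assumes single-byte strings under 64 KB that are built once and never freed individually):

program ArenaDemo;
{$APPTYPE CONSOLE}
uses SysUtils;

const
  ArenaSize = 4 * 1024 * 1024; // a single 4 MB block; chain more blocks as needed

var
  Arena: PAnsiChar;
  Used: Integer;

// Copy S into the arena behind a 16-bit length prefix; no per-string
// heap header, no allocator slack.
function ArenaAdd(const S: AnsiString): PAnsiChar;
var
  Len: Word; // 16-bit prefix: descriptions are assumed to fit in 64 KB
begin
  Len := Length(S);
  if Used + 2 + Len > ArenaSize then
    raise Exception.Create('arena full'); // real code would chain a new block
  Result := Arena + Used;
  PWord(Result)^ := Len;
  if Len > 0 then
    Move(S[1], (Result + 2)^, Len);
  Inc(Used, 2 + Len);
end;

// Rebuild a normal string on demand.
function ArenaGet(P: PAnsiChar): AnsiString;
begin
  SetString(Result, P + 2, PWord(P)^);
end;

var
  P: PAnsiChar;
begin
  GetMem(Arena, ArenaSize);
  Used := 0;
  P := ArenaAdd('strlen - Get string length');
  Writeln(ArenaGet(P));
  FreeMem(Arena);
end.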

Less low-level tricks, like simply deduplicating (some of) the strings, save a lot of storage space too.

Note that Rob's record-vs-class comparison only holds if every record still gets its own heap allocation, which you can avoid: with records you can use an array of record and allocate many nodes in one cheap block. A class instance is always a reference type, so it carries heap overhead and slack (FastMM rounds allocations up to a 16-byte granularity).

I would recommend against using TStringList/TList/TObjectList, because insertion and deletion in very large lists (millions of elements) are O(n): inserting in the middle means shifting half the data. This gets painful somewhere between 20-100k and 1M elements, depending on your access pattern.

Using a TList of TLists, and not letting any single TList get too big, is already a good workaround.
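
A sketch of that workaround (the bucket size is an arbitrary tuning constant):

uses Classes;

const
  MaxPerBucket = 10000; // keep each sublist small so Insert/Delete stays cheap

procedure AddItem(Buckets: TList; Item: Pointer);
var
  Last: TList;
begin
  // Buckets holds TList instances; append to the last one,
  // starting a fresh sublist whenever it fills up.
  if Buckets.Count = 0 then
    Buckets.Add(TList.Create);
  Last := TList(Buckets[Buckets.Count - 1]);
  if Last.Count >= MaxPerBucket then
  begin
    Last := TList.Create;
    Buckets.Add(Last);
  end;
  Last.Add(Item); // a middle insert now shifts at most MaxPerBucket - 1 pointers
end;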

When I did this (for an OLAP cluster, back when 2 GB of server memory still cost $2000), I at one point even used the alignment bits in pointers to store the size class of allocations. I wouldn't recommend that, though :-)

Of course, going 64-bit with FPC is also an option. I got the core server part of the above 32-bit solution working in 64-bit in under an hour.

2 Comments

Well, the memory I'm managing is definitely not that big. It is just unacceptable compared to what I'm actually storing.
I saw that in the comments meanwhile. Just make them records, and allocate the strings manually as described in the post above.
