This is a little bit tricky to explain, but I'll give it a try:

In a node.js server application I would like to deal with data objects that can be used in more than one place at once. The main problem is that these objects are only referred to by an object id and are loaded from the database.

However, as soon as an object is already loaded into one scope, it should not be loaded a second time when requested, but instead the same object should be returned.

This leads me to the question of garbage collection: As soon as an object is no longer needed in any scope, it should be released completely to prevent having the whole database in the server's memory all the time. But here starts the problem:

There are two ways I can think of to create such a scenario: either use a global object reference (which prevents any object from being collected), or really duplicate these objects and synchronize them, so that each time a property changes in one scope, the other instances are informed about that change.

The second approach has its own catch: each instance would have to register an event handler, which in turn points back to that instance and so prevents it from being collected.

Did anyone come up with a solution for such a scenario I just didn't realize? Or is there any misconception in my understanding of the garbage collector?

What I want to avoid is manual reference counting for every object in memory. Every time an object is removed from any collection, I would have to adjust the reference count manually (there is not even a destructor or "reference decreased" event in js).

  • So… you want a cache. You can limit the number of items in the cache, discarding the least recently used ones. You can also remove items ahead of time if you know you won’t need them anymore, but that’s not something that can be done automatically. Commented Oct 30, 2017 at 23:53
  • A "cache" is not exactly the right concept, it should be that no object gets loaded twice, once it is already in the memory. In a cache with a limited size, this would not be guaranteed. Items that are removed from the cache (but possibly still referenced somewhere else), would be loaded twice. And of course I could keep track of any deletion myself, but I believe there must be a more elegant way, because as soon as you forget to decrease your reference counter when deleting an object somewhere, your memory leak is there. Exactly what a GC should avoid. Commented Oct 31, 2017 at 0:14
  • What determines when you won’t need to reference an object anymore, then? When it’s not present in any “collection”? What’s a collection? Commented Oct 31, 2017 at 0:25
  • As soon as an object is released from every scope it was used in, it could be collected (along with its data) and reloaded as soon as it is needed again somewhere. The problem is that I want to avoid different versions of the same object in memory, thus, only one instance. The second advantage is that every object using this data object would be aware of the latest updated version (because they all use the same object rather than their own copy of it). Since there is no true parallel execution (or better: scheduling) in node.js, race conditions are not a concern. Commented Oct 31, 2017 at 0:27
  • @Psi One of the most common use cases of Redis is as a cache between an application and a database. You would keep using your database exactly how you currently do. When you need an object, you get it from Redis. If it isn't there, you put the object in Redis and then use it. Not sure what you are trying to do, but Redis is frequently used as a way to share states between multiple application servers. Your use case sounds quite similar. Commented Oct 31, 2017 at 0:48

3 Answers

2

Using the weak module, I implemented a WeakMapObj that works like we originally wanted WeakMap to work. It allows you to use a primitive for the key and an object for the data and the data is retained with a weak reference. And, it automatically removes items from the map when their data is GCed. It turned out to be fairly simple.

const weak = require('weak');

class WeakMapObj {
    constructor(iterable) {
        this._map = new Map();
        if (iterable) {
            for (let array of iterable) {
                this.set(array[0], array[1]);
            }
        }
    }

    set(key, obj) {
        if (obj !== null && typeof obj === "object") {
            let ref = weak(obj, this.delete.bind(this, key));
            this._map.set(key, ref);
        } else {
            // not an object, can just use regular method
            this._map.set(key, obj);
        }
    }

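    // get the actual object reference, not just the proxy
    get(key) {
        let obj = this._map.get(key);
        // only unwrap values that were stored as weak references;
        // primitives were stored directly and are returned as-is
        if (weak.isWeakRef(obj)) {
            return weak.get(obj);
        }
        return obj;
    }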

    has(key) {
        return this._map.has(key);
    }

    clear() {
        return this._map.clear();
    }

    delete(key) {
        return this._map.delete(key);
    }
}

I was able to test it in a test app and confirm that it works as expected when the garbage collector runs. FYI, just making one or two objects eligible for garbage collection did not cause the garbage collector to run in my test app. I had to forcefully call the garbage collector to see the effect. I assume that would not be an issue in a real app. The GC will run when it needs to (which may only run when there's a reasonable amount of work to do).


You can use this more generic implementation as the core of your object cache where an item will stay in the WeakMapObj only until it is no longer referenced elsewhere.


Here's an implementation that keeps the map entirely private so it cannot be accessed from outside of the WeakMapObj methods.

const weak = require('weak');

function WeakMapObj(iterable) {
    // private instance data
    const map = new Map();

    this.set = function(key, obj) {
        if (obj !== null && typeof obj === "object") {
            // replace obj with a weak reference
            obj = weak(obj, this.delete.bind(this, key));
        }
        map.set(key, obj);

    }

    // add methods that have access to "private" map
    this.get = function(key) {
        let obj = map.get(key);
        // only unwrap values that were stored as weak references
        if (weak.isWeakRef(obj)) {
            obj = weak.get(obj);
        }
        return obj;
    }

    this.has = function(key) {
        return map.has(key);
    }

    this.clear = function() {
        return map.clear();
    }

    this.delete = function(key) {
        return map.delete(key);
    }

    // constructor implementation    
    if (iterable) {
        for (let array of iterable) {
            this.set(array[0], array[1]);
        }
    }
}
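
For readers on a newer runtime, here is a minimal sketch of the same idea using the standard `WeakRef` and `FinalizationRegistry` built-ins (ES2021) instead of the `weak` module. This is a swapped-in technique, not what the answer above uses. The class name `WeakValueMap` is made up for illustration, and the sketch assumes every value is an object (`new WeakRef()` throws for primitives).

```javascript
// Sketch only: a map with primitive keys and weakly held object values,
// built on the standard WeakRef / FinalizationRegistry instead of "weak".
class WeakValueMap {
    constructor() {
        this._map = new Map();
        // When a value is collected, drop its entry, but only if the entry
        // still holds the same (now dead) WeakRef and was not replaced since.
        this._registry = new FinalizationRegistry(({ key, ref }) => {
            if (this._map.get(key) === ref) {
                this._map.delete(key);
            }
        });
    }

    set(key, obj) {
        const ref = new WeakRef(obj);
        this._registry.register(obj, { key, ref });
        this._map.set(key, ref);
        return this;
    }

    get(key) {
        const ref = this._map.get(key);
        // deref() returns undefined once the target has been collected
        return ref ? ref.deref() : undefined;
    }

    has(key) {
        return this.get(key) !== undefined;
    }

    delete(key) {
        return this._map.delete(key);
    }
}

const cache = new WeakValueMap();
let user = { id: 42, name: "example" };   // strong reference held by the caller
cache.set(user.id, user);
console.log(cache.get(42) === user);      // same instance while `user` is alive
```

As with the `weak` version, the cleanup callback only runs some time after a collection, so `get()` has to tolerate dead references in the meantime; that is why `has()` is defined in terms of `get()` here.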

21 Comments

Very nice! Much more elegant than dealing with manual refcounting.
@Psi - Yeah, that weak module is pretty cool and it seems to work just great in node.js. One wonders why the language itself doesn't have this type of weakMap. I see lots of reason to use something like this.
That was why I asked this question, I really could not believe that there is no built-in functionality to do this. For the GC: it only runs when some conditions are met (such as memory peaks, time), and therefore you would have to wait for about 10-20 minutes for the GC to kick in. Forcing the collection is a better way of testing. However, I don't recommend forcing the GC in production code, but for testing purposes it is quite useful.
@Psi - I made an update to this implementation to override .get() so that it fetches the actual object, not just the weak proxy. This, I think, is required for use because now when someone calls cache.get(id), it returns a live reference to the actual object (which keeps it from being GCed while in use). Without this override, it just returned the weak proxy, which did not keep the original from being GCed while you were trying to use its proxy, and that could easily cause problems. A full implementation would also need to do this same override on .entries(), .forEach(), and .values().
Right, didn't think about that but it's obvious
2

Sounds like a job for a Map object used as a cache storing the object as the value (along with a count) and the ID as the key. When you want an object, you first look up its ID in the Map. If it's found there, you use the returned object (which will be shared by all). If it's not found there, you fetch it from the database and insert it into the Map (for others to find).

Then, to make it so that the Map doesn't grow forever, the code that fetches something from the Map would also need to release an object from the Map. When the refCnt goes to zero upon a release, you would remove an object from the Map.

This can be made entirely transparent to the caller by creating some sort of cache object that contains the Map and has methods for getting an object or releasing an object and it would be entirely responsible for maintaining the refCnt on each object in the Map.

Note: you will likely have to write the code that fetches it from the DB and inserts it into the Map carefully in order to not create a race condition, because fetching from the database is likely asynchronous and you could get multiple callers all not finding it in the Map and all in the process of getting it from the database. How to avoid that race condition depends upon the exact database you have and how you're using it. One possibility is for the first caller to insert a placeholder in the Map so subsequent callers will know to wait for some promise to resolve before the object is inserted in the Map and available to them to use.

Here's a general idea for how such an ObjCache could work. You call cache.get(id) when you want to retrieve an item. This always returns a promise that resolves to the object (or rejects if there's an error getting it from the DB). If the object is in the cache already, the promise it returns will be already resolved. If the object is not in the cache yet, the promise will resolve when it has been fetched from the DB. This works even when multiple parts of your code request an object that is "in the process" of being fetched from the DB. They all get the same promise that is resolved with the same object when the object has been retrieved from the DB. Every call to cache.get(id) increases the refCnt for that object in the cache.

You then call cache.release(id) when a given piece of code is done with an object. That will decrement the internal refCnt and remove the object from the cache if the refCnt hits zero.

// myDB.get(id) below is a placeholder for your own database call returning a promise
class ObjCache {
    constructor() {
        this.cache = new Map();
    }
    get(id) {
        let cacheItem = this.cache.get(id);
        if (cacheItem) {
            ++cacheItem.refCnt;
            if (cacheItem.obj) {
                // already have the object
                return Promise.resolve(cacheItem.obj);
            }
            else {
                // object is pending, return the promise
                return cacheItem.promise;
            }
        } else {
            // not in the cache yet
            let cacheItem = {refCnt: 1, promise: null, obj: null};
            let p = myDB.get(id).then(function(obj) {
                // replace placeholder promise with actual object
                cacheItem.obj = obj;
                cacheItem.promise = null;
                return obj;
            });
            // set placeholder as promise for others to find
            cacheItem.promise = p;
            this.cache.set(id, cacheItem);
            return p;

        }
    }
    release(id) {
        let cacheItem = this.cache.get(id);
        if (cacheItem) {
            if (--cacheItem.refCnt === 0) {
                this.cache.delete(id);
            }
        }
    }
}
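
Since the main risk with this approach is forgetting to call release(), one option is to pair the two calls in a helper. The sketch below is only an illustration under stated assumptions: `fakeDB` stands in for the real database, `RefCountedCache` is a condensed variant of the cache above that stores just the promise, and `withObject` is a hypothetical helper name. It only covers code that uses an object for the duration of one call; references stored long-term in other objects still need an explicit release.

```javascript
// Sketch only: `fakeDB` stands in for the real database and `withObject`
// is a hypothetical helper; neither is part of the ObjCache above.
const fakeDB = {
    loads: 0,
    get(id) {
        this.loads++;
        return Promise.resolve({ id, name: "object " + id });
    }
};

// Condensed variant of the ref-counted cache: each entry keeps the promise,
// so pending and finished loads are handled the same way.
class RefCountedCache {
    constructor(db) {
        this.db = db;
        this.items = new Map();   // id -> { refCnt, promise }
    }
    get(id) {
        let item = this.items.get(id);
        if (!item) {
            item = { refCnt: 0, promise: this.db.get(id) };
            this.items.set(id, item);
        }
        ++item.refCnt;
        return item.promise;
    }
    release(id) {
        const item = this.items.get(id);
        if (item && --item.refCnt === 0) {
            this.items.delete(id);
        }
    }
}

// Acquire the object, run `fn` with it, and always release afterwards,
// even if the load or `fn` fails.
async function withObject(cache, id, fn) {
    try {
        const obj = await cache.get(id);
        return await fn(obj);
    } finally {
        cache.release(id);
    }
}
```

Two overlapping `withObject(cache, 7, ...)` calls would share a single database load and the same object, and the entry is dropped once the second call finishes.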

15 Comments

I just read about WeakMaps and I believe, they are not exactly a solution for what I am trying to do. A WeakMap stores an object (weakly referenced) as the key and is able to remove the value when the object gets collected. However, this would be the same if I just stored the data in the object itself. Unfortunately, WeakMap does not work the other way 'round. Numbers or Strings are not allowed for a key in the WeakMap
@Psi - You're right. A weakMap isn't quite right for how you've described your task. But, you could make a regular Map work for it (I've revised my answer as such), but you'd have to do manual refCnt-ing as you fetch, use and release an object.
True, but now it is exactly what I wrote in my edit of the question: I need to do manual reference counting (i.e. every time an object gets released from any scope, I need to inform it about that happening). I hoped there was another way around this, because this is not why I am using javascript, to deal with manual reference counting again.
@Psi - I don't know any other way. I've tried to figure out a way to make a weakMap work for you, but I don't see how. I added to my answer an outline of an ObjCache() that handles race conditions. I assumed a place holder for your database operation to retrieve the object when it's not in the cache and used promises to handle the race condition.
Thank you very much, I really appreciate your effort. However, to know when to release an object I must be sure that no other reference is still alive. So every object referencing that object needs to be aware of when it gets removed.
1

Ok, for anyone who faces similar problems, I found a solution. jfriend00 pushed me towards this solution by mentioning WeakMaps which were not exactly the solution themselves, but pointed my focus on weak references.

There is an npm module simply called weak that will do the trick. It holds a weak reference to an object and safely returns an empty object once the object was garbage collected (thus, there is a way to identify a collected object).

So I created a class called WeakCache using a DataObject:

const weak = require('weak');

class DataObject{

    constructor( objectID ){
        this.objectID = objectID;
        this.dataLoaded = new Promise(function(resolve, reject){
            loadTheDataFromTheDatabase(objectID, function(data, error){ // some pseudo db call
                if (error)
                {
                    reject(error);
                    return;
                }

                resolve(data);
            }); 
        });
    }

    loadData(){
        return this.dataLoaded;
    }   
}

class WeakCache{

    constructor(){
        this.cache = {};
    }

    getDataObjectAsync( objectID, onObjectReceived ){
        if (this.cache[objectID] === undefined || this.cache[objectID].loadData === undefined){ // object was not cached yet or dereferenced, recreate it

            this.cache[objectID] = weak(new DataObject( objectID ), function(){
                // Remove the entry from the cache once the object has been collected
                delete this.cache[this.objectID];
            }.bind({cache:this.cache, objectID:objectID})); // bind only the lookup table and the id, never the object itself
        }

        this.cache[objectID].loadData().then(onObjectReceived);
    }

}

This class is still a work in progress, but at least it shows one way this could work. The only downside (which is true for all database-backed data, pun alert!, and therefore not such a big deal) is that all data access has to be asynchronous.

What will happen here, is that the cache at some point may hold an empty reference to every possible object id.
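
If those leftover empty references ever become a concern, a periodic sweep is one way to clear them. The following is only a sketch: `sweepDeadEntries` is a hypothetical helper, and the dead-reference check is passed in as a function so the code does not depend on one weak-reference implementation. With the weak module, passing `weak.isDead` should fit, but treat that wiring as an assumption.

```javascript
// Sketch only: removes entries whose weak reference is dead.
// `cache` is a plain object used as a lookup table (id -> weak reference).
// `isDead` is injected by the caller; with the "weak" module one would
// presumably pass weak.isDead.
function sweepDeadEntries(cache, isDead) {
    let removed = 0;
    for (const id of Object.keys(cache)) {
        if (isDead(cache[id])) {
            delete cache[id];
            removed++;
        }
    }
    return removed;
}

// Possible wiring (not executed here): sweep once an hour, and unref() the
// timer so it does not keep the process alive on its own.
// setInterval(() => sweepDeadEntries(weakCache.cache, weak.isDead), 60 * 60 * 1000).unref();
```

Because the sweep is fully synchronous, it cannot interleave with a lookup halfway through.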

5 Comments

FYI, one simple way to deal with the empty references would be to just use a setInterval() timer that runs every hour or so and removes any elements that contain empty references. As long as that code to remove them is all synchronous, it won't cause any possible race condition. But, it is likely possible to use the weak callback to clean it up real-time too.
I was thinking in a similar way, but using the callback when an object gets freed, does help as well. You can bind the objectID to the callback function which then just removes the objectID which was the key in the lookup table.
I was just reading in the weak reference that you have to be very careful with the callback to not accidentally catch the object itself in scope of the callback because if you do, it will never be garbage collected since it's still technically "reachable" by live code. They recommend using high level scoped callback functions only. Your mention of using .bind() on a top level callback would be a nice way to capture the ID without any danger of capturing the object itself in scope.
Exactly. I am running my tests with an advanced implementation of what you see here, and what I currently can verify is that it is working perfectly. Luckily, using --expose_gc you can force the gc to cleanup, very good way to test the behavior
See the WeakMapObj class I put in a new answer. I also had to use --expose-gc to be able to run global.gc() to test it, and it does work. Pretty cool.
