
Suppose we have a social network app (built with Node.js and Express) that uses MongoDB as its primary database engine.

For most API calls from clients (mobile app, web app, etc.), I don't want to run a complex query for each request. These sorts of requests could be answered from a cache layer, Redis for instance.

But my question is how/when I should update the cache layer, because all write operations are performed against the MongoDB database, not the cache layer (Redis). What is the correct approach/architecture to address this problem?

6 Answers


It really depends on your needs, but here's a fairly common one:

on_get_request
  if data_in_redis
    serve_data_from_redis
  else
    get_data_from_mongo
    set_data_in_redis
    set_expire_in_redis
    serve_data_from_memory

The data will be a bit stale at times, but that's ok for most use cases. It works well in combination with some cache invalidation when important data is written:

on_important_data
  delete_invalid_redis_keys

But that all assumes low write, high read, and a stable set of queries.
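The read path and invalidation above can be sketched together in Node.js. This is just an illustration of the pattern, not a library API: the `Map` and the `mongo` object are hypothetical stand-ins for a real Redis client and a real MongoDB query.

```javascript
const cache = new Map();                 // stands in for Redis
const TTL_MS = 60 * 1000;                // set_expire_in_redis: one minute

const mongo = {                          // stands in for MongoDB
  feed: { alice: ['post-1', 'post-2'] },
  async queryFeed(userId) { return this.feed[userId] || []; },
};

async function getFeed(userId) {
  const hit = cache.get(userId);
  if (hit && hit.expires > Date.now()) {
    return hit.data;                     // serve_data_from_redis
  }
  const data = await mongo.queryFeed(userId);           // get_data_from_mongo
  cache.set(userId, { data, expires: Date.now() + TTL_MS }); // set + expire
  return data;                           // serve freshly fetched data
}

// on_important_data: delete the now-stale key so the next read rebuilds it
function invalidateFeed(userId) {
  cache.delete(userId);
}
```

With a real Redis client you would use `SET key value EX 60` (or `SETEX`) so the expiry is handled server-side instead of by the timestamp check above.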

What does your high load use case look like?


2 Comments

What about cron jobs to fill the caches?
Could come into play if a) the acceptable stale time is high and b) the read time is high. That isn't uncommon; a good case would be a Mongo map-reduce where results that are ten minutes old are good enough: cron the Mongo map-reduce every ten minutes and update Redis.

This is already implemented in the reference architecture for the MongoDB open source project called "Socialite", though it's in Java and not Node.js, so my answers are based on my experience stress- and load-testing that code.

As you can see from its implementation of the status feed, the feed has a fanoutOnWrite cache option which will create a cache (a limited-size document) for active users, capping the number of most recent entries in the cache document (that number is configurable).

The key principles of that implementation are that content requirements are in fact different from timeline cache requirements, and that the write to the content database comes first, as that is the system of record for all content; then you update the cache (if it exists). This part can be done asynchronously, if desired. The update utilizes "capped arrays", a.k.a. update with $push/$slice functionality, to atomically push a new value/content onto the array and chop the oldest one off at the same time.
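A sketch of what that capped-array update does. The MongoDB operation looks roughly like `db.timeline.updateOne({ _id: userId }, { $push: { cache: { $each: [newEntry], $slice: -CACHE_SIZE } } })`; the function below just mimics the $push + $slice semantics in plain JavaScript so the behaviour is easy to see (collection and field names are hypothetical).

```javascript
const CACHE_SIZE = 3;                    // configurable cap on cached entries

function pushCapped(cacheDoc, newEntry) {
  cacheDoc.cache.push(newEntry);                      // $push the new content
  cacheDoc.cache = cacheDoc.cache.slice(-CACHE_SIZE); // $slice: -N keeps newest N
  return cacheDoc;
}
```

In MongoDB the push and the trim happen in one atomic update, which is what makes the pattern safe under concurrent writers.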

Don't create a cache for a user if one doesn't already exist (if they never log in, then you're wasting the effort). Optionally, you can expire caches based on some TTL parameter.

When you go to read a user's cache when they log in and it's not there, fall back on "fanoutOnRead" (which means querying all the content of the users they follow) and then build their cache out of that result.
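The fanoutOnRead fallback can be sketched like this; the per-author content store and all names here are hypothetical stand-ins, not the actual Socialite code.

```javascript
const timelineCache = new Map();          // rebuilt on demand

const content = {                         // system of record, per author
  bob:   [{ ts: 1, text: 'hi' }, { ts: 3, text: 'lunch' }],
  carol: [{ ts: 2, text: 'news' }],
};

function buildTimeline(userId, follows) {
  const cached = timelineCache.get(userId);
  if (cached) return cached;              // cache hit: serve as-is
  // fanoutOnRead: gather followed users' content, newest first
  const merged = follows
    .flatMap(author => content[author] || [])
    .sort((a, b) => b.ts - a.ts);
  timelineCache.set(userId, merged);      // build the cache from the result
  return merged;
}
```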

The Socialite project used MongoDB for the whole back end, but when benchmarking it we found that the timeline cache did not need to be replicated or persisted, so its MongoDB servers were configured to be "in memory" only (no journal, no replication, no disk flushing), which is analogous to your Redis use. If you lose the cache, it will just get rebuilt from the permanent content DB on demand.

Comments


The usual approach is write-through style caching: you write to MongoDB first and then write to Redis. It is the most common way.

Another option: you can write to Redis first and send an async message via Redis (used as a queue). A consumer thread can then read the message and write the data to MongoDB.

The first option is easier to implement. The second option can support a huge number of write transactions. As far as I know, the MongoDB lock problem is not solved yet (it has been improved from a global lock to a database-level lock), and the second option can be worth considering to reduce such lock contention.
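A minimal sketch of the second (write-behind) option. The `Map`s and the array below are hypothetical stand-ins for Redis, MongoDB, and a Redis list used as a queue; a real setup would use `LPUSH`/`BRPOP` and a worker process.

```javascript
const redis = new Map();     // fast cache, written synchronously
const mongoDb = new Map();   // system of record, written by the consumer
const queue = [];            // stands in for a Redis list used as a queue

function write(key, value) {
  redis.set(key, value);               // 1. write Redis first
  queue.push({ key, value });          // 2. enqueue the pending DB write
}

function drainQueue() {
  // 3. consumer: apply queued writes to MongoDB in order
  while (queue.length > 0) {
    const { key, value } = queue.shift();
    mongoDb.set(key, value);
  }
}
```

The trade-off is durability: until the queue is drained, a crash loses writes that clients already saw acknowledged.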

1 Comment

Actual testing shows that the lock is not an issue when inserting content, as usually the disk IO will end up being maxed out first. As long as the schema and indexing are done optimally, the latency of each insert can stay constant as the size of the DB/indexes grows.

Since your question is about architecture and starts with "Suppose..."

Any reason for selecting mongoDB?

With Postgres I get better performance than MongoDB, plus the best of both relational and schemaless documents thanks to Postgres's json/jsonb support, which is actually faster than MongoDB. With Postgres you get a reliable, battle-hardened database with excellent performance and scalability, and most importantly one that allows you to sleep at night and enjoy your vacations.

You can also use postgres LISTEN/NOTIFY for real-time events so you can perform redis cache busting.

Here is an example of using postgres LISTEN/NOTIFY in nodejs: http://gonzalo123.com/2011/05/23/real-time-notifications-part-ii-now-with-node-js-and-socket-io/

Here are some comprehensive performance benchmarks for Postgres 9.4 as a schemaless/noSQL document store vs mongoDB:

http://thebuild.com/presentations/pg-as-nosql-pgday-fosdem-2013.pdf

1 Comment

you had my attention at "allows you to sleep at night and enjoy your vacations" :-). Thanks for the links too.

It would take some serious pumping of data to make Redis a viable option as a cache layer over MongoDB, bearing in mind that MongoDB itself holds a working set in RAM; as such, both can actually serve from memory if you know what you're doing and plan your schema right.

Normally, turning to Redis for caching is the preserve of massive sites like Craigslist ( http://www.slideshare.net/jzawodn/living-with-sql-and-nosql-at-craigslist-a-pragmatic-approach ), who, as you can see on slide 7 of that presentation, use it for:

  • counters
  • blobs
  • queues

and more, but you can easily see how their memcached install could also be merged with it to include certain postings as well if MongoDB were their primary store instead of MySQL.

So that presentation in itself gives you a good idea of how others use Redis with MongoDB.

Basically it is normally used to house snapshots of data that would normally be a little too slow to get from the database.

Here is some related information which I will use to bump my answer a little: What is Redis and what do I use it for? . I strongly recommend you read that question as it will give you more of a sense of exactly what use case Redis is for and what caching it could do.

1 Comment

Don't forget that MongoDB is both replicating data and flushing it to disk (as journal and data files), which will slow it down relative to a pure in-memory solution. MongoDB can be configured to be in-memory only; see my answer about how the "Socialite" project did it.

Do you need transactions and real-time writes? When somebody writes an update to Mongo, must clients be notified of the change immediately (within 1 second / minute / day)?

Is your data so important that no write may be lost? If so, you can't write to Redis first except with AOF (which is not the default mode in Redis and is much slower). Transactions spanning Mongo and Redis won't be easy to implement, for instance.

If you write to Redis first, you can use publish/subscribe to notify a subscribed Redis client, which then updates the value in Mongo; but there is no guarantee that your data is safely transferred, be warned! However, that should be the fastest / most performant way to update all your clients, since they are all connected to Redis.

Alternatively, you can poll at whatever interval counts as acceptable "real-time" for you, propagating changes from Mongo to Redis (decoupling the two) without writing to Redis directly from your code. You can use listeners ("triggers" in Mongo) to do so, or a dirty check.
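The polling-plus-dirty-check option might look like the following; the stores and the `updatedAt` field are hypothetical stand-ins, with `pollOnce` meant to run from a `setInterval` at your acceptable lag.

```javascript
const mongoDocs = [                       // stands in for a Mongo collection
  { _id: 'u1', name: 'Ann', updatedAt: 100 },
  { _id: 'u2', name: 'Bob', updatedAt: 300 },
];
const redisMirror = new Map();            // stands in for the Redis cache
let lastSync = 0;

function pollOnce(now) {
  // dirty check: copy only documents modified since the previous poll
  for (const doc of mongoDocs) {
    if (doc.updatedAt > lastSync) redisMirror.set(doc._id, doc);
  }
  lastSync = now;
  return redisMirror.size;
}
// in a real app: setInterval(() => pollOnce(Date.now()), acceptableLagMs);
```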

Finally, some have migrated from Mongo + Redis to Couchbase, Viber for example; maybe you should consider it as an option? http://www.couchbase.com/viber

1 Comment

I have only read a couple of lines, but MongoDB does not have triggers
