
Suppose we have a social network app (built with Node.js and Express) that uses MongoDB as its primary database engine.

For most API calls from clients (mobile app, web app, etc.), I don't want to run a complex query for each request. These sorts of requests could be answered from a cache layer, Redis for instance.

But my question is how/when I should update the cache layer, because all write operations are performed against the MongoDB database, not the cache layer (Redis). What is the correct approach/architecture to address this problem?

6 Answers


It really depends on your needs, but here's a fairly common one:

on_get_request
  if data_in_redis
    serve_data_from_redis
  else
    get_data_from_mongo
    set_data_in_redis
    set_expire_in_redis
    serve_data_from_memory

The data will be a bit stale at times, but that's ok for most use cases. It works well in combination with some cache invalidation when important data is written:

on_important_data
  delete_invalid_redis_keys

But that all assumes low write, high read, and a stable set of queries.
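The read path and invalidation above can be sketched together in Node.js. This is just an illustration of the pattern, not a library API: the `Map` and the `mongo` object are hypothetical stand-ins for a real Redis client and a real MongoDB query.

```javascript
const cache = new Map();                 // stands in for Redis
const TTL_MS = 60 * 1000;                // set_expire_in_redis: one minute

const mongo = {                          // stands in for MongoDB
  feed: { alice: ['post-1', 'post-2'] },
  async queryFeed(userId) { return this.feed[userId] || []; },
};

async function getFeed(userId) {
  const hit = cache.get(userId);
  if (hit && hit.expires > Date.now()) {
    return hit.data;                     // serve_data_from_redis
  }
  const data = await mongo.queryFeed(userId);           // get_data_from_mongo
  cache.set(userId, { data, expires: Date.now() + TTL_MS }); // set + expire
  return data;                           // serve freshly fetched data
}

// on_important_data: delete the now-stale key so the next read rebuilds it
function invalidateFeed(userId) {
  cache.delete(userId);
}
```

With a real Redis client you would use `SET key value EX 60` (or `SETEX`) so the expiry is handled server-side instead of by the timestamp check above.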

What does your high load use case look like?


2 Comments

What about cron jobs to fill the caches?
Could come into play if a) the acceptable stale time is high and b) the read time is high. That isn't uncommon; a good case would be a Mongo map-reduce where results that are ten minutes old are good enough: cron the Mongo map-reduce every ten minutes and update Redis.

This is already implemented in the reference architecture for the MongoDB open source project called "Socialite", though it's in Java and not Node.js, so my answers are based on my experience stress- and load-testing that code.

As you can see from its implementation of the status feed, the feed has a fanoutOnWrite cache option which will create a cache (a limited-size document) for active users, capping the number of most recent entries in the cache document (that number is configurable).

The key principles of that implementation are that content requirements are in fact different from timeline cache requirements, and that the write to the content database comes first, as that is the system of record for all content; then you update the cache (if it exists). This part can be done asynchronously, if desired. The update utilizes "capped arrays", a.k.a. update with $push/$slice functionality, to atomically push a new value/content onto the array and chop the oldest one off at the same time.
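A sketch of what that capped-array update does. The MongoDB operation looks roughly like `db.timeline.updateOne({ _id: userId }, { $push: { cache: { $each: [newEntry], $slice: -CACHE_SIZE } } })`; the function below just mimics the $push + $slice semantics in plain JavaScript so the behaviour is easy to see (collection and field names are hypothetical).

```javascript
const CACHE_SIZE = 3;                    // configurable cap on cached entries

function pushCapped(cacheDoc, newEntry) {
  cacheDoc.cache.push(newEntry);                      // $push the new content
  cacheDoc.cache = cacheDoc.cache.slice(-CACHE_SIZE); // $slice: -N keeps newest N
  return cacheDoc;
}
```

In MongoDB the push and the trim happen in one atomic update, which is what makes the pattern safe under concurrent writers.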

Don't create a cache for a user if one doesn't already exist (if they never log in, then you're wasting the effort). Optionally, you can expire caches based on some TTL parameter.

When you go to read a user's cache when they log in and it's not there, fall back on "fanoutOnRead" (which means querying all the content of the users they follow) and then build their cache out of that result.
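The fanoutOnRead fallback can be sketched like this; the per-author content store and all names here are hypothetical stand-ins, not the actual Socialite code.

```javascript
const timelineCache = new Map();          // rebuilt on demand

const content = {                         // system of record, per author
  bob:   [{ ts: 1, text: 'hi' }, { ts: 3, text: 'lunch' }],
  carol: [{ ts: 2, text: 'news' }],
};

function buildTimeline(userId, follows) {
  const cached = timelineCache.get(userId);
  if (cached) return cached;              // cache hit: serve as-is
  // fanoutOnRead: gather followed users' content, newest first
  const merged = follows
    .flatMap(author => content[author] || [])
    .sort((a, b) => b.ts - a.ts);
  timelineCache.set(userId, merged);      // build the cache from the result
  return merged;
}
```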

The Socialite project used MongoDB for the whole back end, but when benchmarking it we found that the timeline cache did not need to be replicated or persisted, so its MongoDB servers were configured to be "in memory" only (no journal, no replication, no disk flushing), which is analogous to your Redis use. If you lose the cache, it will just get rebuilt from the permanent content DB on demand.

Comments


The usual approach is write-through style caching: you write to MongoDB first and then write to Redis. It is the most common way.

Another option: you can write to Redis first and send an async message via Redis (used as a queue). A consumer thread can then read the message and write the data to MongoDB.

The first option is easier to implement. The second option can support a huge number of write transactions. As far as I know, the MongoDB lock problem is not solved yet (it has been improved from a global lock to a database-level lock), and the second option can be worth considering to reduce such lock contention.
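A minimal sketch of the second (write-behind) option. The `Map`s and the array below are hypothetical stand-ins for Redis, MongoDB, and a Redis list used as a queue; a real setup would use `LPUSH`/`BRPOP` and a worker process.

```javascript
const redis = new Map();     // fast cache, written synchronously
const mongoDb = new Map();   // system of record, written by the consumer
const queue = [];            // stands in for a Redis list used as a queue

function write(key, value) {
  redis.set(key, value);               // 1. write Redis first
  queue.push({ key, value });          // 2. enqueue the pending DB write
}

function drainQueue() {
  // 3. consumer: apply queued writes to MongoDB in order
  while (queue.length > 0) {
    const { key, value } = queue.shift();
    mongoDb.set(key, value);
  }
}
```

The trade-off is durability: until the queue is drained, a crash loses writes that clients already saw acknowledged.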

1 Comment

Actual testing shows that the lock is not an issue when inserting content, as usually the disk IO will end up being maxed out first. As long as the schema and indexing are done optimally, the latency of each insert can stay constant as the size of the DB/indexes grows.

Since your question is about architecture and starts with "Suppose..."

Any reason for selecting mongoDB?

With Postgres I get better performance than MongoDB, plus the best of both relational and schemaless documents thanks to Postgres's json/jsonb support, which is actually faster than MongoDB. With Postgres you get a reliable, battle-hardened database with excellent performance and scalability, and most importantly one that allows you to sleep at night and enjoy your vacations.

You can also use postgres LISTEN/NOTIFY for real-time events so you can perform redis cache busting.

Here is an example of using postgres LISTEN/NOTIFY in nodejs: http://gonzalo123.com/2011/05/23/real-time-notifications-part-ii-now-with-node-js-and-socket-io/

Here are some comprehensive performance benchmarks for Postgres 9.4 as a schemaless/noSQL document store vs mongoDB:

http://thebuild.com/presentations/pg-as-nosql-pgday-fosdem-2013.pdf

1 Comment

you had my attention at "allows you to sleep at night and enjoy your vacations" :-). Thanks for the links too.

It would take some serious pumping of data to make Redis a viable option as a cache layer over MongoDB, bearing in mind that MongoDB itself holds a working set in RAM; as such, both can actually serve from memory if you know what you're doing and plan your schema right.

Normally, turning to Redis for caching is the preserve of massive sites like Craigslist ( http://www.slideshare.net/jzawodn/living-with-sql-and-nosql-at-craigslist-a-pragmatic-approach ), who, as you can see on slide 7 of that presentation, use it for:

  • counters
  • blobs
  • queues

and more, but you can easily see how their memcached install could also be merged with it to include certain postings as well if MongoDB were their primary store instead of MySQL.

So that presentation in itself gives you a good idea of how others use Redis with MongoDB.

Basically it is normally used to house snapshots of data that would normally be a little too slow to get from the database.

Here is some related information which I will use to bump my answer a little: What is Redis and what do I use it for? . I strongly recommend you read that question as it will give you more of a sense of exactly what use case Redis is for and what caching it could do.

1 Comment

Don't forget that MongoDB is both replicating data and flushing it to disk (as journal and data files), which will slow it down relative to a pure in-memory solution. MongoDB can be configured to be in-memory only; see my answer about how the "Socialite" project did it.

Do you need transactions and real-time writes? When somebody writes an update to Mongo, must clients be notified of the change immediately (within 1 second / minute / day)?

Is your data so important that no write may be lost? If so, you can't write to Redis first except with AOF (which is not the default mode in Redis and is much slower). Transactions spanning Mongo and Redis won't be easy to implement, for instance.

If you write to Redis first, you can use publish/subscribe to notify a subscribed Redis client, which then updates the value in Mongo; but there is no guarantee that your data is safely transferred, be warned! However, that should be the fastest / most performant way to update all your clients, since they are all connected to Redis.

Alternatively, you can poll at whatever interval counts as acceptable "real-time" for you, propagating changes from Mongo to Redis (decoupling the two) without writing to Redis directly from your code. You can use listeners ("triggers" in Mongo) to do so, or a dirty check.
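The polling-plus-dirty-check option might look like the following; the stores and the `updatedAt` field are hypothetical stand-ins, with `pollOnce` meant to run from a `setInterval` at your acceptable lag.

```javascript
const mongoDocs = [                       // stands in for a Mongo collection
  { _id: 'u1', name: 'Ann', updatedAt: 100 },
  { _id: 'u2', name: 'Bob', updatedAt: 300 },
];
const redisMirror = new Map();            // stands in for the Redis cache
let lastSync = 0;

function pollOnce(now) {
  // dirty check: copy only documents modified since the previous poll
  for (const doc of mongoDocs) {
    if (doc.updatedAt > lastSync) redisMirror.set(doc._id, doc);
  }
  lastSync = now;
  return redisMirror.size;
}
// in a real app: setInterval(() => pollOnce(Date.now()), acceptableLagMs);
```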

Finally, some have migrated from Mongo + Redis to Couchbase, Viber for example; maybe you should consider it as an option? http://www.couchbase.com/viber

1 Comment

I have only read a couple of lines, but MongoDB does not have triggers
