Whenever I read about NoSQL distributed databases, the CAP theorem comes up: when the network partitions, you can have full consistency, full availability, or a bit of each, but never both entirely.

What is not really clear to me is what type of consistency they are talking about:

  1. Is it consistency in data freshness, where some clients may get older data than others?
  2. Or is it consistency in the sense that transactions may complete only partially, leaving the data in an inconsistent state?

The second interpretation sounds quite dangerous to me and not really acceptable. The first interpretation sounds acceptable, but how can you prevent a client that requests a set of data from being served partly outdated and partly fresh data?

How dangerous is it to only offer partial consistency and what are the possible negative effects?

1 Answer

Consistency in distributed databases is a huge problem, and in practice it covers both of your options: stale data in some places, and partially completed transactions. I'm not going to write an essay about it, because the solutions are not easy. However, here are some key phrases.

Eventual Consistency is the usual answer to this, but implementing it is a big job. The key to the implementation is Idempotent Messages. Let's say a complete transaction involves updating data on machines A, B, and C. How do you actually do that? You start sending messages to each machine, and keep resending them until you receive an acknowledgement of receipt and successful processing. You may end up sending the message to B twice, either because B never got the message or because B's ack never reached you. If you resent it because the ack was lost, then B had better do the right thing when it gets the message again (which may be to ignore it), and still send you an ack so you stop bothering it.
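
To make the resend-until-acked idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how any particular database does it: `Node`, `handle`, `send_until_acked`, the message ID `"tx-1"`, and the simulated 50% loss rate are all hypothetical names and values made up for the example, and the "network" is just a random number generator that drops requests or acks.

```python
import random


class Node:
    """Hypothetical machine that applies each message at most once."""

    def __init__(self, name):
        self.name = name
        self.balance = 0
        self.seen = set()  # IDs of messages that were already applied

    def handle(self, msg_id, amount):
        # On a duplicate delivery, skip the update but still acknowledge,
        # so the sender stops retrying.
        if msg_id not in self.seen:
            self.balance += amount
            self.seen.add(msg_id)
        return True  # the ack


def send_until_acked(node, msg_id, amount, rng, loss_rate=0.5, max_attempts=1000):
    """Resend until an ack arrives; the request or the ack may be lost."""
    for attempt in range(1, max_attempts + 1):
        if rng.random() < loss_rate:
            continue  # request lost: the node never saw it
        node.handle(msg_id, amount)
        if rng.random() < loss_rate:
            continue  # ack lost: the node applied it, but the sender cannot know
        return attempt  # ack received, stop resending
    raise RuntimeError(f"no ack from {node.name} after {max_attempts} attempts")


# One logical transaction: add 100 on machines A, B, and C.
rng = random.Random(0)
nodes = [Node("A"), Node("B"), Node("C")]
for node in nodes:
    send_until_acked(node, "tx-1", 100, rng)
```

However many times a message is resent, each node applies `tx-1` exactly once, because the receiver remembers the message IDs it has processed. The design choice that matters is that deduplication lives on the receiving side: the sender can never tell a lost request from a lost ack, so it must be safe for it to retry blindly.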

This looks like a pretty good article, and it's written from a NoSQL point of view. Any search engine will turn up loads of links about Idempotent Messages, so I'll let you root around.

Final note: Pat Helland, who worked on distributed databases for many years (at Microsoft and Amazon, among other places), eventually came to the conclusion that full consistency for distributed DBs was impossible, and that you'd better settle for Eventual Consistency via Idempotent Messages.

3 Comments

Thanks, clear answer. From a practical point of view though: let's say you choose an existing NoSQL DBMS such as Cassandra, which offers tunable consistency. Are there any measures that indicate the likelihood of running into consistency problems with certain parameters? Also, are there ways of structuring the data model so that consistency problems are less likely to occur?
I can't answer, as I don't use Cassandra (and have very little experience with NoSQL). However, with distributed databases of any sort, even if they are just text files, you will have consistency problems sooner or later: machines are not guaranteed 100% uptime, networks go out for short periods, routers or DNS get misconfigured, etc. Unless Cassandra has its own system of idempotent messaging, it will one day lose consistency.
PS By distributed, I mean that no one node has all the data; I don't include db replication.
