3

I have a MongoDB database where multiple Node processes read and write documents. I would like to know how to ensure that only one process can work on a document at a time (some sort of locking) that is released after the process has finished updating that entry.

My application should do the following:

  • Walk through each entry one by one with a cursor.
  • (Lock the entry so no other process can work on it.)
  • Fetch information from a third-party site.
  • Calculate new information and update the entry.
  • (Unlock the document.)

Also, after unlocking the document, there will be no need for other processes to update it for a few hours.

Later on I would like to set up multiple MongoDB clusters to reduce the load on the databases, so the solution should apply to both single and multiple database servers, or at least work with multiple Mongo servers.

5
  • You will need to implement pessimistic locking at the application layer. Commented May 26, 2017 at 0:41
  • Can you elaborate on the subject please and post an answer? Commented May 26, 2017 at 4:45
  • Sorry, there is nothing to elaborate. MongoDB does not support transactions, so you need to implement it yourself, or use one of the npm packages. The former is too broad for the StackOverflow format, and the latter is quite opinionated, since there are quite a few of them. If you try one of the packages and face some coding problems, that would be a more suitable question. Commented May 26, 2017 at 8:07
  • Thanks. I thought there was an alternative to transactions that would somehow keep my Node apps from working on the same tasks. That's why I don't see why my question is too broad :) In one sentence, I was just asking: how do I make my Node processes not work on the same tasks that are stored in MongoDB as documents? Commented May 26, 2017 at 10:13
  • App-level locking is the alternative. Commented May 26, 2017 at 10:15

3 Answers 3

5

An elegant solution that doesn't involve locks is:

  • Add a version property to your document.

  • When updating the document, increment the version property.

  • When updating the document, include the last read version in the find query. If your document has been updated elsewhere, the find query will yield no results and your update will fail.

  • If your update fails, you can retry the operation.

I have used this pattern with great success in the past.

Example

Imagine you have a document {_id: 123, version: 1}.

Imagine now you have 3 Mongo clients concurrently running db.collection.findAndModify({ query: { _id: 123, version: 1 }, update: { $inc: { version: 1 } } });.

The first update will apply; the rest will fail. Why? Because version is now 2, and the query included version: 1.
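The compare-and-swap at the heart of this can be sketched without a live database. Below is a minimal Node.js simulation; the in-memory array and the tryIncrement helper are stand-ins for the collection and for findAndModify's matched-query-then-update behavior, not real driver APIs:

```javascript
// Minimal in-memory sketch of the optimistic-versioning pattern above.
// `docs` stands in for the MongoDB collection.
const docs = [{ _id: 123, version: 1 }];

// Mimics findAndModify with a { _id, version } query: the update applies
// only if the stored version still equals the version the client last read.
function tryIncrement(id, lastReadVersion) {
  const doc = docs.find(d => d._id === id && d.version === lastReadVersion);
  if (!doc) return false;   // someone else updated first: query matched nothing
  doc.version += 1;         // equivalent of { $inc: { version: 1 } }
  return true;
}

// Three "clients" race with the same last-read version: only one wins.
const results = [1, 2, 3].map(() => tryIncrement(123, 1));
console.log(results);           // [ true, false, false ]
console.log(docs[0].version);   // 2
```

The losers then re-read the document (getting version 2) and retry, which is the retry step described above.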


7 Comments

How does this work on multiple database servers that are set up for replication? How is this different from just setting a boolean value on each document (bool workingOnIt;)? That flag gets replicated to a second database server, but another process reading from that server may have already started working on the task when it shouldn't have, no?
I don't see who upvoted the answer or why; please elaborate on why this is a solution and how it works on replicated servers.
With this, the first update to succeed will invalidate the rest. Let me add that to the response.
Replication is transparent to the clients, so it's not relevant. Replication propagates the database journal across servers. The only thing you need to worry about is at which level Mongo applies a lock. As of the latest Mongo version, it applies a lock at the document level. If you provide a query like this, concurrent clients will not be able to overwrite each other's changes.
Yes, but as I said, I have a VPS in New York and one in Hong Kong, and I would like to apply the lock on both database servers, so those two VPS servers won't ever perform the same task.
1

Per the MongoDB documentation, isolated "prevents a write operation that affects multiple documents from yielding to other reads or writes once the first document is written... The $isolated operator causes write operations to acquire an exclusive lock on the collection... [and] will make WiredTiger single-threaded for the duration of the operation." So if you are updating multiple documents, you could first get the data from the third-party API, parse the info into an array, for example, and then use something like this in the Mongo shell:

db.foo.update(
    { status : "A" , $isolated : 1 },
    { $set: { < your key >: < your info >}}, //use the info in your array
    { multi: true }
)

Or if you have to update the documents one by one, you could use findAndModify() or updateOne() of the Node.js driver for MongoDB. Please note that per the MongoDB documentation, "When modifying a single document, both findAndModify() and the update() method atomically update the document."

An example of updating one by one: first connect to mongod with the Node.js driver, then fetch the data from the third-party API (using Node's request module, for example), parse it, and use it to modify your documents, something like below:

var request = require('request');
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/test', function(err, db) {
    var collection = db.collection('simple_query');
    collection.find().forEach(function(doc) {
        request('http://www.google.com', function(error, response, body) {
            console.log('body:', body); // parse body for your info
            // legacy driver signature: findAndModify(query, sort, update, callback)
            collection.findAndModify(
                { <query based on your doc> },
                [],
                { $set: { < your key >: < your info > } },
                function(err, result) {}
            );
        });
    }, function(err) {
    });
});

3 Comments

Thanks. So far I don't really understand how this works. Right now I've managed to assign the tasks one by one to the processes from a third-party app (web app). I've also optimized the apps a lot while waiting for an answer, and it looks like scaling won't be necessary for a while. Anyway, can you elaborate on the answer a bit more? Thanks.
Maybe present an example with a simple find & update, since that's what my app is actually doing?
Example added. If you provide details on your documents and API, the example can be more concrete.
1

Encountered this question today; I feel like it's been left open.

First, findAndModify really seems like the way to go about it, but I found vulnerabilities in both suggested answers:

In Treefish Zhang's answer: if you run multiple processes in parallel, they will query the same documents, because the loop begins with find rather than findAndModify; findAndModify is only called after processing is done, so while an entry is being processed it is still not updated and other processes can pick it up as well.

In arboreal84's answer: what happens if the process crashes in the middle of handling the entry? If you update the version while querying and the process then crashes, you have no clue whether the operation succeeded or not.

Therefore, I think the most reliable approach is to have multiple fields:

  • version
  • locked: [true/false]
  • lockedAt: [timestamp] (optional - in case the process crashed and was not able to unlock, you may want to retry after x amount of time)
  • attempts: 0 (optional - increment this if you want to know how many processing attempts were made; good for counting retries)

Then, for your code:

  1. findAndModify: where version=oldVersion and locked=false; set locked=true, lockedAt=now
  2. process the entry
  3. if processing succeeded, set locked=false, version=newVersion
  4. if processing failed, set locked=false
  5. optional: for retry after a TTL you can also query by "or locked=true and lockedAt<now-ttl"
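The steps above can be sketched in Node.js without a database. Here the doc object and the acquire/release helpers are hypothetical stand-ins for the MongoDB document and the findAndModify calls (the query condition in acquire mirrors the "locked=false or stale lockedAt" filter from steps 1 and 5):

```javascript
// In-memory sketch of the lock-field scheme: version + locked + lockedAt + attempts.
const TTL_MS = 60 * 1000;   // retry-after window from step 5
const doc = { _id: 1, version: 1, locked: false, lockedAt: null, attempts: 0 };

// Mimics a findAndModify whose query requires the lock to be free
// (or expired) and whose update takes the lock in the same operation.
function acquire(now) {
  const free = !doc.locked ||
    (doc.lockedAt !== null && now - doc.lockedAt > TTL_MS);
  if (!free) return false;   // another worker holds a fresh lock
  doc.locked = true;
  doc.lockedAt = now;
  doc.attempts += 1;         // count every processing attempt
  return true;
}

// Steps 3 and 4: always clear the lock; bump version only on success.
function release(succeeded) {
  doc.locked = false;
  if (succeeded) doc.version += 1;
}

const t0 = Date.now();
const a = acquire(t0);       // first worker: takes the lock
const b = acquire(t0);       // second worker: rejected, lock held
release(true);               // first worker finishes, new version written
const c = acquire(t0 + 1);   // lock is free again, next attempt succeeds
console.log(a, b, c, doc.version);   // true false true 2
```

If a worker crashes between acquire and release, the lock is never cleared, but any acquire attempted more than TTL_MS after lockedAt will still succeed, which is the crash-recovery property motivating the lockedAt field.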

And about:

i have a vps in new york and one in hong kong and i would like to apply the lock on both database servers. So those two vps servers won't perform the same task at any chance.

I think the answer to this depends on why you need two database servers and why they hold the same entries.

If one of them is a secondary in a cross-region replica set for high availability, findAndModify will query the primary, since writing to a secondary replica is not allowed; that's why you don't need to worry about keeping the two servers in sync. (It may have latency issues, though, but you'll have those anyway since you're communicating between two regions.)

If you want it just for sharding and horizontal scaling, there is no need to worry, because each shard holds different entries, so an entry lock is relevant to just one shard.

Hope this helps someone in the future.


