3

I have a MongoDB database where multiple Node processes read and write documents. I would like to know how to ensure that only one process can work on a document at a time (some sort of locking) that is released after the process has finished updating that entry.

My application should do the following:

  • Walk through each entry one by one with a cursor.
  • (Lock the entry so no other process can work on it.)
  • Fetch information from a third-party site.
  • Calculate new information and update the entry.
  • (Unlock the document.)

Also, after unlocking the document, there will be no need for other processes to update it for a few hours.

Later on I would like to set up multiple MongoDB clusters to reduce the load on the databases, so the solution should apply to both single and multiple database servers, or at least work with multiple Mongo servers.

5
  • You will need to implement pessimistic locking at the application layer. Commented May 26, 2017 at 0:41
  • Can you elaborate on the subject please and post an answer? Commented May 26, 2017 at 4:45
  • Sorry, there is nothing to elaborate. MongoDB does not support transactions, so you need to implement it yourself, or use one of the npm packages. The former is too broad for the StackOverflow format, and the latter is quite opinionated, since there are quite a few of them. If you try one of the packages and face some coding problems, that would be a more suitable question. Commented May 26, 2017 at 8:07
  • Thanks. I thought there was an alternative to transactions that would somehow keep my Node apps from working on the same tasks. That's why I don't see why my question is too broad :) In one sentence, I was just asking: how do I make my Node processes not work on the same tasks that are stored in MongoDB as documents? Commented May 26, 2017 at 10:13
  • App-level locking is the alternative. Commented May 26, 2017 at 10:15

3 Answers 3

5

An elegant solution that doesn't involve locks is:

  • Add a version property to your document.

  • When updating the document, increment the version property.

  • When updating the document, include the last read version in the find query. If your document has been updated elsewhere, the find query will yield no results and your update will fail.

  • If your update fails, you can retry the operation.

I have used this pattern with great success in the past.

Example

Imagine you have a document {_id: 123, version: 1}.

Imagine now you have 3 Mongo clients concurrently running db.collection.findAndModify({ query: { _id: 123, version: 1 }, update: { $inc: { version: 1 } } });.

The first update will apply; the rest will fail. Why? Because version is now 2, and the query included version: 1.
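The compare-and-swap at the heart of this can be sketched without a live database. Below is a minimal Node.js simulation; the in-memory array and the tryIncrement helper are stand-ins for the collection and for findAndModify's matched-query-then-update behavior, not real driver APIs:

```javascript
// Minimal in-memory sketch of the optimistic-versioning pattern above.
// `docs` stands in for the MongoDB collection.
const docs = [{ _id: 123, version: 1 }];

// Mimics findAndModify with a { _id, version } query: the update applies
// only if the stored version still equals the version the client last read.
function tryIncrement(id, lastReadVersion) {
  const doc = docs.find(d => d._id === id && d.version === lastReadVersion);
  if (!doc) return false;   // someone else updated first: query matched nothing
  doc.version += 1;         // equivalent of { $inc: { version: 1 } }
  return true;
}

// Three "clients" race with the same last-read version: only one wins.
const results = [1, 2, 3].map(() => tryIncrement(123, 1));
console.log(results);           // [ true, false, false ]
console.log(docs[0].version);   // 2
```

The losers then re-read the document (getting version 2) and retry, which is the retry step described above.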


7 Comments

How does this work on multiple database servers that are set up for replication? How is this different from just setting a boolean value on each document (bool workingOnIt;)? That flag gets replicated to a second database server, but another process reading from that server may have already started working on the task when it shouldn't have, no?
I don't see who upvoted the answer or why; please elaborate on why this is a solution and how it works on replicated servers.
With this, the first update to succeed will invalidate the rest. Let me add that to the response.
Replication is transparent to the clients, so it's not relevant. Replication propagates the database journal across servers. The only thing you need to worry about is at which level Mongo applies a lock. As of the latest Mongo version, it applies a lock at the document level. If you provide a query like this, concurrent clients will not be able to overwrite each other's changes.
Yes, but as I said, I have a VPS in New York and one in Hong Kong, and I would like to apply the lock on both database servers, so those two VPS servers won't ever perform the same task.
1

Per the MongoDB documentation, isolated "prevents a write operation that affects multiple documents from yielding to other reads or writes once the first document is written... The $isolated operator causes write operations to acquire an exclusive lock on the collection... [and] will make WiredTiger single-threaded for the duration of the operation." So if you are updating multiple documents, you could first get the data from the third-party API, parse the info into an array, for example, and then use something like this in the Mongo shell:

db.foo.update(
    { status : "A" , $isolated : 1 },
    { $set: { < your key >: < your info >}}, //use the info in your array
    { multi: true }
)

Or if you have to update the documents one by one, you could use findAndModify() or updateOne() of the Node.js driver for MongoDB. Please note that per the MongoDB documentation, "When modifying a single document, both findAndModify() and the update() method atomically update the document."

An example of updating one by one: first connect to mongod with the Node.js driver, then fetch the data from the third-party API (using Node's request module, for example), parse it, and use it to modify your documents, something like below:

var request = require('request');
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/test', function(err, db) {
    var collection = db.collection('simple_query');
    collection.find().forEach(function(doc) {
        request('http://www.google.com', function(error, response, body) {
            console.log('body:', body); // parse body for your info
            // legacy driver signature: findAndModify(query, sort, update, callback)
            collection.findAndModify(
                { <query based on your doc> },
                [],
                { $set: { < your key >: < your info > } },
                function(err, result) {}
            );
        });
    }, function(err) {
    });
});

3 Comments

Thanks. So far I don't really understand how this works. Right now I've managed to assign the tasks one by one to the processes from a third-party app (web app). I've also optimized the apps a lot while waiting for an answer, and it looks like scaling won't be necessary for a while. Anyway, can you elaborate on the answer a bit more? Thanks.
Maybe present an example with a simple find & update, since that's what my app is actually doing?
Example added. If you provide details on your documents and API, the example can be more concrete.
1

Encountered this question today; I feel like it's been left open.

First, findAndModify really seems like the way to go about it, but I found vulnerabilities in both suggested answers:

In Treefish Zhang's answer: if you run multiple processes in parallel, they will query the same documents, because the loop begins with find rather than findAndModify; findAndModify is only called after processing is done, so while an entry is being processed it is still not updated and other processes can pick it up as well.

In arboreal84's answer: what happens if the process crashes in the middle of handling the entry? If you update the version while querying and the process then crashes, you have no clue whether the operation succeeded or not.

Therefore, I think the most reliable approach is to have multiple fields:

  • version
  • locked: [true/false]
  • lockedAt: [timestamp] (optional - in case the process crashed and was not able to unlock, you may want to retry after x amount of time)
  • attempts: 0 (optional - increment this if you want to know how many processing attempts were made; good for counting retries)

Then, for your code:

  1. findAndModify: where version=oldVersion and locked=false; set locked=true, lockedAt=now
  2. process the entry
  3. if processing succeeded, set locked=false, version=newVersion
  4. if processing failed, set locked=false
  5. optional: for retry after a TTL you can also query by "or locked=true and lockedAt<now-ttl"
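The steps above can be sketched in Node.js without a database. Here the doc object and the acquire/release helpers are hypothetical stand-ins for the MongoDB document and the findAndModify calls (the query condition in acquire mirrors the "locked=false or stale lockedAt" filter from steps 1 and 5):

```javascript
// In-memory sketch of the lock-field scheme: version + locked + lockedAt + attempts.
const TTL_MS = 60 * 1000;   // retry-after window from step 5
const doc = { _id: 1, version: 1, locked: false, lockedAt: null, attempts: 0 };

// Mimics a findAndModify whose query requires the lock to be free
// (or expired) and whose update takes the lock in the same operation.
function acquire(now) {
  const free = !doc.locked ||
    (doc.lockedAt !== null && now - doc.lockedAt > TTL_MS);
  if (!free) return false;   // another worker holds a fresh lock
  doc.locked = true;
  doc.lockedAt = now;
  doc.attempts += 1;         // count every processing attempt
  return true;
}

// Steps 3 and 4: always clear the lock; bump version only on success.
function release(succeeded) {
  doc.locked = false;
  if (succeeded) doc.version += 1;
}

const t0 = Date.now();
const a = acquire(t0);       // first worker: takes the lock
const b = acquire(t0);       // second worker: rejected, lock held
release(true);               // first worker finishes, new version written
const c = acquire(t0 + 1);   // lock is free again, next attempt succeeds
console.log(a, b, c, doc.version);   // true false true 2
```

If a worker crashes between acquire and release, the lock is never cleared, but any acquire attempted more than TTL_MS after lockedAt will still succeed, which is the crash-recovery property motivating the lockedAt field.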

And about:

i have a vps in new york and one in hong kong and i would like to apply the lock on both database servers. So those two vps servers won't perform the same task at any chance.

I think the answer to this depends on why you need two database servers and why they hold the same entries.

If one of them is a secondary in a cross-region replica set for high availability, findAndModify will query the primary, since writing to a secondary replica is not allowed; that's why you don't need to worry about keeping the two servers in sync. (It may have latency issues, though, but you'll have those anyway since you're communicating between two regions.)

If you want it just for sharding and horizontal scaling, there is no need to worry, because each shard holds different entries, so an entry lock is relevant to just one shard.

Hope this helps someone in the future.


