I have a table with a list of URL in stored in PG and I have an application which is supposed to take a subset of available URLs and process them. This application will update the processed field=true when it retrieves the set of URLs. I am going to have multiple instances running on different machines. How do I ensure my access to PG is exclusive, so I don't end up with same set of URLs on different machines?
2 Answers
You can lock rows in multiple ways, in postgresql: table-level locks, row-level locks, and advisory locks.
http://www.postgresql.org/docs/9.0/static/explicit-locking.html
In your case, however, locking is not going to be enough if you want good concurrency:
update links
set processing = true
where id in (
select id
from links
where not processed
and not processing
limit 100
for update
)
returning *
The extra processing field makes it possible to have multiple jobs to work on different sets of rows.
Comments
This problem can be solved purely in your postgresql queries using the LOCK statement. Here is a link to the documentation with examples: