
Assume we have a table that describes the contents of a user's basket of fruit. This is essentially just a mapping of user -> list of fruit.

Periodically, we query a remote data source and get a new list for a particular user. Then, we try to wholly replace all of that user's current rows with a new set of rows. For example:

Before:

user     fruit        freshness
-------------------------------
...
47       'apple'      0.1
47       'pear'       9.5
47       'pear'       2.8
...

After:

user     fruit        freshness
-------------------------------
...
47       'apple'      93.9034
47       'banana'     0
...

Given a new set of rows, is there a way to do the replacement atomically in Postgres?

The following does NOT work:

BEGIN;
DELETE FROM basket WHERE "user" = 47;
INSERT INTO basket ...
COMMIT;

If multiple transactions run simultaneously, Postgres will happily interleave the commands like this (with no locking conflicts):

BEGIN
BEGIN
DELETE
DELETE
INSERT
INSERT  <---- data inserted twice here
COMMIT
COMMIT

This will result in twice as many rows as there should be.
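Spelled out as two concurrent sessions (the values and session markers are illustrative), the interleaving above looks like this under the default READ COMMITTED isolation level:

```sql
-- Session A                          -- Session B
BEGIN;
DELETE FROM basket WHERE "user" = 47;
                                      BEGIN;
                                      DELETE FROM basket WHERE "user" = 47;
                                      -- blocks on A's locked rows (or matches
                                      -- nothing if A's DELETE already ran)
INSERT INTO basket VALUES (47, 'apple', 93.9034);
COMMIT;
                                      -- B's DELETE resumes, re-scans, and finds
                                      -- no rows left to delete, so it removes
                                      -- nothing -- but it does not see A's new
                                      -- rows either
                                      INSERT INTO basket VALUES (47, 'apple', 93.9034);
                                      COMMIT;  -- user 47 now has duplicate rows
```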

A lot of answers to similar questions claim that the first DELETE will lock the rows in question and that the second transaction will have to wait, but that isn't what happens in practice (for whatever reason). Both transactions happily step on each other's toes here.

Notes:

  • Preferably, a solution would make it so that it's not possible for a reader to see the intermediate state (no rows), but at this point I could live without it.
  • Is my DB just structured wrong? Really I just want to be able to store a map of user -> fruit[]. I could just stuff this all in a single jsonb column, but then I can't run joins on the rows. And joining is nice :(
  • Which column (or columns) is the primary key of this table? Are there other unique constraints on this table? Commented Jul 22, 2017 at 23:32
  • There is no primary key (although you can imagine a serial integer primary key if you like). There are no uniqueness constraints. In my real db, I actually did have a uniqueness constraint, which is how I discovered the double-insert problem, since the duplicated data was usually similar enough that it would trigger a uniqueness failure. Commented Jul 22, 2017 at 23:38
  • Curious if this is just a case of the transaction isolation level being too low. This behavior is allowed under the READ COMMITTED isolation level, but not under SERIALIZABLE. Commented Sep 3, 2021 at 21:47
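Following up on the last comment: under SERIALIZABLE, one of the two interleaved transactions would fail with a serialization error (SQLSTATE 40001) instead of silently duplicating rows, and the application would retry it. A minimal sketch, assuming the application is prepared to retry on that error:

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
DELETE FROM basket WHERE "user" = 47;
INSERT INTO basket VALUES (47, 'apple', 93.9034), (47, 'banana', 0);
COMMIT;  -- may fail with SQLSTATE 40001; retry the whole transaction
```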

1 Answer


You can lock the user_id that you are going to modify. If you have a users table then select the corresponding row for update:

BEGIN;
SELECT user_id FROM users WHERE user_id = 47 FOR UPDATE;
DELETE FROM basket WHERE user_id = 47;
INSERT INTO basket ...
COMMIT;
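To make the serialization concrete (an illustrative two-session transcript, not part of the original answer): the second transaction's SELECT ... FOR UPDATE blocks until the first commits, so the DELETE+INSERT pairs can no longer interleave:

```sql
-- Session A                          -- Session B
BEGIN;
SELECT user_id FROM users
  WHERE user_id = 47 FOR UPDATE;
                                      BEGIN;
                                      SELECT user_id FROM users
                                        WHERE user_id = 47 FOR UPDATE;
                                      -- blocks here until A commits
DELETE FROM basket WHERE user_id = 47;
INSERT INTO basket VALUES (47, 'apple', 93.9034);
COMMIT;
                                      -- B acquires the row lock and resumes
                                      DELETE FROM basket WHERE user_id = 47;
                                      INSERT INTO basket VALUES (47, 'banana', 0);
                                      COMMIT;
```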

Alternatively you can use advisory locks, e.g.:

BEGIN;
SELECT pg_advisory_lock('basket'::regclass::int, 47);
DELETE FROM basket WHERE user_id = 47;
INSERT INTO basket ...
SELECT pg_advisory_unlock('basket'::regclass::int, 47);
COMMIT;
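A variant worth knowing (my addition, not part of the original answer): pg_advisory_xact_lock takes the same keys but is released automatically at COMMIT or ROLLBACK, so you cannot forget the unlock if the INSERT fails partway through the transaction:

```sql
BEGIN;
SELECT pg_advisory_xact_lock('basket'::regclass::int, 47);
DELETE FROM basket WHERE user_id = 47;
INSERT INTO basket ...
COMMIT;  -- lock released automatically here
```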

Read about Explicit Locking in the documentation.


2 Comments

I'm not sure the first example works. You're acquiring a "FOR UPDATE" lock on the "users" table but then deleting from the "basket" table. Shouldn't you be getting a lock on the "basket" table?
@mattyb - Everything is OK. When a row is locked for update in one transaction, another concurrent transaction that tries to do the same will wait until the first is completed. It doesn't matter that we're going to perform an operation on another table, as long as we stop other transactions from doing the same for a particular user_id. Read more in the docs.
