
I have a table named 'games', which contains a column named 'title'. This column is unique. The database used is PostgreSQL.

I have a user input form that allows the user to insert a new 'game' into the 'games' table. The function that inserts a new game checks whether a previously entered 'game' with the same 'title' already exists; for this, I get the count of rows with the same game 'title'.

I use a transaction for this: the insert function starts with BEGIN, gets the row count, inserts the new row if the count is 0, and COMMITs the changes once the process is completed.
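Roughly, the flow is something like this (columns simplified; the title value comes from the form):

BEGIN;
SELECT count(id) FROM games WHERE title = 'new_game_title';
-- if the application sees a count of 0, it then runs:
INSERT INTO games (title) VALUES ('new_game_title');
COMMIT;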

The problem is that if two games with the same title are submitted at the same time, both could be inserted, since I only get the count of rows to check for duplicate records and each transaction is isolated from the other.

I thought of locking the table when getting the row count, like so:

LOCK TABLE games IN ACCESS EXCLUSIVE MODE;
SELECT count(id) FROM games WHERE games.title = 'new_game_title';

This would lock the table for reading too, which means the other transaction would have to wait until the current one has completed. I suspect this would solve the problem. Is there a better way around this (avoiding duplicate games with the same title)?

  • Try changing the isolation levels for your transaction. Commented Jan 6, 2013 at 5:54
  • Why don't you use a unique constraint instead of trying to fight the race conditions yourself? Commented Jan 6, 2013 at 5:56
  • @muistooshort I could do that, but it would produce an error at the user end Commented Jan 6, 2013 at 5:57
  • Then trap the error yourself. You're trying to avoid a simple bit of error handling using a fragile pile of kludges; save yourself some trouble and let the database manage the data and its constraints. Commented Jan 6, 2013 at 6:15
  • You have to trap errors anyway. There are a lot of things besides a constraint violation that can make an INSERT fail: memory error, connectivity problem, permissions, etc. Trap this one, too. Commented Jan 6, 2013 at 7:39

2 Answers

5

You should NOT need to lock your tables in this situation.

Instead, you can use one of the following approaches:

  • Define a UNIQUE index on the column that really must be unique. In this case, the first transaction will succeed and the second will error out.
  • Define an AFTER INSERT OR UPDATE OR DELETE trigger that checks your condition and, if it does not hold, RAISEs an error, which aborts the offending transaction.

In either case, your client code should be ready to properly handle possible failures (such as a failed transaction) returned when executing your statements.
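As a rough sketch of the first approach (the constraint name and the idea of wrapping the INSERT in a PL/pgSQL function are my assumptions, not something from the question):

-- Enforce uniqueness at the database level on the games.title column.
ALTER TABLE games ADD CONSTRAINT games_title_key UNIQUE (title);

-- A duplicate INSERT now fails with SQLSTATE 23505 (unique_violation).
-- Wrapping the INSERT in a function lets you trap that error instead of
-- counting rows first:
CREATE OR REPLACE FUNCTION add_game(p_title text) RETURNS boolean AS $$
BEGIN
    INSERT INTO games (title) VALUES (p_title);
    RETURN true;   -- inserted
EXCEPTION WHEN unique_violation THEN
    RETURN false;  -- a game with this title already exists
END;
$$ LANGUAGE plpgsql;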


8 Comments

+1, a unique constraint is the only sensible way to go. (I'm not really fond of the trigger solution though).
How about using something like SELECT count(id) FROM games WHERE games.title = 'new_game_title' FOR UPDATE?
@Akash: You can't count, or lock FOR UPDATE, rows that have not yet been committed by the other process. A unique constraint is the only safe solution, as mentioned by the others. A unique constraint is made for exactly this problem, so use it.
@Akash: don't use explicit locking. Use a unique constraint and catch the error. That way your application will be much more scalable and will use far fewer resources on the DB server. As others have pointed out, you have to implement error handling anyway.
"I plan to use FOR UPDATE along with UNIQUE constraint" Why? It makes your application slow and the FOR UPDATE doesn't add anything at all. "its next to impossible catching the right error" Why? Just read (catch) the error message and you're done. Very simple to implement.
3

Using the highest transaction isolation level (Serializable), you can achieve something close to what you are actually asking for. But be aware that this may fail with ERROR: could not serialize access due to concurrent update.
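For reference, a minimal sketch of that approach (the statements are assumed from the question; the application must be prepared to retry on a serialization failure):

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(id) FROM games WHERE title = 'new_game_title';
-- if the count is 0:
INSERT INTO games (title) VALUES ('new_game_title');
COMMIT;  -- one of two concurrent transactions may abort with a serialization error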

I do not entirely agree with the constraint approach. You should have a constraint to protect data integrity, but relying on the constraint alone forces you to identify not only that an error occurred, but which constraint caused it. The trouble is not catching the error, as some have discussed, but working out what caused it and giving the user a human-readable reason for the failure. Depending on which language your application is written in, this can be next to impossible, e.g. telling the user "Game title [foo] already exists" rather than "game must have a price" for a separate constraint.

There is a single statement alternative to your two stage approach:

INSERT INTO games ( title, [column2], ... )
SELECT 'new_game_title', [value2], ...
WHERE NOT EXISTS ( SELECT 1 FROM games AS g2 WHERE g2.title = 'new_game_title' );

I want to be clear with this... this is not an alternative to having a unique constraint (which requires extra data for the index). You must have one to protect your data from corruption.

5 Comments

I think this is erroneous; it would allow multiple inserts of a single game title due to a race condition, because the INSERT happens only when the inner SELECT finds zero rows, and therefore no lock is taken.
I understand that having a unique constraint will prevent multiple same-title inserts altogether, but the INSERT INTO..SELECT WHERE NOT EXISTS solution put forward has a race condition; it's possible that it will insert a duplicate title and the caller will be none the wiser, so I fail to see the value of it. You say 'it will reliably fail to insert the row regardless of constraints', but that is not true, as two concurrent inserts of the same title may both succeed.
It's possible I'd been doing far too much Oracle coding when I originally wrote this. But I note that descriptions of "Read Committed Isolation Level" changed in the manual between 8.3 and 8.4. Without digging out an archaic version of postgres I couldn't tell if the advice was always wrong or just out of date. I've modified my answer to correct.
I'm still not sure it's correct, but you may well know more than me on the subject. I find these things very hard to reason about just by looking at the SQL rather than, as you rightly point out, referring to the docs, but my understanding is that because the inner select doesn't in fact select anything, no lock is taken regardless of the isolation level, and therefore this is not safe in a concurrent scenario. I am 99% certain that is the case on PG 9.6.3, as I have been working on exactly this today on that version.
There's no solution which will never error and just wait. I'm a bit disappointed with Postgres for this. Other DBMSs (Oracle and MySQL) will execute the above SQL atomically, forcing other queries to wait. Those DBMSs run the risk of deadlock, so I guess it's a case of picking your poison.
