1

I'm currently PHP building a script that has to update statistics when it finishes it's purpose. The script is accessed by web browser and depending on the traffic it may be executed simultaneously. I have to guarantee the statistics are right.

To give you the picture let's say we have a table:

CREATE TABLE statistics(
  user_id      integer NOT NULL,
  date         integer NOT NULL, -- for unix time
  stat1        integer NOT NULL DEFAULT 0,
  stat2        integer NOT NULL DEFAULT 0,
  stat3        integer NOT NULL DEFAULT 0  -- and so on...
);

-- Let's insert some testing data for a couple of users and days...
-- Day one
INSERT INTO statistics(1, 1303520820, 1, 1, 1);
INSERT INTO statistics(2, 1303520820, 1, 1, 1);
-- Day two
INSERT INTO statistics(1, 1303603200, 1, 1, 1);
INSERT INTO statistics(2, 1303603200, 1, 1, 1);
-- Day three
INSERT INTO statistics(1, 1303689600, 1, 1, 1);
INSERT INTO statistics(2, 1303689600, 1, 1, 1);

Every day a new row is inserted in the table so we could have a daily, weekly, monthly, yearly statistics. I have to be sure only one row is inserted per user_id per day. Also whenever an UPDATE query is executed it would increment the columns stat1,stat2,stat3 appropriately.

This script is expected to have quite some traffic and I want to figure out how to make things work when the script is executed and has a couple of instances working simultaneously. What method/technique do you find best for such tasks ?

2
  • 1
    One row per day or one row per user_id per day? Whichever one you want to guarantee, primary key and unique are your friends. Commented Apr 22, 2011 at 22:33
  • One row per user_id per day. I'll correct that - sorry for the mistake :) Commented Apr 22, 2011 at 22:35

3 Answers 3

3

The simplest solution is to add a unique constraint

CREATE TABLE statistics(
  user_id      integer NOT NULL,
  date         integer NOT NULL, -- for unix time
  stat1        integer NOT NULL DEFAULT 0,
  stat2        integer NOT NULL DEFAULT 0,
  stat3        integer NOT NULL DEFAULT 0,  -- and so on...
  UNIQUE(user_id,date)
);

You should definitely do this regardless of what other measures you take.

Sign up to request clarification or add additional context in comments.

9 Comments

Yes I though about that. Most probably I'll add UNIQUE only to the date column. I can't add it to the user_id column as the table can have endless rows with the same user_id :)
On second though I can't add Unique constraint to the structure. See the example in the question. The user_id can be the same and the date can be the same. Every user gets a new record in the database for every day.
UNIQUE(user_id,date) means that every combination of user_id and date must be unique, it is NOT the same as UNIQUE(date),UNIQUE(user_id), i.e. there can only be 1 row per user per date, rather than 1 row per user and 1 row per date.
The OP will also need a trigger, I think. The trigger would determine whether to insert or update a row, and to implement the arithmetic.
@catcall, i would use a procedure not a trigger for that
|
1

You need a unique constraint on the pair user_id and date, as others have said.

In order to insert without doing arithmetic when the compound key (user_id, date) doesn't exist, and to update with arithmetic when the compound key does exist, you need to write some code. Informally, this is called an "upsert". There's more than one way.

PosgreSQL docs have an example of a function that implements this kind of requirement using exception handling. The problem with a function is that you can't force application code or database girls to use it every time without exception.

You can (I think) use the suppress_redundant_updates_trigger(). The advantage of triggers is that they can't be bypassed accidentally by application code or by database girls. I have not used this technique myself, so I can't comment further on it. This trigger is documented here.

You can also handle the upsert logic with a user-defined trigger.

Comments

1

Also you can add CHECK for date value to ensure that it a multiple of 1 day:

ALTER TABLE "statistics" ADD CONSTRAINT "1day_quantum" CHECK ("date" = ("date" / 86400)::INTEGER * 86400);

Then exception will be thrown if one try insert wrong value of date.

If date field type will be TIMESTAMP or TIMESTAMPTZ then CHECK is more complicated:

ALTER TABLE "statistics" ADD CONSTRAINT "1day_quantum" CHECK ("date" = TIMESTAMP 'epoch' + ((EXTRACT(EPOCH FROM "date") / 86400)::INTEGER * 86400) * INTERVAL '1 second');

By changing 86400 (seconds count) you can adjust constraint to various quantum: 900 for 15 mins for example.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.