Storing duplicate data as a column in Postgres?

Question

In a database project, I have a users table with a computed value, avg_service_rating. There is another table called services that holds all the services associated with each user and the rating for each service. Is there a computationally light way to maintain avg_service_rating without updating it every time an INSERT is done on the services table? Perhaps something like a generated column, but with a function call instead? Any direct advice or links to resources would be greatly appreciated as well!

CREATE TABLE users (
    username VARCHAR PRIMARY KEY,
    avg_service_ratings NUMERIC, -- is it possible to store some function call for this column?
    ...
);

CREATE TABLE service (
    username VARCHAR NOT NULL REFERENCES users (username),
    service_date DATE NOT NULL,
    rating INTEGER,
    PRIMARY KEY (username, service_date)
);

I have thought of using a view, but I do not know how to justify removing the column to my group mates. So I figured there should be some way to store a function call as a column or generate it dynamically. Furthermore, I think almost everyone wants to avoid refactoring our current queries.
– Ken Gondor, Commented Oct 14, 2020 at 4:33

Laurenz Albe · Accepted Answer · 2020-10-14 06:35:32Z

If the values should be consistent, a generated column won't fit the bill, since it is only recomputed if the row itself is modified.

I see two solutions:

1. Have a trigger on the services table that updates the users table whenever a rating is added or modified. That slows down data modifications, but not your queries.
2. Turn users into a view. The original users table would be renamed, and it loses the avg_service_rating column, which is computed on the fly by the view.

To make the illusion perfect, create an INSTEAD OF INSERT OR UPDATE OR DELETE trigger on the view that modifies the underlying table. Then your application does not need to be changed.

With this solution you pay a certain price both on SELECT and on data modifications, but the latter price will be lower, since you don't have to modify two tables (and users might receive fewer modifications than services). An added advantage is that you avoid data duplication.
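A minimal sketch of the view approach, for illustration only. The name users_base and the function and trigger names are hypothetical, only the username column is handled because the question elides the other columns, and EXECUTE FUNCTION assumes PostgreSQL 11 or later (older versions spell it EXECUTE PROCEDURE):

```sql
-- Rename the original table and drop the stored average.
-- users_base is a hypothetical name, not one from the question.
ALTER TABLE users RENAME TO users_base;
ALTER TABLE users_base DROP COLUMN avg_service_ratings;

-- The view takes over the old name and computes the average on the fly.
CREATE VIEW users AS
SELECT u.username,
       (SELECT avg(s.rating)
        FROM service AS s
        WHERE s.username = u.username) AS avg_service_ratings
FROM users_base AS u;

-- Redirect data modifications on the view to the underlying table.
-- Any further columns of users_base would need to be listed here too.
CREATE FUNCTION users_view_modify() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO users_base (username) VALUES (NEW.username);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        UPDATE users_base
        SET username = NEW.username
        WHERE username = OLD.username;
        RETURN NEW;
    ELSE
        -- TG_OP = 'DELETE'
        DELETE FROM users_base WHERE username = OLD.username;
        RETURN OLD;
    END IF;
END;
$$;

CREATE TRIGGER users_view_modify
    INSTEAD OF INSERT OR UPDATE OR DELETE ON users
    FOR EACH ROW EXECUTE FUNCTION users_view_modify();
```

Existing queries such as SELECT username, avg_service_ratings FROM users should keep working unchanged, since the view exposes the same column names as the old table.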

Ian Barwick · Accepted Answer · 2020-10-14 06:14:41Z

A generated column would only be useful if the source data is in the same table row.
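To illustrate that restriction, here is a small sketch assuming PostgreSQL 12 or later. The table service_demo and the columns rating_pct and avg_demo are made up for this example and are not part of the question's schema:

```sql
-- Hypothetical demo table; requires PostgreSQL 12 or later.
CREATE TABLE service_demo (
    username     VARCHAR NOT NULL,
    service_date DATE    NOT NULL,
    rating       INTEGER,
    -- Allowed: the expression reads only columns of the same row.
    rating_pct   NUMERIC GENERATED ALWAYS AS (rating * 20.0) STORED,
    PRIMARY KEY (username, service_date)
);

-- Rejected by PostgreSQL: a generation expression cannot contain a
-- subquery or refer to other rows or tables, so an average over another
-- table cannot be expressed this way.
-- ALTER TABLE users
--     ADD COLUMN avg_demo NUMERIC
--     GENERATED ALWAYS AS ((SELECT avg(rating) FROM service_demo)) STORED;
```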

Otherwise your options are a view (where you could call a function or calculate the value via a subquery), or an AFTER UPDATE OR INSERT trigger on the service table, which updates users.avg_service_ratings. With a trigger, if you get a lot of updates on the service table you'd need to consider possible concurrency issues, but it would mean the figure doesn't need to be calculated every time a row in the users table is accessed.
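The trigger option could be sketched roughly as follows. The function and trigger names are hypothetical, handling DELETE is an addition beyond the INSERT and UPDATE cases mentioned above, and EXECUTE FUNCTION assumes PostgreSQL 11 or later. The sketch makes no attempt to solve the concurrency issues noted above:

```sql
-- Recompute the stored average for the user(s) touched by a change.
-- Names are hypothetical; concurrent writers are not handled here.
CREATE FUNCTION refresh_avg_service_ratings() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    -- The old owner of the row needs a refresh on UPDATE and DELETE.
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        UPDATE users AS u
        SET avg_service_ratings = (SELECT avg(s.rating)
                                   FROM service AS s
                                   WHERE s.username = OLD.username)
        WHERE u.username = OLD.username;
    END IF;

    -- The new owner of the row needs a refresh on INSERT and UPDATE.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        UPDATE users AS u
        SET avg_service_ratings = (SELECT avg(s.rating)
                                   FROM service AS s
                                   WHERE s.username = NEW.username)
        WHERE u.username = NEW.username;
    END IF;

    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$;

CREATE TRIGGER service_refresh_avg
    AFTER INSERT OR UPDATE OR DELETE ON service
    FOR EACH ROW EXECUTE FUNCTION refresh_avg_service_ratings();
```

Recomputing the full average keeps the function simple at the cost of one aggregate query per modified row. If two transactions change ratings for the same user at the same time, the stored value may not reflect both changes, so some form of locking on the users row would likely be needed under heavy write load.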
