
I'm using Postgres 12.2.

I'm trying to fill a kind of event table in which one column needs to hold the id of another entity.

In the example I created, the events are stored in the purchases table while the id needs to be the id of the company, taken from the companies table. As there is no fixed set of companies, the companies table needs to "grow as I go" ;).

So the setup is:

CREATE TABLE 
    companies 
    ( 
        company_id  SERIAL NOT NULL, 
        NAME        CHARACTER VARYING(256) NOT NULL, 
        PRIMARY KEY (company_id), 
        CONSTRAINT companies_unique_name UNIQUE (NAME) 
    )
;
CREATE TABLE 
    purchases
    ( 
        purchase_id    SERIAL NOT NULL, 
        purchase_date  DATE DEFAULT CURRENT_DATE NOT NULL, 
        amount         INTEGER NOT NULL,
        company_id     INTEGER NOT NULL, 
        PRIMARY KEY (purchase_id) 
    )
;

So when I now want to insert the event "Purchased 5 items from 'company1'", I need to get the id of "company1" by either looking it up in the companies table or by creating a new entry for "company1" in companies.

I can do it like this:

WITH EXISTING_COMPANY AS (
        SELECT company_id FROM companies WHERE name = 'company1'
)
   , NEW_COMPANY AS (
        INSERT INTO companies (name) VALUES ('company1')
        ON CONFLICT(name) DO NOTHING
        RETURNING company_id
)
   , GET_COMPANY_ID AS (
        SELECT COALESCE(
            (SELECT company_id FROM EXISTING_COMPANY),
            (SELECT company_id FROM NEW_COMPANY)
        ) AS company_id
)
INSERT INTO purchases(amount, company_id)
VALUES (5, (select company_id from GET_COMPANY_ID))
;

The CTE EXISTING_COMPANY will give me the id of an existing "company1" or null.

The CTE NEW_COMPANY will give me the id of a newly created company "company1" or null.

Finally, the CTE GET_COMPANY_ID uses COALESCE to try the existing id first and, if there is none, the new id.

While this works, it has two disadvantages: I have to give the company name twice, and I need new CTEs for each company because I don't know how to pass the company name to my CTEs.

  • Is there a way to pass the company name to my CTEs?
  • Are there other ways to achieve my goal?

2 Answers

3

You can provide the name through a VALUES clause in an additional CTE. You also don't need the existing_company CTE, because that lookup can be done inside the get_company_id CTE:

WITH input(name) as (
  values ('company1')
), new_company AS (

  INSERT INTO companies (name) 
  select i.name 
  from input i
  ON CONFLICT (name) DO NOTHING
  RETURNING company_id

), get_company_id AS (
  select company_id
  from new_company

  union all

  select company_id
  from companies
  where name in (select name from input)
    and not exists (select * from new_company)
)
INSERT INTO purchases(amount, company_id)
select 5, company_id 
from get_company_id
;

This can also be extended to handle multiple companies and amounts:

WITH input(amount, name) as (
  values 
     (5, 'company1'), 
     (6, 'company2')
), new_company AS (
  INSERT INTO companies (name) 
  select name 
  from input
  ON CONFLICT (name) DO NOTHING
  RETURNING company_id, name
), get_company_id AS (

  select company_id, name
  from new_company
  union all

  select c.company_id, c.name
  from companies c
  where c.name in (select i.name from input i)
  and not exists (select * 
                  from new_company nc
                  where nc.company_id = c.company_id)
)
INSERT INTO purchases(amount, company_id)
select i.amount, g.company_id 
from get_company_id g
  join input i on i.name = g.name
;
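To address passing the company name only once, here is a minimal sketch that wraps the single-company statement above in a prepared statement, so the amount and the company name become parameters ($1 and $2). The statement name add_purchase is hypothetical, and the CREATE TABLE IF NOT EXISTS lines only recreate the question's schema so the snippet can run on its own.

```sql
-- Question's schema (no-op if the tables already exist).
CREATE TABLE IF NOT EXISTS companies (
    company_id SERIAL PRIMARY KEY,
    name       VARCHAR(256) NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS purchases (
    purchase_id   SERIAL PRIMARY KEY,
    purchase_date DATE NOT NULL DEFAULT CURRENT_DATE,
    amount        INTEGER NOT NULL,
    company_id    INTEGER NOT NULL
);

-- Hypothetical statement name; $1 = amount, $2 = company name.
PREPARE add_purchase(integer, varchar) AS
WITH input(name) AS (
  VALUES ($2)
), new_company AS (
  -- Inserts the company unless the name already exists.
  INSERT INTO companies (name)
  SELECT i.name FROM input i
  ON CONFLICT (name) DO NOTHING
  RETURNING company_id
), get_company_id AS (
  -- Either the freshly inserted id or the existing one.
  SELECT company_id FROM new_company
  UNION ALL
  SELECT company_id FROM companies
  WHERE name IN (SELECT name FROM input)
    AND NOT EXISTS (SELECT * FROM new_company)
)
INSERT INTO purchases (amount, company_id)
SELECT $1, company_id FROM get_company_id;

-- The company name now appears exactly once per call.
EXECUTE add_purchase(5, 'company1');
```

A prepared statement lives only for the current session; most client drivers offer parameter binding that achieves the same thing without an explicit PREPARE.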

4 Comments

Correct me if I'm wrong, but doesn't this try an INSERT first and then a SELECT? Since I expect more recurring companies than new ones, wouldn't it be more efficient the other way around? That's why I used COALESCE, as it doesn't evaluate the INSERT when the SELECT succeeds.
The INSERT is always executed, regardless whether you select from its corresponding CTE or not.
I double-checked that by adding a column to the companies table and changing its value in the ON CONFLICT clause. You are right. Then it seems I don't understand this quote: "The COALESCE function evaluates arguments from left to right until it finds the first non-null argument. All the remaining arguments from the first non-null argument are not evaluated." (from postgresqltutorial.com/postgresql-coalesce)
@Skeeve: actually it's because of "Data-modifying statements in WITH are executed exactly once, and always to completion, independently of whether the primary query reads all (or indeed any) of their output" (see postgresql.org/docs/current/…)
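Since the INSERT in the WITH clause always runs, one way to address the efficiency concern is to let it insert zero rows whenever the lookup already found the company. In this sketch (the schema is recreated only so the snippet is self-contained), the existing CTE does the lookup and the INSERT is filtered with WHERE NOT EXISTS, so a recurring company never reaches the conflict path. A side benefit is that no serial value is consumed for it, because with ON CONFLICT DO NOTHING the column default is evaluated for every proposed row, even one that is then skipped.

```sql
-- Question's schema (no-op if the tables already exist).
CREATE TABLE IF NOT EXISTS companies (
    company_id SERIAL PRIMARY KEY,
    name       VARCHAR(256) NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS purchases (
    purchase_id   SERIAL PRIMARY KEY,
    purchase_date DATE NOT NULL DEFAULT CURRENT_DATE,
    amount        INTEGER NOT NULL,
    company_id    INTEGER NOT NULL
);

WITH input(name) AS (
    VALUES ('company1')
), existing AS (
    -- Lookup first: the expected common case.
    SELECT c.company_id
    FROM companies c
    JOIN input i ON i.name = c.name
), inserted AS (
    -- Still executed, but proposes no row when the lookup succeeded.
    INSERT INTO companies (name)
    SELECT i.name
    FROM input i
    WHERE NOT EXISTS (SELECT 1 FROM existing)
    ON CONFLICT (name) DO NOTHING
    RETURNING company_id
)
INSERT INTO purchases (amount, company_id)
SELECT 5, company_id FROM existing
UNION ALL
SELECT 5, company_id FROM inserted;
```

Like the answer's version, this is one statement with one snapshot: if another transaction commits the same name after the lookup, the INSERT is skipped, neither CTE returns a row, and no purchase is inserted.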
0

I looked into functions and I'm experimenting with this:

CREATE OR REPLACE FUNCTION companyID (IN company_name VARCHAR(256))
RETURNS integer AS
$$
DECLARE
        cid integer;
BEGIN
        -- Common case: the company already exists
        SELECT company_id INTO cid FROM companies WHERE name = company_name;
        IF cid IS NULL THEN
                -- Not found: create it, skipping the insert on a unique conflict
                INSERT INTO companies (name) VALUES (company_name)
                ON CONFLICT (name) DO NOTHING
                RETURNING company_id INTO cid;
        END IF;
        -- In case of a concurrent insert, cid could be null
        IF cid IS NULL THEN
                SELECT company_id INTO cid FROM companies WHERE name = company_name;
        END IF;
        RETURN cid;
END;
$$
LANGUAGE plpgsql
;

The insert then works like this:

INSERT INTO purchases (amount, company_id) VALUES
(10, companyID('company2')),
(11, companyID('company1')),
(42, companyID('company2'))
;

Are there any disadvantages in this approach?

2 Comments

This is not safe for concurrent inserts.
Thanks. I made an attempt to resolve it by ignoring insert conflicts and retrieving the concurrently inserted id in case of such a conflict. Does that look better? Any other disadvantages?
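On the follow-up question, one hedged alternative to the two IF blocks is a retry loop that repeats the lookup until it either finds the row or inserts it itself, so the function never returns NULL even if a concurrently inserted row is deleted again before the final SELECT. The function name get_or_create_company_id is hypothetical, and the sketch assumes the default READ COMMITTED isolation level, where each statement inside a (volatile) function sees rows committed by other transactions before that statement started.

```sql
-- Question's companies table (no-op if it already exists).
CREATE TABLE IF NOT EXISTS companies (
    company_id SERIAL PRIMARY KEY,
    name       VARCHAR(256) NOT NULL UNIQUE
);

-- Hypothetical name; same idea as companyID, written as a retry loop.
CREATE OR REPLACE FUNCTION get_or_create_company_id(company_name VARCHAR(256))
RETURNS integer
LANGUAGE plpgsql AS
$$
DECLARE
    cid integer;
BEGIN
    LOOP
        -- Common case: the company already exists.
        SELECT company_id INTO cid FROM companies WHERE name = company_name;
        EXIT WHEN FOUND;

        -- Not there: try to create it. DO NOTHING skips the row if another
        -- transaction inserted the same name in the meantime.
        INSERT INTO companies (name) VALUES (company_name)
        ON CONFLICT (name) DO NOTHING
        RETURNING company_id INTO cid;
        EXIT WHEN FOUND;

        -- Conflict: loop again and read the concurrently inserted row.
    END LOOP;
    RETURN cid;
END
$$;
```

It is called the same way as companyID in the INSERT above, e.g. (10, get_or_create_company_id('company2')).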
