
I have a table like so in a Postgres DB -

id   dataset_id   entity_id   county_state_id   data
34   31           33413       341               JSONB object
35   31           33413       342               JSONB object
36   31           33413       NULL              JSONB object

I want to insert into or update this table depending on whether a record already exists in it. I have written the following query to do so -

INSERT INTO entity (id, dataset_id, entity_id, county_state_id, data)
SELECT
    nextval('id_seq'),
    (SELECT id FROM dataset WHERE name = 'Payer'),
    e.id,
    NULL,
    jsonb_build_object(
        'a', a,
        'b', b,
        'c', c
    )
FROM entity e
JOIN payer p
    ON p.id = e.id
ON CONFLICT (dataset_id, entity_id, data, county_state_id)
DO NOTHING;

I insert the following input into the table -

id   dataset_id   entity_id   county_state_id   data
37   31           33413       NULL              JSONB object

I would expect the query above not to insert anything, because this record already exists in the table. But it does insert a record. I suspect this happens because NULL <> NULL and I am inserting a NULL into the county_state_id column. That is an integer column, so I cannot insert an empty string into it, and I do not know how to get Postgres to recognize that the record already exists in the table.
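
For illustration, the NULL behaviour suspected above can be checked directly. This is a minimal sketch that needs no tables; the column aliases are made up for the example. Strictly speaking, comparing two NULLs yields NULL (unknown) rather than true or false, and a unique index treats NULLs as distinct by default, which is why the existing row is not seen as a conflict.

```sql
-- Comparing NULLs with = or <> yields NULL (unknown), never true.
-- IS NOT DISTINCT FROM is the NULL-safe equality test.
SELECT NULL::int = NULL::int                    AS equals,        -- NULL
       NULL::int <> NULL::int                   AS not_equals,    -- NULL
       NULL::int IS NOT DISTINCT FROM NULL::int AS not_distinct;  -- true
```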

3
  • Yeah, but you could INSERT 0. Commented Dec 22, 2020 at 1:11
  • @AdrianKlaver Yeah but I don't want to insert a zero because there is no county_state_id that is a 0. Commented Dec 22, 2020 at 1:12
  • Then create a dummy record that has a county_state_id of 0, e.g. 'no_county_state_id'. Commented Dec 22, 2020 at 1:18

2 Answers

1

If you want to prevent duplicates, you need a unique index or constraint. For this purpose, you need two of them:

-- handle not-NULL case
alter table t add constraint unqc_entity_4 unique (dataset_id, entity_id, data, county_state_id);

-- handle NULL case; a unique constraint cannot contain an expression, so use a unique index
create unique index unqc2_entity_4 on t (dataset_id, entity_id, data, (case when county_state_id is null then -1 else id end));

Happily, ON CONFLICT DO NOTHING applies to all unique constraints and indexes when no conflict target is specified, so you can phrase the insert as:

INSERT . . .
ON CONFLICT DO NOTHING;

Here is a little db<>fiddle illustrating the concept.
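
As a rough end-to-end sketch of this approach (not the contents of the fiddle): the table definition below is an assumption made up for the demo, with `t` standing in for the real table and `id` assumed to be a unique serial key. The NULL case uses a unique index rather than a constraint because it is built on an expression.

```sql
-- Hypothetical demo table; column types are assumptions for illustration.
CREATE TABLE t (
    id              serial PRIMARY KEY,
    dataset_id      int   NOT NULL,
    entity_id       int   NOT NULL,
    county_state_id int,
    data            jsonb NOT NULL
);

-- Non-NULL case: a plain unique constraint.
ALTER TABLE t ADD CONSTRAINT unqc_entity_4
    UNIQUE (dataset_id, entity_id, data, county_state_id);

-- NULL case: rows with a NULL county_state_id all map to -1 and can collide;
-- other rows map to their own unique id and never collide on this index.
CREATE UNIQUE INDEX unqc2_entity_4 ON t
    (dataset_id, entity_id, data,
     (CASE WHEN county_state_id IS NULL THEN -1 ELSE id END));

-- First insert succeeds.
INSERT INTO t (dataset_id, entity_id, county_state_id, data)
VALUES (31, 33413, NULL, '{"a": 1}')
ON CONFLICT DO NOTHING;

-- Same values again: conflicts on unqc2_entity_4, so nothing is inserted.
INSERT INTO t (dataset_id, entity_id, county_state_id, data)
VALUES (31, 33413, NULL, '{"a": 1}')
ON CONFLICT DO NOTHING;

SELECT count(*) FROM t;  -- 1
```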


3 Comments

I have a unique index; that is why I am able to use the ON CONFLICT clause.
@Aaron . . . NULLs are tricky in this context but there is a pretty simple work-around.
@GordonLinoff, I prefer to avoid using special values like -1 instead of NULL. Postgres supports filtered / partial indexes, which I think fit here pretty well. See my answer.
0

It looks like a filtered / partial unique index would be suitable here.

Actually, two indexes.

-- this takes care of non-null duplicates
CREATE UNIQUE INDEX IX_entity_NON_NULL ON entity 
(dataset_id, entity_id, county_state_id);


-- this prevents duplicates when county_state_id IS NULL
CREATE UNIQUE INDEX IX_entity_NULL ON entity 
(dataset_id, entity_id)
WHERE (county_state_id IS NULL);

With this approach you don't need to use some special values, like 0 or -1 instead of NULL values.

It is not clear to me from the question whether the data column should be included in the indexes; include it if necessary.
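
A small sketch of how the two indexes behave together. The table definition is an assumption made up for the demo (the real `entity` table may differ), and `data` is left out of the indexes as in the statements above.

```sql
-- Hypothetical demo table; column types are assumptions for illustration.
CREATE TABLE entity (
    id              serial PRIMARY KEY,
    dataset_id      int NOT NULL,
    entity_id       int NOT NULL,
    county_state_id int,
    data            jsonb
);

CREATE UNIQUE INDEX IX_entity_NON_NULL ON entity
(dataset_id, entity_id, county_state_id);

CREATE UNIQUE INDEX IX_entity_NULL ON entity
(dataset_id, entity_id)
WHERE (county_state_id IS NULL);

-- All three rows are accepted: two distinct counties plus one NULL row.
INSERT INTO entity (dataset_id, entity_id, county_state_id, data) VALUES
    (31, 33413, 341,  '{}'),
    (31, 33413, 342,  '{}'),
    (31, 33413, NULL, '{}');

-- A second NULL row for the same dataset_id and entity_id is rejected
-- with a unique violation on ix_entity_null.
INSERT INTO entity (dataset_id, entity_id, county_state_id, data)
VALUES (31, 33413, NULL, '{}');
```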

2 Comments

What would my insert or on conflict clause look like if I did it this way?
@Aaron, I think that your INSERT statement remains as it is. You would not be able to insert a second row with NULL county_state_id and repeating dataset_id, entity_id. The partial unique index would prevent it.
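
To make the ON CONFLICT part concrete, here are two possible forms, offered as a sketch rather than a definitive answer. Both assume the two indexes from this answer exist on `entity` and reuse the `id_seq` sequence from the question; the literal values are placeholders. Note that an explicit target such as `(dataset_id, entity_id, data, county_state_id)` only works if a unique index on exactly those columns exists, so with these indexes alone the target has to change.

```sql
-- Option 1: no conflict target. A violation of any unique index
-- (partial or not) makes Postgres skip the row silently.
INSERT INTO entity (id, dataset_id, entity_id, county_state_id, data)
VALUES (nextval('id_seq'), 31, 33413, NULL, '{"a": 1}')
ON CONFLICT DO NOTHING;

-- Option 2: target the partial index by repeating its columns and predicate.
-- Conflicts on the other unique index would still raise an error.
INSERT INTO entity (id, dataset_id, entity_id, county_state_id, data)
VALUES (nextval('id_seq'), 31, 33413, NULL, '{"a": 1}')
ON CONFLICT (dataset_id, entity_id) WHERE county_state_id IS NULL
DO NOTHING;
```

Option 1 is usually the simpler choice when the statement may insert both NULL and non-NULL county_state_id rows.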
