Finding and creating missing rows in table

Question

Hello postgres experts,

I have an app where users can vote on a poll. My schema looks like this:

polls table:

id	name
1	Favorite fruit

options table:

id	poll_id	content
1	1	apple
2	1	orange
3	1	grape
4	1	banana

participants table:

id	poll_id	name
1	1	John
2	1	Jane

votes table:

id	poll_id	participant_id	option_id	type
1	1	1	1	yes
2	1	1	3	yes
3	1	2	2	yes

I made the poor choice of deciding to not create rows for "no" votes in the votes table thinking it would "save space". I realize now that it was not such a great idea because in the future I would like to know whether the user explicitly voted "no" or if perhaps the option was added after they voted and thus did not have the option to choose it. So I need to run a query that will fill all the missing "no" votes in the votes table for existing participants. The final result should look like this:

votes table:

id	poll_id	participant_id	option_id	type
1	1	1	1	yes
2	1	1	3	yes
3	1	2	2	yes
4	1	1	2	no
5	1	1	4	no
6	1	2	1	no
7	1	2	3	no
8	1	2	4	no

I have a dbfiddle with all the data already in it:

https://dbfiddle.uk/?rdbms=postgres_14&fiddle=7d0f4c83095638cc6006b1d7876d0e01

Side question: Should I be concerned about the size of the votes table in this schema? I expect it to quickly blow up to millions of rows. Is a schema where options are stored as an array in the polls table and votes stored in the participants table a better idea?

Thank you for your help.

What indices do you have on the tables? Especially UNIQUE ones — Bergi
– Bergi, Commented May 11, 2022 at 22:44
"Should I be concerned about the size of the votes table in this schema?" - this should probably be better asked as a separate question, maybe at Database Administrators. Regardless: in general, no, you should no be concerned about millions of rows. Storing arrays won't save that much space, and make querying much more complicated. Whether it's a "better idea" depends a lot on what are you doing with these tables and what exactly you are optimising for. — Bergi
– Bergi, Commented May 11, 2022 at 22:48

Bergi · Accepted Answer · 2022-05-12 12:24:15Z

1

You seem to be looking for a JOIN of participants with options, EXCEPT the rows that already are in votes. There are various ways to do that, but most straightforward:

INSERT INTO votes(poll_id, participant_id, option_id, type)
SELECT poll_id, participant_id, option_id, 'no'
FROM (
  SELECT o.poll_id, p.id, o.id
  FROM options o
  JOIN participants p ON o.poll_id = p.poll_id
EXCEPT
  SELECT poll_id, participant_id, option_id
  FROM votes
) AS missing;

Alternatively:

INSERT INTO votes(poll_id, participant_id, option_id, type)
SELECT o.poll_id, p.id, o.id, 'no'
FROM options o
JOIN participants p ON o.poll_id = p.poll_id
WHERE NOT EXISTS (
  SELECT *
  FROM votes
  WHERE poll_id = o.poll_id AND participant_id = p.id AND option_id = o.id
);

Or, assuming you already have UNIQUE index on votes, just

INSERT INTO votes(poll_id, participant_id, option_id, type)
SELECT o.poll_id, p.id, o.id, 'no'
FROM options o
ON CONFLICT ON CONSTRAINT votes_p_key
DO NOTHING;

edited May 12, 2022 at 12:24

answered May 11, 2022 at 22:57

Bergi

671k162 gold badges1k silver badges1.5k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Luke Vella Over a year ago

Thanks so much! I think you summed up what I was trying to do really well. I managed to get all 3 working with some minor adjustments ("no -> 'no', adding column aliases).

Collectives™ on Stack Overflow

Finding and creating missing rows in table

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related