3

Let's say we have a system that executes 'jobs'. These jobs can be wired up into sequences we call 'integrations' which are effectively an ordered set of jobs.

The classic solution would be a join table:

integrations: id, name
jobs: id, name, commands
integrations_jobs: id, integration_id, job_id, integration_order

where integration_order is the position of that integration_job within the integration.

What if instead we just used a postgres array?

integrations: id, name, [job_id1, job_id2, job_id3]
jobs: id, name, commands

Are there any obvious drawbacks I'm missing with using an array instead of a join table? We are using Rails 5 and Postgres on Heroku.

3
  • Postgres does not currently allow array columns as foreign keys which means that you will lose referential integrity. Also ActiveRecord is not built to use array columns for associations which means you will be fighting the framework. If you flip the argument - what advantage does an array column give you in this case? Commented Jul 12, 2017 at 18:12
  • Thanks for your help max. We thought it would be simpler to version the integration. Tracking the join object in a different table and recording the history of the order of each of those join objects is more complicated then having a history of arrays. We would be fighting the framework a bit by writing custom rails validations to maintain integrity, but that seemed simpler than versioning the join table. Commented Jul 13, 2017 at 3:09
  • You can simply use a has_many through: association and use Papertrail to version the "join model". Commented Jul 15, 2017 at 7:34

2 Answers 2

0

You can use Postgres array or JSON or enum field but it will affect performance.
Look here

Sign up to request clarification or add additional context in comments.

1 Comment

wouldn't performance increase since we don't have to do joins?
0

The first drawback is you can not query to find a job in integrations, example:

jobs table
id
1
2

integrations table
id, name, job_ids
1, i1, [1,2]
2, i2, [2]

so, if you want to find integrations of a job which has id 2, you need to scan integrations table and check if job_ids has value of 2. If integrations table has a lot of records, it's a big perfomance problem.

If you have a join table, that is just a very simple query and faster. I don't see what wrong if you use a join table.

1 Comment

Thanh, I am pretty sure you can index array fields. postgresql.org/docs/9.5/static/gin-builtin-opclasses.html Also isn't this use case what the ANY / ALL and @> / <@ operators are for?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.