We are using the django-dbarray module to work with arrays in PostgreSQL. I've been researching PostgreSQL arrays, and some developers say they wouldn't recommend storing more than X values in an array. Sometimes X is ten, and I've heard as many as thirty. Is there any consensus on how many values can or should be stored in an array before performance starts to degrade?

For reference, the database in question is mostly read-only.

We are trying to decide where we should use intermediate tables and where we should use a postgres array.

One additional related question: when I create an index on a column that stores array values (say, bigint[]), I realize that the values stored within the array would not be indexed, only the array itself (I'm assuming this is something like a C pointer). How efficient is this compared to simply having an intermediate table?

We may need to join against the values or use specific values in a WHERE clause, and I am concerned that performance could degrade and that we may be better off with an intermediate table whenever we need a join.

Lastly, given that we are using dbarray, how efficient is it compared to simply using an intermediate table with the standard Django ORM (assume no joins or WHERE clauses from the above question)?

Thank you

    I'm one of the developers who believe that X should never be greater than 1. Arrays and databases are generally bad mojo... they are no more than 'free text' fields, making queries and the like perform really poorly, especially if you are trying to join on a particular value in an array. If you never intend to write SQL, I guess it's usable as a basic data store. If you want to write SQL against it... my preference, from what I see here, is to move this into a name-value pair table that allows flexibility in which fields you store in a table (so yes to an intermediate table). Commented May 22, 2012 at 18:38

1 Answer

PostgreSQL supports GIN and GiST indexes over integer arrays, which let you run queries like this:

SELECT  *
FROM    mytable
WHERE   myarray @> ARRAY[1, 2]
-- returns arrays which contain 1 AND 2

or this:

SELECT  *
FROM    mytable
WHERE   myarray && ARRAY[1, 2]
-- returns arrays which contain 1 OR 2

efficiently.

The first query is somewhat hard to rewrite efficiently using a normalized schema.
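To illustrate why the "contains 1 AND 2" query is awkward on a normalized schema, here is a minimal sketch of the usual workaround (relational division with GROUP BY and HAVING). It uses Python's built-in sqlite3 module only so that it runs without a server. The intermediate table `mytable_values(row_id, value)`, the index name, and the helper `rows_containing_all` are hypothetical names made up for this example, not part of django-dbarray or of the schema in the question.

```python
import sqlite3


def rows_containing_all(conn, wanted):
    """Return ids of rows whose set of values contains every element of `wanted`."""
    wanted = sorted(set(wanted))
    if not wanted:
        raise ValueError("wanted must contain at least one value")
    placeholders = ", ".join("?" for _ in wanted)
    # Keep only the matching values, then require that each row matched
    # as many distinct values as were asked for.
    sql = f"""
        SELECT row_id
        FROM mytable_values
        WHERE value IN ({placeholders})
        GROUP BY row_id
        HAVING COUNT(DISTINCT value) = ?
        ORDER BY row_id
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")
# Hypothetical intermediate table: one row per (parent row, array element).
conn.execute(
    "CREATE TABLE mytable_values (row_id INTEGER NOT NULL, value INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX ix_value_row ON mytable_values (value, row_id)")
conn.executemany(
    "INSERT INTO mytable_values VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (3, 2), (3, 1)],
)

print(rows_containing_all(conn, [1, 2]))  # rows 1 and 3 hold both values
```

The "contains 1 OR 2" case needs only the `WHERE value IN (...)` part with a `DISTINCT`, which is why the overlap query translates to a normalized schema more naturally than the containment query does.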
