I have a PostgreSQL database table with text[] (array) columns defined on it. I'm using these columns to search for a specific record in the database in this way:

select obj from business
where ((('street' = ANY (address_line_1)
    and 'a_city' = ANY (city)
    and 'a_state' = ANY (state))
or    ('street' = ANY (address_line_1)
    and '1234' = ANY (zip_code)))
and ('a_business_name' = ANY (business_name)
    or 'a_website' = ANY (website_url)
    or array['123'] && phone_numbers))

The problem I'm having is that with about 1 million records, the query gets really slow. My question is simple: are there different types of indexes for array columns? If so, does anybody know the best type of index to create in this case?

Just in case, this is the EXPLAIN ANALYZE output:

"Seq Scan on business  (cost=0.00..207254.51 rows=1 width=32) (actual time=18850.462..18850.462 rows=0 loops=1)"
"  Filter: (('a'::text = ANY (address_line_1)) AND (('a'::text = ANY (business_name)) OR ('a'::text = ANY (website_url)) OR ('{123}'::text[] && phone_numbers)) AND ((('a'::text = ANY (city)) AND ('a'::text = ANY (state))) OR ('1234'::text = ANY (zip_code))))"
"  Rows Removed by Filter: 900506"
"Total runtime: 18850.523 ms"

Thanks in advance!

  • As always, please state your version of Postgres. The table definition and some sample data would also be very helpful, for instance as an sqlfiddle. Commented Jan 28, 2013 at 21:15

1 Answer

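You can use a GIN index to speed up searches on array columns.
Use it in combination with array operators such as @> and &&. A condition written as 'x' = ANY (col) is not an array operator expression and cannot use the index.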

For instance:

CREATE INDEX business_address_line_1_idx ON business USING GIN (address_line_1);

Do that for all array columns involved in conditions.
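
As a rough sketch of what that could look like for the query in the question: the index names below are made up for illustration (following the pattern of the one above), and the rewrite assumes every column involved is text[], as the question states. Each 'x' = ANY (col) condition becomes col @> ARRAY['x'], which a GIN index can support.

```sql
-- Hypothetical index names; one GIN index per array column used in WHERE.
CREATE INDEX business_city_idx          ON business USING GIN (city);
CREATE INDEX business_state_idx         ON business USING GIN (state);
CREATE INDEX business_zip_code_idx      ON business USING GIN (zip_code);
CREATE INDEX business_business_name_idx ON business USING GIN (business_name);
CREATE INDEX business_website_url_idx   ON business USING GIN (website_url);
CREATE INDEX business_phone_numbers_idx ON business USING GIN (phone_numbers);

-- Same logic as the original query, expressed with array operators.
-- The address_line_1 condition appeared in both OR branches, so it is
-- factored out (the planner did the same in the EXPLAIN output).
SELECT obj
FROM   business
WHERE  address_line_1 @> ARRAY['street']
AND   (   (city @> ARRAY['a_city'] AND state @> ARRAY['a_state'])
       OR zip_code @> ARRAY['1234'])
AND   (   business_name @> ARRAY['a_business_name']
       OR website_url   @> ARRAY['a_website']
       OR phone_numbers && ARRAY['123']);
```

Run EXPLAIN ANALYZE again afterwards. If the indexes are being used, the plan should no longer be a plain Seq Scan on business; whether the planner actually picks them depends on your data and statistics.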

Consider normalizing your schema instead. Splitting the multiple entries out into a separate (1:n or n:m) table may serve you better. It often does in the long run, even if it seems like more work at first.
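
To illustrate the idea for one column only, here is a minimal sketch of a 1:n table for phone numbers. The table definition was not posted, so this assumes business has an integer primary key called business_id; that column, the table business_phone, and the index name are all hypothetical.

```sql
-- Hypothetical 1:n table: one row per (business, phone number).
-- Assumes business.business_id exists and is the primary key.
CREATE TABLE business_phone (
    business_id  integer NOT NULL REFERENCES business (business_id),
    phone_number text    NOT NULL,
    PRIMARY KEY (business_id, phone_number)
);

-- A plain B-tree index is enough for equality lookups on a scalar column.
CREATE INDEX business_phone_number_idx ON business_phone (phone_number);

-- Equivalent of: array['123'] && phone_numbers
SELECT b.obj
FROM   business b
WHERE  EXISTS (
    SELECT 1
    FROM   business_phone p
    WHERE  p.business_id  = b.business_id
    AND    p.phone_number = '123'
);
```

The other array columns could be split out the same way, or grouped into one address table if the address parts belong together.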

2 Comments

I upvoted, especially for the advice on normalising the database. IMO, array fields should be avoided, especially when used for selecting like this.
Thank you Erwin, it worked like a charm:

"Seq Scan on business  (cost=0.00..206323.42 rows=1 width=32) (actual time=1259.974..1259.974 rows=0 loops=1)"
"  Filter: (('a'::text = ANY (address_line_1)) AND (('a'::text = ANY (business_name)) OR ('a'::text = ANY (website_url)) OR ('{123}'::text[] && phone_numbers)) AND ((('a'::text = ANY (city)) AND ('a'::text = ANY (state))) OR ('1234'::text = ANY (zip_code))))"
"  Rows Removed by Filter: 900537"
"Total runtime: 1260.026 ms"
