
I have a query like this:

    SELECT id 
    FROM   x 
    WHERE  is_valid = true 
        AND id IN (SELECT another_id 
                    FROM   y 
                    WHERE  ( other_id IN ( 1, 2, 3, 4, 
                                            5, 6, 7, 8, 
                                            11, 16, 17, 18, 
                                            19, 20, 21, 22, 
                                            24, 26, 27, 28, 
                                            30, 31, 32, 34, 
                                            35, 36, 37, 38, 
                                            41, 43, 45, 46, 
                                            47, 48, 49, 50, 
                                            51, 52, 53, 54, 
                                            55, 56, 57, 58, 
                                            59, 60, 61, 63, 
                                            65, 67, 69, 72, 
                                            73, 76, 79, 80, 
                                            81, 82, 83, 84, 
                                            85, 86, 87, 88, 
                                            89, 90, 91, 92, 
                                            94, 95, 96, 97, 
                                            98, 100, 101, 102, 
                                            104, 105, 106, 107, 
                                            108, 109, 110, 112, 
                                            113, 114, 115, 116, 
                                            117, 118, 119, 121, 127 ) 
                            AND is_valid = true ));

I have analyzed the query here: https://explain.depesz.com/s/7ZWN.

I have an index on the is_valid field and also a composite index on other_id and is_valid. The another_id and other_id fields are primary keys of table Y. Table X has 900k+ rows and table Y has 15M+ rows.

Index declarations:

    "y_is_valid_idx" btree (is_valid)
    "y_other_id_is_valid_2344df8a_idx" btree (other_id, is_valid)

This query takes at least 30 seconds to run, which is too slow for the API to respond in time. I am using PostgreSQL 9.6 and Django 1.11 for development. Can you suggest a way to make this operation faster?

  • Start by running the query using EXPLAIN ANALYZE: postgresql.org/docs/current/sql-explain.html You can paste the output here to get some insights if you have trouble reading the output: explain.depesz.com Commented Sep 12, 2019 at 12:55
  • @Wolph Done, and I added the result to the question. Commented Sep 12, 2019 at 13:14
  • It is tough to see why an index on y (is_valid, other_id) would not be used. Are you sure you have a valid index on that? Commented Sep 12, 2019 at 16:36
  • You would expect the index on y to be used. If your queries are always similar, you could create a partial index on y(other_id) with the is_valid = true filter. It's also possible that you need to increase the amount of statistics on the table. With a table that has 15M rows, the default of 100 might not be sufficient. Note that you want to set this specifically for that column/index, not the entire database: postgresql.org/docs/9.6/planner-stats.html Commented Sep 12, 2019 at 19:55
  • @jjanes added index declarations to the question Commented Sep 13, 2019 at 6:24
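The partial-index suggestion in the comment above can be tried out in miniature. This is only an illustration under several assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite also accepts CREATE INDEX ... WHERE), the tables hold a handful of made-up rows, booleans are stored as 0/1, the long IN (...) list is shortened to four ids, and the index name y_other_id_valid_partial_idx is hypothetical. It checks only that the query returns the same rows once the partial index exists; it says nothing about speed on PostgreSQL 9.6 with 15M rows.

```python
import sqlite3

# Hypothetical toy data (assumption): column names follow the question,
# but the row counts are tiny, not 900k / 15M.
X_ROWS = [(1, 1), (2, 1), (3, 0), (4, 1)]                 # (id, is_valid)
Y_ROWS = [(1, 5, 1), (2, 6, 0), (3, 200, 1), (4, 7, 1)]   # (another_id, other_id, is_valid)
OTHER_IDS = [5, 6, 7, 8]  # shortened stand-in for the long IN (...) list


def valid_ids(conn, other_ids):
    """Run the question's query, passing the IN list as bound parameters."""
    placeholders = ", ".join("?" for _ in other_ids)
    sql = f"""
        SELECT id
        FROM   x
        WHERE  is_valid = 1
          AND  id IN (SELECT another_id
                      FROM   y
                      WHERE  other_id IN ({placeholders})
                        AND  is_valid = 1)
        ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, other_ids)]


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, is_valid INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE y (another_id INTEGER, other_id INTEGER, is_valid INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO x VALUES (?, ?)", X_ROWS)
    conn.executemany("INSERT INTO y VALUES (?, ?, ?)", Y_ROWS)
    # Mirrors the existing composite index from the question.
    conn.execute("CREATE INDEX y_other_id_is_valid_idx ON y (other_id, is_valid)")
    return conn


conn = build_db()
before = valid_ids(conn, OTHER_IDS)

# Partial index from the comment: only rows with is_valid = 1 are indexed,
# so the index is smaller and matches the subquery's filter exactly.
conn.execute(
    "CREATE INDEX y_other_id_valid_partial_idx ON y (other_id) WHERE is_valid = 1"
)
after = valid_ids(conn, OTHER_IDS)

print(before, after)
```

On the real database, the corresponding PostgreSQL statement would be CREATE INDEX ... ON y (other_id) WHERE is_valid = true, and comparing EXPLAIN ANALYZE output before and after is what shows whether the planner actually uses it. The per-column statistics target from the same comment is set with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS followed by ANALYZE; SQLite has no equivalent, so it is not shown here.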

1 Answer

You might be able to make this query faster, down to about 5 seconds, but that will not solve the problem. Heavy queries like this should be run before the request is made. You could do this with Celery: run your query periodically and store the result in a model. Then, when a request comes in, the result is already available and you can return it immediately.
