I have two tables:

users: id | name | ...

pull_requests: id | user_id | created_at | ...
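
A minimal sketch of the schema (column types are assumptions; only the relevant columns are shown):

CREATE TABLE users (
    id   serial PRIMARY KEY,
    name text
    -- ... more columns
);

CREATE TABLE pull_requests (
    id         serial PRIMARY KEY,
    user_id    integer REFERENCES users (id),
    created_at timestamp
    -- ... more columns
);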

I need to fetch all users, each joined with the count of their pull requests in a particular year. So I wrote this query:

SELECT users.*, COUNT(pull_requests.id) as pull_requests_count
FROM "users" INNER JOIN
     "pull_requests"
     ON "pull_requests"."user_id" = "users"."id"
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
GROUP BY users.id

I initially had just an index on pull_requests.user_id (btree). Running EXPLAIN with only that index in place, I got this:

                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 HashAggregate  (cost=63.99..64.02 rows=3 width=2775)
   ->  Hash Join  (cost=59.19..63.98 rows=3 width=2775)
         Hash Cond: (users.id = pull_requests.user_id)
         ->  Seq Scan on users  (cost=0.00..4.08 rows=108 width=2771)
         ->  Hash  (cost=59.16..59.16 rows=3 width=8)
               ->  Seq Scan on pull_requests  (cost=0.00..59.16 rows=3 width=8)
                     Filter: (date_part('year'::text, created_at) = 2013::double precision)

Then I added an index like this:

CREATE INDEX pull_req_extract_year_created_at_ix ON pull_requests (EXTRACT(year FROM created_at));
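
As a side note, the planner can use an expression index like this only when the query's WHERE clause repeats the indexed expression verbatim, which the query above does. A common alternative, sketched here on the assumption that created_at is a plain timestamp column, is a btree index on the raw column combined with a half-open range predicate:

CREATE INDEX pull_requests_created_at_ix ON pull_requests (created_at);

-- The equivalent, sargable year filter that this plain index can serve:
-- WHERE pull_requests.created_at >= DATE '2013-01-01'
--   AND pull_requests.created_at <  DATE '2014-01-01'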

And now the EXPLAIN output is:

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=18.93..18.96 rows=3 width=2775)
   ->  Hash Join  (cost=14.13..18.92 rows=3 width=2775)
         Hash Cond: (users.id = pull_requests.user_id)
         ->  Seq Scan on users  (cost=0.00..4.08 rows=108 width=2771)
         ->  Hash  (cost=14.09..14.09 rows=3 width=8)
               ->  Bitmap Heap Scan on pull_requests  (cost=4.28..14.09 rows=3 width=8)
                     Recheck Cond: (date_part('year'::text, created_at) = 2013::double precision)
                     ->  Bitmap Index Scan on pull_req_extract_year_created_at_ix  (cost=0.00..4.28 rows=3 width=0)
                           Index Cond: (date_part('year'::text, created_at) = 2013::double precision)

I still get 6.6 ms for 100 or so rows. How do I optimize this further?

Thanks!

  • If you want to improve a 6.6 ms query, you've got to think twice. Really: once the database gets bigger, your query will not fit into memory, and the time could explode beyond 1000 ms. Commented Dec 15, 2013 at 0:50
  • Actually I am doing a LIMIT 200, which is not shown here (I forgot to add it). Is this OK in that case? Commented Dec 15, 2013 at 8:48
  • The LIMIT 200 would also need an ORDER BY (unless you want an arbitrary 200 of the aggregated users.id values), which would make the outer hash join impossible, forcing an index join, a nested loop, or an explicit sort step (which would blow up the footprint of your query). LIMIT is an ugly beast; the two variants are sketched after this list. Commented Dec 15, 2013 at 14:37
  • Wow. Thanks for that incredible information! Amazing. It inspired me to dig deeper into database concepts. And by the way, I just need a random 200 users, so I am not using an ORDER BY. Commented Dec 16, 2013 at 11:34
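
To illustrate the trade-off from the comments, a sketch of the two variants (using the alias from the original query):

-- An arbitrary 200 aggregated rows; no sort step is needed:
SELECT users.*, COUNT(pull_requests.id) AS pull_requests_count
FROM users JOIN pull_requests ON pull_requests.user_id = users.id
WHERE EXTRACT(year FROM pull_requests.created_at) = 2013
GROUP BY users.id
LIMIT 200;

-- A deterministic top 200; the full aggregate must be computed and sorted first:
SELECT users.*, COUNT(pull_requests.id) AS pull_requests_count
FROM users JOIN pull_requests ON pull_requests.user_id = users.id
WHERE EXTRACT(year FROM pull_requests.created_at) = 2013
GROUP BY users.id
ORDER BY pull_requests_count DESC
LIMIT 200;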

1 Answer

Try combining the two indexes into one:

CREATE INDEX pr_ix ON pull_requests(EXTRACT(year FROM created_at), user_id);

and then phrasing the query as:

SELECT users.*, pull_requests_count
FROM "users" INNER JOIN
     (select user_id, count(*) as pull_requests_count
      from "pull_requests"
      WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
      group by user_id
     ) pr
     ON pr."user_id" = "users"."id";

The index completely covers the subquery, so the table itself does not have to be read; an index-only scan is enough. The aggregated result can then be joined back to users.
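
To confirm that the subquery really is served from the index alone, inspect the plan; a sketch, where the exact output depends on table statistics and on the visibility map being current (a recent VACUUM on pull_requests may be needed before Index Only Scan nodes appear):

EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, count(*) AS pull_requests_count
FROM pull_requests
WHERE EXTRACT(year FROM pull_requests.created_at) = 2013
GROUP BY user_id;

-- Hoped-for plan shape (index name from the suggestion above):
--   HashAggregate
--     ->  Index Only Scan using pr_ix on pull_requests
--           Index Cond: (date_part('year'::text, created_at) = 2013::double precision)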
