I have two tables:
User id | name ..
Pull Requests id | user_id | created_at |...
I need to fetch all users join them with the count of their pull requests of a particular year. So I wrote a query like so:
SELECT users.*, COUNT(pull_requests.id) as pull_requests_count
FROM "users" INNER JOIN
"pull_requests"
ON "pull_requests"."user_id" = "users"."id"
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
GROUP BY users.id
I initially had indexes on,
pull_requests.user_id (btree). On doing explain I got this:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=18.93..18.96 rows=3 width=2775)
-> Hash Join (cost=14.13..18.92 rows=3 width=2775)
Hash Cond: (users.id = pull_requests.user_id)
-> Seq Scan on users (cost=0.00..4.08 rows=108 width=2771)
-> Hash (cost=14.09..14.09 rows=3 width=8)
-> Bitmap Heap Scan on pull_requests (cost=4.28..14.09 rows=3 width=8)
Recheck Cond: (date_part('year'::text, created_at) = 2013::double precision)
-> Bitmap Index Scan on pull_req_extract_year_created_at_ix (cost=0.00..4.28 rows=3 width=0)
Index Cond: (date_part('year'::text, created_at) = 2013::double precision)
Then I added an index like this:
CREATE INDEX pull_req_extract_year_created_at_ix ON pull_requests (EXTRACT(year FROM created_at));
And now my explain is:
QUERY PLAN
--------------------------------------------------------------------------------------------
HashAggregate (cost=63.99..64.02 rows=3 width=2775)
-> Hash Join (cost=59.19..63.98 rows=3 width=2775)
Hash Cond: (users.id = pull_requests.user_id)
-> Seq Scan on users (cost=0.00..4.08 rows=108 width=2771)
-> Hash (cost=59.16..59.16 rows=3 width=8)
-> Seq Scan on pull_requests (cost=0.00..59.16 rows=3 width=8)
Filter: (date_part('year'::text, created_at) = 2013::double precision)
Still I get 6.6 ms for 100 or so rows. How do I further optimize this?
Thanks!
limit 200would also need anorder by(unless you want random 200 from the aggregated users.id s), which would cause the outer hashjoin to be impossible, causing an index join, nested loop, or an explicit sort step (which would blow up the footprint of your query). LIMIT is an ugly beast.