Proper indexes for this join query in PostgreSQL

Question

I have two tables:

Users: id | name | ...

Pull Requests: id | user_id | created_at | ...

I need to fetch all users and join them with the count of their pull requests for a particular year. So I wrote a query like this:

SELECT users.*, COUNT(pull_requests.id) as pull_requests_count
FROM "users" INNER JOIN
     "pull_requests"
     ON "pull_requests"."user_id" = "users"."id"
WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
GROUP BY users.id

I initially had an index on pull_requests.user_id (btree). Running EXPLAIN gave me this:

                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 HashAggregate  (cost=63.99..64.02 rows=3 width=2775)
   ->  Hash Join  (cost=59.19..63.98 rows=3 width=2775)
         Hash Cond: (users.id = pull_requests.user_id)
         ->  Seq Scan on users  (cost=0.00..4.08 rows=108 width=2771)
         ->  Hash  (cost=59.16..59.16 rows=3 width=8)
               ->  Seq Scan on pull_requests  (cost=0.00..59.16 rows=3 width=8)
                     Filter: (date_part('year'::text, created_at) = 2013::double precision)

Then I added an index like this:

CREATE INDEX pull_req_extract_year_created_at_ix ON pull_requests (EXTRACT(year FROM created_at));

And now my explain is:

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=18.93..18.96 rows=3 width=2775)
   ->  Hash Join  (cost=14.13..18.92 rows=3 width=2775)
         Hash Cond: (users.id = pull_requests.user_id)
         ->  Seq Scan on users  (cost=0.00..4.08 rows=108 width=2771)
         ->  Hash  (cost=14.09..14.09 rows=3 width=8)
               ->  Bitmap Heap Scan on pull_requests  (cost=4.28..14.09 rows=3 width=8)
                     Recheck Cond: (date_part('year'::text, created_at) = 2013::double precision)
                     ->  Bitmap Index Scan on pull_req_extract_year_created_at_ix  (cost=0.00..4.28 rows=3 width=0)
                           Index Cond: (date_part('year'::text, created_at) = 2013::double precision)

I still get 6.6 ms for 100 or so rows. How can I optimize this further?

Thanks!

If you want to improve a 6.6 ms query, you've got to think twice. Really: once the database gets bigger, your query would no longer fit into memory, and the times could explode beyond 1000 ms.
– wildplasser, Commented Dec 15, 2013 at 0:50
Actually I am doing a LIMIT 200 which is not shown here (forgot to add it). Is this OK in that case?
– Steve Robinson, Commented Dec 15, 2013 at 8:48
The LIMIT 200 would also need an ORDER BY (unless you want an arbitrary 200 of the aggregated users.id values), which would make the outer hash join impossible, forcing an index join, a nested loop, or an explicit sort step (which would blow up the footprint of your query). LIMIT is an ugly beast.
– wildplasser, Commented Dec 15, 2013 at 14:37
Wow. Thanks for that incredible information! Amazing. It inspired me to dig deeper into database concepts. And by the way, I just need any 200 users, so I am not using an ORDER BY.
– Steve Robinson, Commented Dec 16, 2013 at 11:34
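
For readers who do need a repeatable set of 200 rows, the point about ORDER BY in the comments above can be sketched as follows. This is only an illustration built on the question's query; ordering by users.id is an assumption made for the example, and any stable key would serve.

```sql
-- Sketch: the question's query with a deterministic LIMIT.
-- Assumption: users.id is the primary key, so GROUP BY users.id allows users.*.
SELECT users.*, COUNT(pull_requests.id) AS pull_requests_count
FROM users
INNER JOIN pull_requests
        ON pull_requests.user_id = users.id
WHERE EXTRACT(year FROM pull_requests.created_at) = 2013
GROUP BY users.id
ORDER BY users.id   -- illustrative choice; without it, which 200 rows come back is unspecified
LIMIT 200;
```

Without the ORDER BY, the same statement still returns at most 200 rows, but the planner is free to return whichever rows it produces first.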

Gordon Linoff · Accepted Answer · 2013-12-14 23:42:52Z

Try combining the two indexes into one:

CREATE INDEX pr_ix ON pull_requests(EXTRACT(year FROM created_at), user_id);

and then phrasing the query as:

SELECT users.*, pull_requests_count
FROM "users" INNER JOIN
     (select user_id, count(*) as pull_requests_count
      from "pull_requests"
      WHERE (EXTRACT(year FROM pull_requests.created_at) = 2013)
      group by user_id
     ) pr
     ON pr."user_id" = "users"."id";

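The index contains everything the subquery needs (the year expression and user_id), so in principle it can be answered from the index alone, without reading the table itself. The aggregated result is then joined back to users. Whether the planner actually chooses an index-only scan depends on the PostgreSQL version and the table's statistics, so check the plan with EXPLAIN.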
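
One way to check how the planner treats the combined index is to run the rewritten query under EXPLAIN. The following is a minimal sketch, assuming the pr_ix index from this answer has already been created; the ANALYZE step and the BUFFERS option are additions for illustration and not part of the original answer.

```sql
-- Refresh planner statistics so the new expression index is taken into account.
ANALYZE pull_requests;

-- Run the query and show the plan that was actually executed, with timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT users.*, pr.pull_requests_count
FROM users
INNER JOIN (
    SELECT user_id, count(*) AS pull_requests_count
    FROM pull_requests
    WHERE EXTRACT(year FROM created_at) = 2013
    GROUP BY user_id
) pr ON pr.user_id = users.id;
```

If the plan still shows a Seq Scan on pull_requests, that is not necessarily a problem on a table this small: the planner may simply estimate a sequential scan to be cheaper than using the index.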
