
I'm trying to get a count of 'visit' records created by joining values from three tables. I have a simple query below, but it takes almost 30 minutes on the database. Is there a way to optimize this query any further?

select a."ClientID" as ClientID, b."ProviderID" as ProviderID, count(1) as VisitCount
from "Log" c
inner join "MessageDetail" b on c."MessageDetailID" = b."MessageDetailID"
inner join "Message" a on a."MessageID" = b."MessageID"
where a."CreatedUTCDate" >= NOW() - INTERVAL '1 HOUR'
group by a."ClientID", b."ProviderID"

Example Result

ClientID                                ProviderID   VisitCount
3245cf64-test-4d05-9d5d-345653566455    677777       1
3245cf64-test-4d05-9d5d-345653566455    677777       1
0284a326-test-4757-b00e-34563465dfgg    9999         5

Explain plan

GroupAggregate  (cost=6529150.62..6529160.28 rows=483 width=48)
  Group Key: a."ClientID", b."ProviderID"
  ->  Sort  (cost=6529150.62..6529151.83 rows=483 width=40)
        Sort Key: a."ClientID", b."ProviderID"
        ->  Nested Loop  (cost=1.00..6529129.09 rows=483 width=40)
              ->  Nested Loop  (cost=0.56..6509867.54 rows=3924 width=48)
                    ->  Seq Scan on "Message" a  (cost=0.00..6274917.96 rows=3089 width=44)
                          Filter: ("CreatedUTCDate" >= (now() - '01:00:00'::interval))
                    ->  Index Scan using "ix_MessageDetail_MessageId" on "MessageDetail" b  (cost=0.56..75.40 rows=66 width=20)
                          Index Cond: ("MessageID" = a."MessageID")
              ->  Index Only Scan using "ix_Log_MessageDetailId" on "Log" c  (cost=0.43..4.90 rows=1 width=8)
                    Index Cond: ("MessageDetailID" = b."MessageDetailID")

Explain Analyze Plan

GroupAggregate  (cost=6529127.35..6529137.01 rows=483 width=48) (actual time=791639.382..791661.555 rows=118 loops=1)
  Group Key: a."ClientID", b."ProviderID"
  ->  Sort  (cost=6529127.35..6529128.56 rows=483 width=40) (actual time=791639.373..791649.235 rows=64412 loops=1)
        Sort Key: a."ClientID", b."ProviderID"
        Sort Method: external merge  Disk: 3400kB
        ->  Nested Loop  (cost=1.00..6529105.82 rows=483 width=40) (actual time=25178.920..791410.769 rows=64412 loops=1)
              ->  Nested Loop  (cost=0.56..6509844.55 rows=3924 width=48) (actual time=25178.874..790954.577 rows=65760 loops=1)
                    ->  Seq Scan on "Message" a  (cost=0.00..6274894.96 rows=3089 width=44) (actual time=25178.799..790477.178 rows=25121 loops=1)
                          Filter: ("CreatedUTCDate" >= (now() - '01:00:00'::interval))
                          Rows Removed by Filter: 30839080
                    ->  Index Scan using "ix_MessageDetail_MessageId" on "MessageDetail" b  (cost=0.56..75.40 rows=66 width=20) (actual time=0.009..0.016 rows=3 loops=25121)
                          Index Cond: ("MessageID" = a."MessageID")
              ->  Index Only Scan using "ix_Log_MessageDetailId" on "Log" c  (cost=0.43..4.90 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=65760)
                    Index Cond: ("MessageDetailID" = b."MessageDetailID")
                    Heap Fetches: 65590
Planning time: 38.501 ms
Execution time: 791662.728 ms
  • run 'explain' before your query to get a plan, and 'explain analyze' to run it and get the plan that was actually executed. More variations on explain can be found in the docs. We need more information to help you; explain plans are very useful. Commented Sep 20, 2022 at 20:04
  • @VynlJunkie I updated my question with 'explain' plan. Will add 'explain analyze' once it finishes running. Commented Sep 20, 2022 at 20:19
  • Whenever you ask about SQL query optimization, show the table and index definitions, as well as row counts for each of the tables. Maybe your tables are defined poorly. Maybe the indexes aren't created correctly. Maybe you don't have an index on that column you thought you did. Without seeing the table and index definitions, we can't tell. Commented Sep 20, 2022 at 21:01
  • This statement: where a."CreatedUTCDate" >= NOW() - INTERVAL '1 HOUR' doesn't appear to use an index... either because there isn't one or because the function and the mutable "now" is suppressing the index usage. Is there any way you can use a fixed date as an anchor, even if you have to periodically update the query, just to invoke the index? How far back does data in that table go? If very far, can you also add partitions? Commented Sep 20, 2022 at 21:15
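The fixed-date rewrite suggested in the comment above might look like the following. This is only a sketch of the commenter's idea: the timestamp literal is a placeholder and would have to be refreshed periodically.

```sql
-- Illustrative: the volatile NOW() expression replaced with a fixed anchor.
-- The timestamp literal is a placeholder and must be updated periodically.
select a."ClientID" as ClientID, b."ProviderID" as ProviderID, count(1) as VisitCount
from "Log" c
inner join "MessageDetail" b on c."MessageDetailID" = b."MessageDetailID"
inner join "Message" a on a."MessageID" = b."MessageID"
where a."CreatedUTCDate" >= '2022-09-20 19:00:00'::timestamptz
group by a."ClientID", b."ProviderID"
```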

1 Answer


This part of the execution plan

->  Seq Scan on "Message" a  (...) (actual time=25178.799..790477.178 rows=25121 loops=1)
      Filter: ("CreatedUTCDate" >= (now() - '01:00:00'::interval))
      Rows Removed by Filter: 30839080

shows that an index on "CreatedUTCDate" would speed up this query quite a lot:

  • almost the entire execution time is spent in this sequential scan

  • the scan reads over 30 million rows to find about 25,000, so the filter condition is highly selective
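A minimal sketch of such an index (the index name is illustrative):

```sql
-- Illustrative name; CONCURRENTLY avoids blocking writes on a busy table.
CREATE INDEX CONCURRENTLY "ix_Message_CreatedUTCDate"
    ON "Message" ("CreatedUTCDate");
```

Running ANALYZE "Message" afterwards refreshes the planner's statistics so it can cost the new index accurately.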


1 Comment

And maybe you want to include MessageID as well, because of the join condition on this column. The database should be able to use an index-only scan for this.
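That suggestion might take either of the following forms (index names illustrative). On PostgreSQL 11+ an INCLUDE clause keeps "MessageID" out of the index keys while still allowing an index-only scan:

```sql
-- Illustrative: a composite key index...
CREATE INDEX "ix_Message_CreatedUTCDate_MessageID"
    ON "Message" ("CreatedUTCDate", "MessageID");

-- ...or, on PostgreSQL 11+, a covering index using INCLUDE:
CREATE INDEX "ix_Message_CreatedUTCDate_incl"
    ON "Message" ("CreatedUTCDate") INCLUDE ("MessageID");
```

Note the Heap Fetches: 65590 on the "Log" index-only scan in the plan above: index-only scans only pay off when the visibility map is reasonably current, so a recent VACUUM on the scanned table helps.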
