
I implemented cursor pagination. For the first rows it works really well, but the further I scroll down, the slower the first query I send becomes. I run this query:

SELECT *
FROM "movie" "m"
INNER JOIN "movie_stats" "ms" ON "m"."uuid" = "ms"."movie_uuid"
WHERE (((("ms"."data"->'stat'->'overall'->>'total')::FLOAT), "ms"."movie_uuid") < (74.566, '50bca81c-4676-403e-8314-c721ba67646c')) AND ("m"."status" != 'deleted')

ORDER BY (("ms"."data"->'stat'->'overall'->>'total')::FLOAT) DESC NULLS LAST, "ms"."movie_uuid" DESC NULLS LAST
LIMIT 40

After the first run, the query execution time is 444 ms:

QUERY PLAN
Limit  (cost=0.84..154.18 rows=40 width=565) (actual time=9.171..444.788 rows=40 loops=1)
  ->  Nested Loop  (cost=0.84..506620.20 rows=132160 width=565) (actual time=9.169..444.735 rows=40 loops=1)
        ->  Index Scan using movie_stats_stat_overall_score_idx on movie_stats "ms"  (cost=0.42..165910.17 rows=132443 width=741) (actual time=9.078..276.405 rows=40 loops=1)
              Index Cond: (ROW(((((data -> 'stat'::text) -> 'overall'::text) ->> 'total'::text))::double precision, movie_uuid) < ROW('74.566'::double precision, '50bca81c-4676-403e-8314-c721ba67646c'::uuid))
        ->  Index Scan using movie_pkey on movie m  (cost=0.42..2.56 rows=1 width=541) (actual time=4.188..4.188 rows=1 loops=40)
              Index Cond: (uuid = "ms".movie_uuid)
              Filter: (status <> 'deleted'::movie_status)
Planning time: 1.140 ms
Execution time: 444.943 ms

But after executing the same query a second time, the execution time is only 1 ms:

QUERY PLAN
Limit  (cost=0.84..154.18 rows=40 width=1314) (actual time=0.066..0.791 rows=40 loops=1)
  ->  Nested Loop  (cost=0.84..506620.20 rows=132160 width=1314) (actual time=0.064..0.776 rows=40 loops=1)
        ->  Index Scan using movie_stats_stat_overall_score_idx on movie_stats "ms"  (cost=0.42..165910.17 rows=132443 width=749) (actual time=0.030..0.120 rows=40 loops=1)
              Index Cond: (ROW(((((data -> 'stat'::text) -> 'overall'::text) ->> 'total'::text))::double precision, movie_uuid) < ROW('74.566'::double precision, '50bca81c-4676-403e-8314-c721ba67646c'::uuid))
        ->  Index Scan using movie_pkey on movie m  (cost=0.42..2.56 rows=1 width=541) (actual time=0.011..0.011 rows=1 loops=40)
              Index Cond: (uuid = "ms".movie_uuid)
              Filter: (status <> 'deleted'::movie_status)
Planning time: 1.252 ms
Execution time: 0.916 ms

And this happens for every next 40 rows I page down. Can someone please explain this to me? Thanks for the help!

  • The concept you need to look up is "caching". On the first read the data is being read from disk; on the repeated query it is already in RAM. The one is many, many times faster than the other. Commented Apr 26, 2022 at 12:49
  • It's using the index all right, but it's using it badly. It walks the index by PK and filters out most rows. That's bound to be expensive. I would add an expression index on it. Commented Apr 26, 2022 at 13:00
  • Make sure track_io_timing is on, then do EXPLAIN (ANALYZE, BUFFERS). Commented Apr 26, 2022 at 13:10
  • @TheImpaler I don't think it does filter out most rows. That would be reported in the plan if it were happening. If you are looking at 132443, that is just how many rows that node would fetch if it were run to completion. But it doesn't run to completion due to the LIMIT. Commented Apr 26, 2022 at 13:15
  • Could you show us the DDL for the tables and indexes involved? Commented Apr 26, 2022 at 13:27

1 Answer


The first execution probably has to fetch data from disk, and the second already finds the data in shared buffers. You can diagnose that with EXPLAIN (ANALYZE, BUFFERS), which will show you the number of 8kB-blocks found in cache (hit) and read from disk (read).

For example:

 Seq Scan on tab  (...) (actual time=0.353..126.805 ...)
   Buffers: shared read=1959

versus

 Seq Scan on tab  (...) (actual time=0.011..21.471 ...)
   Buffers: shared hit=1959

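To capture that for your pagination query, a sketch (note that changing track_io_timing for the session may require superuser privileges, depending on your version and setup):

 SET track_io_timing = on;  -- lets EXPLAIN also report I/O timing per node

 EXPLAIN (ANALYZE, BUFFERS)
 SELECT *
 FROM "movie" "m"
 INNER JOIN "movie_stats" "ms" ON "m"."uuid" = "ms"."movie_uuid"
 WHERE (((("ms"."data"->'stat'->'overall'->>'total')::FLOAT), "ms"."movie_uuid") < (74.566, '50bca81c-4676-403e-8314-c721ba67646c')) AND ("m"."status" != 'deleted')
 ORDER BY (("ms"."data"->'stat'->'overall'->>'total')::FLOAT) DESC NULLS LAST, "ms"."movie_uuid" DESC NULLS LAST
 LIMIT 40;
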
If you need to speed that up, there are two possibilities:

  1. Don't SELECT *; instead, put all the columns the query needs into the index. Then you can get an index-only scan, which may not hit the table at all if you VACUUM it.

    Of course, that is probably not feasible if you need a lot of columns.

  2. Get more RAM and try to keep the table in cache. pg_prewarm may help; see the sketch after this list.
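
For option 2, a minimal sketch, assuming the pg_prewarm extension is available in your installation and using the table and index names from your plans:

 -- one-time setup: make the extension available in this database
 CREATE EXTENSION IF NOT EXISTS pg_prewarm;

 -- load the pagination index and both heaps into shared buffers
 SELECT pg_prewarm('movie_stats_stat_overall_score_idx');
 SELECT pg_prewarm('movie_stats');
 SELECT pg_prewarm('movie');

This only helps for as long as the data stays cached; under memory pressure those buffers will be evicted again.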


2 Comments

Out of curiosity, how stable would you expect those "hit" numbers to be? I mean (for a different project and queries I have), if I get the execution plan and the hits look good, will those hit numbers degrade or change over time? Maybe I could monitor them periodically and find out myself.
@TheImpaler Pages remain in cache if 1) they are used frequently and 2) the cache is big enough to avoid high pressure.
