
When running the following query, it sometimes takes 15 seconds and sometimes 90 minutes. What causes this big difference?

INSERT INTO missing_products 
SELECT table_name, 
   product_id 
FROM   products 
WHERE  table_name = 'xxxxxxxxx' 
   AND product_id NOT IN (SELECT id 
                               FROM new_products);

I have tried an EXPLAIN on it, and the only thing I can see is an index-only scan on new_products. I also rewrote the query to use a LEFT JOIN, inserting the rows where the right side is NULL, but that has the same timing problem.
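For reference, the LEFT JOIN rewrite I tried was along these lines (a sketch; table and column names as in the schema below):

```sql
INSERT INTO missing_products (table_name, product_id)
SELECT p.table_name,
       p.product_id
FROM   products p
       LEFT JOIN new_products n ON n.id = p.product_id
WHERE  p.table_name = 'xxxxxxxxx'
   AND n.id IS NULL;
```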

I have the following tables with a structure something like what follows.

products

id bigint not null,
product_id text not null,
table_name text not null,
primary key (id),
unique index (product_id)

new_products

id text not null,
title text not null,
primary key (id)

missing_products

table_name text not null,
product_id text not null,
primary key (table_name, product_id)

EXPLAIN ANALYZE output. This run has an extra field in the WHERE clause, but it should give a good idea; it took 22 seconds.

 Insert on missing_products  (cost=5184.80..82764.35 rows=207206 width=38) (actual time=22466.525..22466.525 rows=0 loops=1)
   ->  Seq Scan on products  (cost=5184.80..82764.35 rows=207206 width=38) (actual time=0.055..836.217 rows=411150 loops=1)
         Filter: ((active > (-30)) AND (NOT (hashed SubPlan 1)) AND (feed = 'xxxxxxxx'::text))
         Rows Removed by Filter: 77436
         SubPlan 1
           ->  Index Only Scan using new_products_pkey on new_products  (cost=0.39..5184.74 rows=23 width=10) (actual time=0.027..0.027 rows=0 loops=1)
                 Heap Fetches: 0
 Planning time: 0.220 ms
 Execution time: 22466.596 ms
  • Kindly provide the EXPLAIN ANALYZE output; that would help us know if there are any triggers that could be causing the slowness. Commented Dec 24, 2015 at 14:14

2 Answers


Looking at the output of your EXPLAIN ANALYZE, the SELECT itself takes barely 800 ms; most of the time, almost 22 seconds, is spent inserting rows.

Also, it seems that statistics are not accurate for your new_products table: the planner predicts 23 rows whereas the actual count is 0. Though the plan looks correct now, it could be disastrous depending on how the new_products table is used throughout your app. I'd first ANALYZE the table at regular intervals if auto-analyze is not kicking in, and monitor performance over a day.
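For example, to refresh the stats immediately and make autoanalyze more eager for this one table (the threshold values below are only illustrative; tune them for your workload):

```sql
-- Rebuild planner statistics for the table right away
ANALYZE new_products;

-- Make autovacuum analyze this table more aggressively
-- (example values only; adjust for your insert/update rate)
ALTER TABLE new_products
  SET (autovacuum_analyze_threshold = 50,
       autovacuum_analyze_scale_factor = 0.01);
```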


5 Comments

Good observations. I guess it would be interesting to see the actual times when the performance is bad (90 mins), and see if the time is proportionally spent in the same places, or if the select statement is suddenly the bottleneck.
Indeed, I suspect the 90-minute slowness would be due to the incorrect stats.
If it is a problem with incorrect stats, how do I fix it instead of running ANALYZE at regular intervals?
You can decrease autovacuum_analyze_threshold for the given table; more info: postgresql.org/docs/9.0/static/…
Thanks, I've changed the query to EXPLAIN ANALYZE too, so I can see if the query plan changes when it takes 90 minutes.

I would try 2 things:

  1. Try adding an index on products.table_name, which you don't seem to have at the moment.

  2. Try rewriting the query to use NOT EXISTS instead of NOT IN. Sometimes the database can perform the query more efficiently that way (and NOT EXISTS also sidesteps the surprising NULL handling of NOT IN):

Query with not exists:

INSERT INTO missing_products (table_name, product_id)
SELECT p.table_name, p.product_id 
  FROM products p
 WHERE p.table_name = 'xxxxxxxxx' 
   AND NOT EXISTS (SELECT null
                     FROM new_products n
                    WHERE n.id = p.product_id);
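For the first suggestion, the index would be created along these lines (the index name here is just an example):

```sql
CREATE INDEX products_table_name_idx ON products (table_name);
```

With such an index in place, the planner can choose an index scan instead of the sequential scan shown in your EXPLAIN output, provided the table_name predicate is selective enough.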

1 Comment

Wow, this changes the query plan quite nicely and relies more on a sequential scan. Just testing now to make sure it returns the same results.
