I have two Postgres databases (one for development, one for test). Both have the same structure.
I'm running this query on both (it merges one table into another):
WITH moved_rows AS
(
    DELETE FROM mybigtable_20220322
    RETURNING *
)
INSERT INTO mybigtable_202203
SELECT * FROM moved_rows;
But I get slightly different results for EXPLAIN on each version.
Development (Postgres 13.1):
Insert on mybigtable_202203 (cost=363938.39..545791.17 rows=9092639 width=429)
  CTE moved_rows
    -> Delete on mybigtable_20220322 (cost=0.00..363938.39 rows=9092639 width=6)
          -> Seq Scan on mybigtable_20220322 (cost=0.00..363938.39 rows=9092639 width=6)
  -> CTE Scan on moved_rows (cost=0.00..181852.78 rows=9092639 width=429)
Test (Postgres 14.1):
Insert on mybigtable_202203 (cost=372561.91..558377.73 rows=0 width=0)
  CTE moved_rows
    -> Delete on mybigtable_20220322 (cost=0.00..372561.91 rows=9290791 width=6)
          -> Seq Scan on mybigtable_20220322 (cost=0.00..372561.91 rows=9290791 width=6)
  -> CTE Scan on moved_rows (cost=0.00..185815.82 rows=9290791 width=429)
The big difference is the first line: on Development I get rows=9092639 width=429, on Test I get rows=0 width=0.
All the tables have the same definitions, with the same indexes (not that they seem to be used). The query succeeds on both databases, the EXPLAIN indicates similar costs on both databases, and the tables on each database have a similar record count (just over 9 million rows).
In practice the difference is that on Development the query takes a few minutes, while on Test it takes a few hours.
Both databases were created with the same scripts, so they should be 100% identical. My guess is that some small, subtle difference has crept in somewhere. Any suggestions on what the difference might be, or how to find it? Thanks
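Plain EXPLAIN only prints estimates, so it cannot show where the hours go on Test. One way to get actual timings without keeping the result is to run the statement under EXPLAIN (ANALYZE, BUFFERS) inside a transaction and roll it back. This is only a sketch, not something taken from the runs above: it assumes the tables can stay locked for as long as the statement runs, and it will take as long as the real query does on each server.

```sql
BEGIN;

-- ANALYZE executes the statement for real, so the DELETE and INSERT do happen
-- inside this transaction; BUFFERS adds block hit/read counts per plan node.
EXPLAIN (ANALYZE, BUFFERS)
WITH moved_rows AS
(
    DELETE FROM mybigtable_20220322
    RETURNING *
)
INSERT INTO mybigtable_202203
SELECT * FROM moved_rows;

-- Undo the move so the measurement can be repeated.
ROLLBACK;
```

If the target table has triggers or foreign keys, the output also lists the time spent in each trigger below the plan, which an estimate-only EXPLAIN never shows.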
Update
- both the tables being merged (on both databases) have been VACUUM ANALYZED in similar timeframes.
- I used fc to compare both DBs. There was ONE difference: on the development database the table was clustered on one of the indexes. I did similar clustering on the test table but the results didn't change.
- In response to the comment 'the plans are the same, only the estimated rows are different': this difference is the only clue I currently have to an underlying problem. My development database is on a 10-year-old server struggling with a lack of resources; my test database is on a brand new server. The former takes a few minutes to actually run the query, the latter takes a few hours. Whenever I post a question on the forum I'm always told 'start with the explain plan'.
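To back up the statement that the tables were vacuumed and analyzed at comparable times, the statistics views can be queried on each database and the rows compared side by side. A minimal sketch, assuming both tables are ordinary user tables (so they appear in pg_stat_user_tables) and that the table names are unique across schemas:

```sql
-- When each table was last vacuumed/analyzed, and how many dead rows it carries.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('mybigtable_20220322', 'mybigtable_202203');

-- Planner-visible size figures plus the on-disk size including indexes and TOAST.
SELECT relname,
       relpages,
       reltuples,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('mybigtable_20220322', 'mybigtable_202203')
  AND relkind IN ('r', 'p');
```

A large gap in n_dead_tup or total_size between the two servers would point at bloat rather than at the plan.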
- VACUUM ANALYZE; on the test db to get some usable statistics.
- pg_dump --schema-only ... >outputXY on both databases, and diff the resulting outputX and outputY.
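A schema diff does not cover everything that can make the same INSERT slower on one server. As a complementary sketch, the catalog queries below can be run on both databases and their output compared. The table names are the ones from the question; whether any of these actually differs is not known.

```sql
-- Triggers on the insert target, including internal ones created for foreign keys.
SELECT tgname, tgenabled, tgisinternal
FROM pg_trigger
WHERE tgrelid = 'mybigtable_202203'::regclass;

-- Indexes on both tables; every index on the target is maintained per inserted row.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('mybigtable_20220322', 'mybigtable_202203')
ORDER BY tablename, indexname;

-- Constraints defined on either table, or foreign keys that reference them.
SELECT conname,
       contype,
       conrelid::regclass AS on_table,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid  IN ('mybigtable_20220322'::regclass, 'mybigtable_202203'::regclass)
   OR confrelid IN ('mybigtable_20220322'::regclass, 'mybigtable_202203'::regclass);

-- Server settings that have been changed from their built-in defaults.
SELECT name, setting, unit, source
FROM pg_settings
WHERE source <> 'default'
ORDER BY name;
```

The last query is included because the two servers differ in age and hardware, so their configuration may differ even when the schema scripts are identical.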