I have two database servers that are both in "test" mode, with one expected to be promoted to the production server. As such, the specs and some of the configs differ, but we are finding that the underpowered server produces better query plans and thus faster queries.
Stats:
Both systems hold roughly the same amount of data, which breaks down like this:
Size    | Part
--------|----------
1.47 TB | Entire DB
871 GB  | Tables
635 GB  | Indexes
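(For reference, these sizes can be reproduced with the standard size functions; the query below is a sketch that assumes user tables live outside the system schemas.)

    -- Database-wide total.
    SELECT pg_size_pretty(pg_database_size(current_database())) AS entire_db;

    -- Table vs. index totals, excluding system schemas.
    SELECT pg_size_pretty(sum(pg_table_size(c.oid)))   AS tables,
           pg_size_pretty(sum(pg_indexes_size(c.oid))) AS indexes
    FROM   pg_class c
    JOIN   pg_namespace n ON n.oid = c.relnamespace
    WHERE  c.relkind = 'r'
      AND  n.nspname NOT IN ('pg_catalog', 'information_schema');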
The bigger DB server has the following specs:
RAM: 500 GB
CPU: 16 Cores 2.0 GHz Intel
Using SSDs
Postgres 10.0
Memlock set to reserve 485 GB specifically for Postgres
Postgres Settings:
shared_buffers: 125 GB
work_mem: 36 MB
effective_cache_size: 300 GB
random_page_cost: 1
default_statistics_target: 1000
Query Plan: https://explain.depesz.com/s/9Ww6
The smaller server has the following specs:
RAM: 281 GB
CPU: 4 Cores 2.0 GHz Intel
Using SSDs
Postgres 10.0
Memlock set to reserve 240 GB specifically for Postgres
Postgres Settings:
shared_buffers: 50 GB
work_mem: 25.6 MB
effective_cache_size: 150 GB
random_page_cost: 4
default_statistics_target: 100
Query Plan: https://explain.depesz.com/s/4WUH
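(In case it matters, here is how the effective values can be confirmed on each box, so we are comparing what the planner actually sees rather than what is written in postgresql.conf:)

    -- Run on both servers; pg_settings shows the live value and its source.
    SELECT name, setting, unit, source
    FROM   pg_settings
    WHERE  name IN ('shared_buffers', 'work_mem', 'effective_cache_size',
                    'random_page_cost', 'default_statistics_target');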
We've tried matching random_page_cost, default_statistics_target (followed by ANALYZE), and work_mem between the two servers. The biggest gain came from running a VACUUM FULL on all tables involved in the query. A sketch of what that amounts to is below.
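(Roughly along these lines, shown with session-level SET for testing; some_table is a placeholder for each table in the query:)

    -- Bring the larger server's planner settings in line with the smaller one.
    SET random_page_cost = 4;
    SET default_statistics_target = 100;
    SET work_mem = '26214kB';          -- ~25.6 MB, matching the smaller server

    ANALYZE;                           -- rebuild statistics with the new target
    VACUUM (FULL, ANALYZE) some_table; -- repeated per table; this gave the biggest gain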
Workload: This machine is a read replica used to extract the data as XML files, etc., so it receives replicated data and serves a fairly heavy read load.
Question: What should I be looking for to make this query more performant on the larger server, where it currently runs slower? Ideally it would run much faster there than on the smaller server. It appears that as we have scaled, we have failed to adjust the settings to take advantage of our hardware. There must be something I'm overlooking.
Edit: I have put up the non-obfuscated plans. I have also tried bumping default_statistics_target from 1000 to 3000, and it didn't change the plans. The same goes for changing random_page_cost to match between the servers.
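(If it's relevant, I'm also aware the target can be raised per column, on just the columns the planner misestimates, rather than globally; a minimal sketch with hypothetical table/column names:)

    -- Hypothetical names; raise the per-column target, then re-analyze that table.
    ALTER TABLE some_table ALTER COLUMN some_column SET STATISTICS 3000;
    ANALYZE some_table;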