
I have a local PG instance populated with tens of millions of rows of data, and I need to run some relatively complex queries for data analysis work.

These queries are currently taking 10+ minutes to return (even after adding custom indexes for every query I'm running).

While this may not be the "right tool for the job", I'm sure my system isn't fully utilizing its available resources. I've configured it using PGTune, but those settings still seem to build in a margin of safety for application stability, multiple connections, competing processes, etc.

If I just want PG to run as fast as it possibly can for my single connection...

What are the most important settings? And how should they be configured, relative to the system specs?

(Mine are 8 cores and 32 GB of RAM, for example.)

  • Disagree with the close reason. PG falls well into "software tools primarily used by programmers." Commented Sep 30, 2024 at 2:55
  • Right. The correct close reason is that you provided too little information. Commented Sep 30, 2024 at 6:02
  • 1
    "margin of safety for application stability" - if you're willing to sacrifice that, you can consider non-durable settings. Make sure you have a cold backup to go back to and take a look at the rest of performance tips and resource consumption settings. Commented Sep 30, 2024 at 15:25
  • 1
    The manual lists EXPLAIN first for a reason - even a very slight improvement to your schema and the queries in your pipeline can yield disproportionately better results (orders of magnitude) than multiplying your resources, lifting all resource consumption constraints and removing all safety measures. One advantage of those last 3 things is you don't have to read any of the code you're trying to speed up - which might be preferable when dealing with a ton of legacy code to go through. Still, even then, removing just a few most obvious bottlenecks might outweigh all config tweaks. Commented Sep 30, 2024 at 15:46
  • Could you please share the results from explain(analyze, verbose, buffers, settings) for your slow SQL statement, the statement itself and the DDL for all tables and indexes involved? All in plain text, as an update of your question. And for your information, there is no configuration setting "maximize compute resources for a single connection". The best config depends on your hardware, usage pattern and configuration. Commented Sep 30, 2024 at 16:50
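
For reference, the non-durable settings mentioned in the comments above are the ones the PostgreSQL documentation describes under "Non-Durable Settings". A rough sketch of what they look like in postgresql.conf (values illustrative; all of them trade crash safety for speed, so keep that cold backup):

    # postgresql.conf -- illustrative non-durable settings; a crash can mean data loss
    fsync = off                   # don't force WAL writes to disk
    synchronous_commit = off      # don't wait for the WAL flush before reporting commit
    full_page_writes = off        # only sensible once fsync is off anyway
    checkpoint_timeout = 30min    # fewer checkpoints, at the cost of longer crash recovery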

2 Answers


The default for max_parallel_workers_per_gather is 2, which is too low for the situation you describe. It should be set equal to the number of CPUs (actually one less than the number of CPUs, but that isn't likely to make any meaningful difference). But not all queries can benefit from parallel workers, so this might not make much difference to you.
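
For an 8-core machine, a session-level sketch might look like this (the values are illustrative, and the server-wide worker pool caps what a single query can actually get):

    -- session-level override for an 8-core machine (illustrative)
    SET max_parallel_workers_per_gather = 8;

    -- the workers come from a server-wide pool configured in postgresql.conf
    -- (changing max_worker_processes requires a restart):
    --   max_worker_processes = 8
    --   max_parallel_workers = 8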

High values for effective_io_concurrency can help if your queries involve bitmap heap scans and your I/O system benefits from having multiple I/O requests in flight at the same time (RAID/JBOD setups generally do; a good-quality SSD usually does, even as a single drive; even a single HDD can often see a small benefit).

Setting effective_cache_size to the same size as all of RAM (or just slightly less) can help for some queries.

Increasing work_mem can help. But be careful not to overdo it: even a single session can allocate many multiples of work_mem if the plan involves sorts in many different executor nodes, parallel workers, or partitions. That said, the "spill to disk" algorithms are now good enough that this often doesn't make a big difference.
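
Pulling those together, a rough set of per-session starting points for an 8-core / 32 GB machine might look like this (values illustrative, not tuned to any particular workload):

    -- illustrative per-session settings for 8 cores / 32 GB RAM
    SET effective_cache_size = '24GB';    -- planner hint only; it allocates no memory
    SET effective_io_concurrency = 200;   -- reasonable for SSDs; use a much lower value for a single HDD
    SET work_mem = '512MB';               -- per sort/hash node, so total usage can be several times this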


1 Comment

This comes the closest to answering the question as asked (how do I squeeze max performance out of my resources). Although after tweaking things and maxing stuff out, it's clear that this really only takes you so far and there is no escaping query optimization.

If your statements are slow, you have to find the cause with EXPLAIN (ANALYZE, BUFFERS, SETTINGS) and improve that. Twiddling parameters will achieve less than you think.

That said, the most important parameter for queries on big tables is work_mem. Set it as high as you can without going out of memory.
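
Putting both suggestions together, a minimal sketch might look like this (the query and the work_mem value are placeholders, not recommendations):

    -- raise work_mem for this session only (illustrative value)
    SET work_mem = '1GB';

    -- then profile the slow statement; "Sort Method: external merge" in the output
    -- means a sort still spilled to disk despite the higher work_mem
    EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
    SELECT customer_id, sum(amount)   -- hypothetical query, substitute your own
    FROM orders
    GROUP BY customer_id;

The SETTINGS option (available since PostgreSQL 12) lists any parameters that differ from their defaults, which makes it easy to confirm the overrides actually reached the session.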
