
Recently, due to heavy traffic, the CPU in our database instance has been overloaded (over 98% utilization), so we decided to shard our DB across multiple instances. As I understand it, in Postgres, database-level sharding is mostly done by partitioning the tables and moving each partition to a separate instance, as shown below.

[Diagram: a main instance holding the partitioned parent table, with each partition moved out to its own shard instance]
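This is my rough understanding of the setup (a sketch only; the `orders` table, the `shard1` server, and all hosts and credentials are made up, and we would run the DDL through node-postgres):

```typescript
import { Pool } from "pg";

// Main (coordinator) instance; connection details are illustrative.
const main = new Pool({ connectionString: "postgres://main-db:5432/app" });

async function setUpShardedTable(): Promise<void> {
  // The parent table lives on the main instance, hash-partitioned by customer.
  await main.query(`
    CREATE TABLE orders (
      id          bigint  NOT NULL,
      customer_id bigint  NOT NULL,
      total       numeric
    ) PARTITION BY HASH (customer_id)`);

  // postgres_fdw lets a partition's rows live on another instance.
  await main.query(`CREATE EXTENSION IF NOT EXISTS postgres_fdw`);
  await main.query(`
    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'shard1-db', dbname 'app')`);
  await main.query(`
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
      OPTIONS (user 'app', password 'secret')`);

  // A matching orders_p0 table must already exist on shard1; the main
  // instance then stores no rows for this partition itself. The other
  // remainders (1-3) would each point at their own shard the same way.
  await main.query(`
    CREATE FOREIGN TABLE orders_p0
      PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0)
      SERVER shard1`);
}
```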

I thought this might make queries faster, but it would not reduce the load on the main instance by much, since all queries are still sent to the main instance itself. So I decided to implement application-level sharding in our Node backend.

Application-level sharding works great for all CRUD operations done using the partition key. But if a query has to filter on a key other than the partition key, we need to go through each partition one by one. In that case it would be better to have an un-partitioned table, so that all the data can be queried from a single table.
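For illustration, this is roughly what our routing looks like (all names and hosts are made up; we keep one node-postgres pool per shard):

```typescript
import { Pool } from "pg";

// One connection pool per shard instance (hosts are illustrative).
const shards: Pool[] = [
  new Pool({ connectionString: "postgres://shard0-db:5432/app" }),
  new Pool({ connectionString: "postgres://shard1-db:5432/app" }),
];

// Fast path: the partition key tells us exactly which shard to hit.
function shardFor(customerId: number): Pool {
  return shards[customerId % shards.length];
}

async function getOrdersByCustomer(customerId: number) {
  const { rows } = await shardFor(customerId).query(
    "SELECT * FROM orders WHERE customer_id = $1",
    [customerId],
  );
  return rows;
}

// Slow path: querying by a non-partition key means checking every shard.
async function getOrderById(orderId: number) {
  for (const shard of shards) {
    const { rows } = await shard.query(
      "SELECT * FROM orders WHERE id = $1",
      [orderId],
    );
    if (rows.length > 0) return rows[0];
  }
  return null;
}
```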

What might be a good approach to fix this? Would just implementing database-level sharding, as shown in the image above, be enough to reduce CPU utilization on the main instance?

• That depends on the queries. If all the main instance needs to do is re-sum the partial sums obtained from each shard, that would be a huge reduction in CPU.

1 Answer


If partitioning is done correctly, querying data from all shards need not be slower, because those shards can be queried in parallel. This happens automatically if you use partitioning on the database and define the remote shards as partitions that are postgres_fdw foreign tables: PostgreSQL v14 introduced asynchronous execution of the Append plan node, which lets foreign partitions on different servers be scanned concurrently.
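For example, with a setup like the one sketched in the question, each foreign server can be marked async-capable (a minimal sketch; `shard1` and `orders` are the hypothetical names from above):

```typescript
import { Pool } from "pg";

const main = new Pool({ connectionString: "postgres://main-db:5432/app" });

async function enableAsyncShardScans(): Promise<void> {
  // async_capable is off by default; turning it on tells the planner that
  // scans of this server's foreign tables may run asynchronously, so an
  // Append over foreign partitions can keep several shards busy at once.
  await main.query(`ALTER SERVER shard1 OPTIONS (ADD async_capable 'true')`);

  // The plan should now show "Async Foreign Scan" nodes under the Append.
  const { rows } = await main.query(
    "EXPLAIN SELECT count(*) FROM orders WHERE total > 100",
  );
  console.log(rows.map((r) => r["QUERY PLAN"]).join("\n"));
}
```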

If you do sharding on the application level, you'll have to teach your application to query shards in parallel.
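A minimal sketch of that parallelism (again with hypothetical names; one node-postgres pool per shard, and Promise.all to fan the query out):

```typescript
import { Pool } from "pg";

const shards: Pool[] = [
  new Pool({ connectionString: "postgres://shard0-db:5432/app" }),
  new Pool({ connectionString: "postgres://shard1-db:5432/app" }),
];

// Scatter-gather: send the same query to every shard concurrently and
// merge the partial results, so a lookup by a non-partition key costs
// one round trip bounded by the slowest shard, not the sum of all shards.
async function queryAllShards(text: string, values: unknown[]) {
  const results = await Promise.all(
    shards.map((shard) => shard.query(text, values)),
  );
  return results.flatMap((r) => r.rows);
}

// Usage (inside an async function):
//   const order = await queryAllShards(
//     "SELECT * FROM orders WHERE id = $1", [42]);
```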


4 Comments

Thanks. My main concern is whether it will reduce the resource consumption on the main shard. Since all queries are first sent to the main instance, it might not reduce the load on that instance by much. Am I wrong?
No, that is correct, partitioning on the database level will route all rows through the database that you query. In that respect, partitioning on the application level has an advantage. But you'll have to implement the parallelism yourself.
But any WHERE clause will be sent to the remote servers, so even though the main server still receives all the requests, it does not have to do the filtering itself; the remote servers return just the rows that match the conditions and just the columns that were selected, so CPU usage should decrease considerably. Quote from the docs: "postgres_fdw attempts to optimize remote queries to reduce the amount of data transferred from foreign servers. This is done by sending query WHERE clauses to the remote server for execution, and by not retrieving table columns that are not needed for the current query."
@acristu I agree.
