0

I read this post on how to deal with large partitions and partitioning hotspots, their solution is to add a sharding key as part of the partition key, and keep the shard size at a fixed size, say 1000. The fixed shard size even helps pagination.

But my question is, how can we keep a fixed shard size? In my understanding, the common approach for hotspot issue is to add a sharding key (random_number % n for example) to the partition key to split hotspots, but it does not guarantee to limit shard size, wondering how it works out in their approach.

1 Answer 1

1

The post details a number of approaches to the sharding problem - the first just has a number of shards but no way in which to track how many shard you have stored for a given partition.

The second solution adds a static count column which provides that information but it will fall foul of read-before-write and race conditions on inserting to the shard, especially when data is inserted in parallel. If you get around that however, and you know the size of a row is relatively static, then you can use the counter to control the size (approximately). If the size varies considerably, its a rough guess at best.

The third solution is the same as your %n - in that a fixed number of shards, or 'buckets' are assumed to exist.

I would start the process though by calculating what you expect each partition to contain, and work from there, and not to prematurely optimise.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks Andrew for your detailed answer. for the first approach, "no way in which to track how many shard you have stored for a given partition.", do you mean no way to accurately or efficiently track how many rows in sharded partition?
There is no efficient row count mechanism for the partition which you can access without performing a select query against the whole of that partition. Any such value would have to be maintained externally and stored (which is what the static column solution is doing)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.