Just wondering why Elasticsearch still uses the simple routing-value approach to decide which shard a document is stored on. This approach prevents us from changing the number of shards later. If Elasticsearch used something like consistent hashing (or an even better technique), we would be able to change the number of shards in the future. Does anyone have an explanation or idea about this?
- It is not possible to change the number of shards after having created an index, so that's a non-issue :-) – Val, Sep 15, 2017 at 9:24
- @Val it isn't possible because Elasticsearch uses this routing value, right? – indraep, Sep 15, 2017 at 9:40
- Yes, but that's a design choice made by the ES folks and they won't change that in the near future. – Val, Sep 15, 2017 at 9:47
1 Answer
As of Elasticsearch release 6.1.0, index splitting is possible. See release note: https://www.elastic.co/blog/elasticsearch-6-1-0-released.
The Split Index documentation explains in more detail why Elasticsearch doesn't use consistent hashing:
Consistent hashing only requires 1/N-th of the keys to be relocated when growing the number of shards from N to N+1. However Elasticsearch’s unit of storage, shards, are Lucene indices. Because of their search-oriented data structure, taking a significant portion of a Lucene index, be it only 5% of documents, deleting them and indexing them on another shard typically comes with a much higher cost than with a key-value store.
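To make the 1/N-th claim concrete, here is a rough Python sketch that counts how many keys change shard when going from 5 to 6 shards, once with plain modulo routing (in the spirit of `hash(_routing) % number_of_primary_shards`) and once with a toy consistent-hash ring. Everything in it is an assumption for illustration: `stable_hash` uses MD5 as a stand-in (Elasticsearch uses its own hash function), and `HashRing`, `modulo_shard`, `moved_fraction`, and the virtual-node count are made-up names and values, not Elasticsearch APIs.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Stand-in hash for illustration only; not the function Elasticsearch uses.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def modulo_shard(key: str, num_shards: int) -> int:
    # Simple routing: hash the routing value and take it modulo the shard count.
    return stable_hash(key) % num_shards


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, num_shards: int, vnodes: int = 200):
        points = []
        for shard in range(num_shards):
            for v in range(vnodes):
                points.append((stable_hash(f"shard-{shard}-vnode-{v}"), shard))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._shards = [s for _, s in points]

    def shard_for(self, key: str) -> int:
        # Walk clockwise to the first ring point after the key's hash,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._hashes)
        return self._shards[idx]


def moved_fraction(assign_before, assign_after, keys) -> float:
    # Fraction of keys whose shard differs between the two assignments.
    moved = sum(1 for k in keys if assign_before(k) != assign_after(k))
    return moved / len(keys)


keys = [f"doc-{i}" for i in range(20000)]
n = 5  # grow from n to n + 1 shards

modulo_moved = moved_fraction(
    lambda k: modulo_shard(k, n), lambda k: modulo_shard(k, n + 1), keys
)

ring_before, ring_after = HashRing(n), HashRing(n + 1)
ring_moved = moved_fraction(ring_before.shard_for, ring_after.shard_for, keys)

print(f"modulo routing:     {modulo_moved:.1%} of keys moved")
print(f"consistent hashing: {ring_moved:.1%} of keys moved")
```

With modulo routing most keys land on a different shard (about 5 out of 6 for this 5 to 6 case), while the ring only hands roughly 1/6 of the keys to the new shard. The quoted documentation's point is that even that smaller fraction is expensive for Elasticsearch, because each moved document has to be deleted from one Lucene index and re-indexed in another.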