Solr allows shard splitting, why doesn't Elasticsearch?


(Shaunak Kashyap) #1

The primary reason for this is shard addressing.

When a new document arrives a hash is computed to address where the shard should reside, this occurs in near realtime in a way that is fast and scalable for the number of nodes in a cluster. The cost for this speed is the setting of a constant number of shards as the divisor for the formula to distribute the doc by the doc ID. If the hash computation for adding shards was changed during cluster life then things would get chaotic and Elasticsearch would no long find documents for search results. To work around this you have to reindex, which recomputes the hashes for a higher number of shards.

There are some existing systems that allow shard growth in a limited way. By using a hash ring algorithm for example, you could address only each third shard and leave the next two places in the hash ring free for adding shards later, e.g. Project Voldemort for an example of this.

But this would mean two things: There would be a limited number of shard splits as only a fixed number of shards can be reserved, and it would be up to the user to calculate the conditions of when to activate the reserved shards and instruct Elasticsearch to use a more difficult algorithm to lookup documents. There are no general conditions to automatically calculate shard overflow, there are many good criteria but in the end the user will get the headache.

Ultimately this is because it is an anti-pattern: the promise is that shard split will solve a problem, but you get a newer, bigger one.

The Elasticsearch philosophy is that it should scale and should be easy to use, so having no headaches around the shard count, once it is initially set, is the choice taken.

If you are looking to split shards because your index is growing, there is another solution. Instead of splitting shards in the index, create a new index and an index alias that maps to both indices. Then your application can refer to the index alias instead of the original index. Querying two indices with one shard each is the same as querying one index with two shards (read more about this at http://www.elastic.co/guide/en/elasticsearch/guide/current/multiple-indices.html).


(system) #2