I was wondering how sharding is actually implemented in Elasticsearch. Many blog posts state that the default is 5 shards per index, but I'd like to understand how an index is broken down into shards. Is it split by term, so that, say, terms a-d live on a different shard from terms e-z? Or by document, so that documents 1-2 live on a different shard from documents 3-4? I've been trying to research the underlying implementation (term sharding vs. document sharding) and have not had much luck. Any insight or references would be great! Thank you very much!
Entire documents are sharded, based on a routing value that defaults to the document ID. The routing value is hashed, and the hash modulo the number of primary shards determines which primary shard receives the document.
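To make the hash-and-modulo step concrete, here is a minimal Python sketch. It is not Elasticsearch's code: `pick_shard` is a hypothetical helper, and `zlib.crc32` is only a stand-in for the Murmur3-based hash Elasticsearch uses internally, so the shard numbers it produces will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a primary shard.

    Assumption: crc32 stands in for Elasticsearch's own hash function.
    """
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % num_primary_shards


# Hypothetical document IDs, placed on an index with 5 primary shards.
doc_ids = ["1", "2", "3", "4", "user-42"]
placement = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}

for doc_id, shard in placement.items():
    print(f"document {doc_id!r} -> primary shard {shard}")
```

Note that the whole document goes to the chosen shard; nothing in this step looks at the terms inside it. The sketch also hints at why the primary shard count is generally fixed when the index is created: changing the modulus would send most existing IDs to a different shard.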
Each shard maintains its own term dictionary and term frequencies, so a search is executed locally on each shard and the results are merged at the coordinating node (on the assumption that document frequencies are similar across shards).
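A toy illustration of shard-local scoring followed by a merge at the coordinator might look like the sketch below. Everything in it is an assumption made for illustration: the `Shard` class, the `coordinate` function, the sample documents, and the plain TF-IDF score (Elasticsearch's default similarity is BM25, and its real search phases are more involved).

```python
import math
from collections import Counter


class Shard:
    """A toy shard holding whole documents and its own document frequencies."""

    def __init__(self, docs):
        self.docs = docs  # {doc_id: text}
        self.doc_freq = Counter()
        for text in docs.values():
            # Count each term once per document (document frequency).
            self.doc_freq.update(set(text.lower().split()))

    def search(self, term, k):
        """Score matching documents using only this shard's statistics."""
        df = self.doc_freq.get(term, 0)
        if df == 0:
            return []
        idf = math.log(1 + len(self.docs) / df)  # shard-local IDF
        hits = []
        for doc_id, text in self.docs.items():
            tf = text.lower().split().count(term)
            if tf:
                hits.append((tf * idf, doc_id))
        return sorted(hits, reverse=True)[:k]


def coordinate(shards, term, k):
    """Ask every shard for its top k hits, then merge into a global top k."""
    merged = [hit for shard in shards for hit in shard.search(term, k)]
    return sorted(merged, reverse=True)[:k]


shards = [
    Shard({"a": "red fox", "b": "red red wine"}),
    Shard({"c": "blue fox", "d": "green tea", "e": "red tea", "f": "black tea"}),
]
top = coordinate(shards, "red", 2)
print([doc_id for _, doc_id in top])
```

In this toy data, document `e` (one occurrence of "red") outranks document `b` (two occurrences) because "red" is rarer on `e`'s shard, so its shard-local IDF is higher. That is the kind of skew the "similar document frequencies" assumption is meant to rule out; with many documents spread evenly by hash, the per-shard statistics tend to converge.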
Hope this helps!