How is sharding implemented?

sm6405 · October 30, 2018, 7:12pm

I was wondering how sharding is implemented in Elasticsearch. Many blog posts state that the default is 5 shards per index, but I'd like to understand how an index is actually broken down into shards. Is it split by term, so that, say, terms a-c go to one shard and terms d-z to another? Or by document, so that documents 1-2 go to one shard and documents 3-4 to another? I've been trying to find out whether the underlying implementation uses term sharding or document sharding, without much luck. Any insight or references would be great! Thank you very much!

polyfractal · October 30, 2018, 7:27pm

Entire documents are sharded according to their document ID. The ID is hashed, and the hash modulo the number of primary shards determines which primary shard receives the document.
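Here is a minimal Python sketch of that hash-then-modulo routing rule. It assumes the document ID is used as-is as the routing value. `stable_hash` and `shard_for` are hypothetical names, and MD5 stands in for the real hash function (Elasticsearch uses Murmur3), so the shard numbers here will not match a real cluster. Only the structure of the rule is meant to carry over.

```python
import hashlib
from collections import Counter


def stable_hash(routing: str) -> int:
    # Hypothetical stand-in hash: first 4 bytes of MD5 as a signed 32-bit int.
    # Elasticsearch itself uses Murmur3, so real shard numbers will differ.
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big", signed=True)


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    # The document ID is treated as the routing value (the default case).
    # Python's % with a positive divisor never returns a negative number,
    # even for a negative hash, so the result is in [0, num_primary_shards).
    return stable_hash(doc_id) % num_primary_shards


if __name__ == "__main__":
    # Route 1,000 hypothetical document IDs to 5 primary shards and count
    # how many land on each shard.
    counts = Counter(shard_for(f"doc-{i}", 5) for i in range(1000))
    print(sorted(counts.items()))
```

Because the result depends on `num_primary_shards`, changing that value would send most existing IDs to different shards. This is one reason the primary shard count is fixed when an index is created. Elasticsearch also lets you supply a custom routing value in place of the ID, but the hash-then-modulo step stays the same.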

Each shard maintains its own term dictionary and term statistics (such as document frequencies), so searches are effectively shard-local, and the results are merged at the coordinating node (on the assumption that document frequencies are similar across all shards).
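The shard-local scoring and the merge at the coordinating node can be sketched as follows. This is a simplified illustration, not Elasticsearch's code. Each hypothetical shard is a dict mapping document ID to a set of terms. Scoring uses only a BM25-style IDF computed from that shard's own document count and document frequency, with term frequency and length normalisation ignored. `local_idf`, `search_shard` and `coordinate` are made-up names.

```python
import heapq
import math
from typing import Dict, List, Set, Tuple

Hit = Tuple[float, str]  # (score, doc_id)


def local_idf(docs: Dict[str, Set[str]], term: str) -> float:
    # BM25-style IDF computed from THIS shard's statistics only.
    doc_count = len(docs)
    doc_freq = sum(1 for terms in docs.values() if term in terms)
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def search_shard(docs: Dict[str, Set[str]], term: str, k: int) -> List[Hit]:
    # Simplified scoring: every matching doc gets the shard-local IDF.
    # The shard returns only its local top-k hits.
    idf = local_idf(docs, term)
    hits = [(idf, doc_id) for doc_id, terms in docs.items() if term in terms]
    return heapq.nlargest(k, hits)


def coordinate(shards: List[Dict[str, Set[str]]], term: str, k: int) -> List[Hit]:
    # Scatter: run the query on every shard.
    # Gather: merge the shard-local top-k lists into a global top-k.
    per_shard = [search_shard(docs, term, k) for docs in shards]
    return heapq.nlargest(k, (hit for hits in per_shard for hit in hits))


if __name__ == "__main__":
    # Two hypothetical shards holding three documents each.
    shard_a = {"1": {"elastic", "search"}, "2": {"search"}, "3": {"shard"}}
    shard_b = {"4": {"search"}, "5": {"index"}, "6": {"index"}}
    for score, doc_id in coordinate([shard_a, shard_b], "search", k=3):
        print(doc_id, round(score, 3))
```

With this sample data, doc `4` outranks docs `1` and `2` only because "search" is rarer on its shard. That is the effect the assumption about similar document frequencies glosses over. When the assumption does not hold (for example, on small indices), Elasticsearch's `dfs_query_then_fetch` search type first collects term statistics from all shards before scoring.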

Hope this helps!

system · November 27, 2018, 7:27pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
Sharding and Performance	Elasticsearch	1	308	August 29, 2018
When do you need more then 1 shard?	Elasticsearch	12	1851	July 6, 2017
Documents distributions across shards	Elasticsearch	2	319	October 17, 2019
Records per shard	Elasticsearch	7	1006	July 6, 2017
Docs about sharding and scatter/gather	Elasticsearch	5	1849	July 6, 2017