How data is distributed in shards

brijeshrai82 · January 7, 2017, 6:18pm

Hi, I am new to Elasticsearch and curious to know how data is distributed across multiple shards.
On a single node I have created an index with 2 shards. The node has 60 GB of disk space and 16 GB of RAM. I have inserted around 2 GB of data into the index. Will this 2 GB be distributed equally across both shards (1 GB each), or will all of it be stored in a single shard?

nik9000 · January 7, 2017, 6:58pm

This should explain it.

jasontedor · January 7, 2017, 9:25pm

In general, the data would be roughly equally distributed between the two shards. This is not because we attempt to equally distribute the data, but because documents are routed to shards by hashing the document ID (by default). With a good hash function, the data will distribute itself roughly equally (short of a pathological distribution of document IDs). The hash function that we use (murmur3) has good distribution qualities. We use a hash function for routing so that we always know which shard a document is in, given the document ID and the number of shards in the index. This is why you cannot change the number of shards, with the exception of shrinking the number of shards to a divisor of the original number of shards.
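
The routing idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses SHA-256 as a stand-in hash (not the murmur3 that Elasticsearch uses), a plain modulo over the shard count, and made-up document IDs, so the shard numbers it prints will not match a real cluster.

```python
import hashlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the document ID and taking the remainder."""
    # Stand-in hash (assumption): first 4 bytes of SHA-256, not murmur3.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")
    return h % num_shards


# Hypothetical document IDs, for illustration only.
doc_ids = [f"doc-{i}" for i in range(10_000)]

# 1) With a well-mixing hash, two shards each get roughly half the documents.
counts = Counter(shard_for(d, 2) for d in doc_ids)
print("documents per shard (2 shards):", dict(counts))

# 2) Changing the shard count changes where many existing IDs would be routed,
#    which is why the number of shards cannot simply be changed.
moved = sum(shard_for(d, 2) != shard_for(d, 3) for d in doc_ids)
print("documents routed differently with 3 shards:", moved)

# 3) Going from 4 shards to 2 (a divisor) stays consistent in this model:
#    everything in old shard s lands in new shard s % 2.
consistent = all(shard_for(d, 4) % 2 == shard_for(d, 2) for d in doc_ids)
print("4 -> 2 keeps routing consistent:", consistent)
```

The same ID always hashes to the same shard for a fixed shard count, so a lookup by ID never has to search every shard.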

brijeshrai82 · January 8, 2017, 7:11am

Very well explained. Thanks Jason and Nik.
Just wondering if there is any way to find out size of shards?

nik9000 · January 8, 2017, 12:13pm

They are directories on disk that you can ls, or you can use the _cat/shards API.
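
As a rough sketch of the _cat/shards route, the Python below builds a request for one index and sums up the on-disk size of each primary shard. The host, the index name and the helper names are assumptions for illustration; the `format`, `bytes` and `h` query parameters and the `index`, `shard`, `prirep` and `store` columns are the ones the cat shards API provides.

```python
import json
import urllib.request
from urllib.parse import urlencode


def cat_shards_url(base_url: str, index: str) -> str:
    """Build a _cat/shards URL that returns JSON with sizes in bytes."""
    params = urlencode(
        {"format": "json", "bytes": "b", "h": "index,shard,prirep,store"}
    )
    return f"{base_url.rstrip('/')}/_cat/shards/{index}?{params}"


def primary_sizes(rows):
    """Map shard number -> store size in bytes, primaries only."""
    sizes = {}
    for row in rows:
        # Unassigned shards report no store size, so skip them.
        if row.get("prirep") == "p" and row.get("store") is not None:
            sizes[int(row["shard"])] = int(row["store"])
    return sizes


def fetch_primary_sizes(base_url: str, index: str):
    """Query the cluster (hypothetical helper; needs a reachable node)."""
    with urllib.request.urlopen(cat_shards_url(base_url, index)) as resp:
        return primary_sizes(json.load(resp))
```

Assuming a node is listening on http://localhost:9200 and the index is called myindex (both placeholders), `fetch_primary_sizes("http://localhost:9200", "myindex")` would return one entry per primary shard, which makes it easy to see whether the two shards are about the same size.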

system · February 5, 2017, 12:14pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
