How data is distributed in shards


(Brijesh) #1

Hi, I am new to Elasticsearch and curious to know how data is distributed across multiple shards.
On a single node I have created an index with 2 shards. The node has 60 GB of disk space and 16 GB of RAM. I have inserted around 2 GB of data into the index. Will this 2 GB be distributed equally across both shards (1 GB each), or will all of it be stored in one shard only?


(Nik Everett) #2

This should explain it.


(Jason Tedor) #3

In general, the data would be roughly equally distributed between the two shards. This is not because we attempt to equally distribute the data, but because documents are routed to shards by hashing the document IDs. With a good hash function, the data will distribute itself roughly equally (short of a pathological distribution of document IDs). The hash function that we use (murmur3) has good distribution qualities. We use a hash function for routing so that we always know which shard a document is in, given the document ID and the number of shards in the index. This is why you cannot change the number of shards, with the exception of shrinking the number of shards to a divisor of the original number of shards.
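
To make the routing idea concrete, here is a minimal sketch of hash-then-modulo routing. It is only an illustration, not Elasticsearch's actual code: the function name shard_for and the "doc-N" IDs are made up, and SHA-256 from the Python standard library stands in for murmur3, so the shard numbers it prints will not match what a real cluster would choose.

```python
import hashlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard number via hash-then-modulo."""
    # Stand-in hash: first 4 bytes of SHA-256, NOT the murmur3 used by Elasticsearch.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards


# Hypothetical document IDs, purely for illustration.
doc_ids = [f"doc-{i}" for i in range(10_000)]

# With 2 shards, the documents split roughly (not exactly) in half.
counts = Counter(shard_for(d, 2) for d in doc_ids)
print("documents per shard:", dict(counts))

# Changing the shard count from 2 to 3 re-routes many existing documents,
# which is why the shard count cannot be changed freely.
moved = sum(shard_for(d, 2) != shard_for(d, 3) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")

# Going from 4 shards to 2 (a divisor of 4) stays predictable:
# (h % 4) % 2 == h % 2, so each old shard folds into exactly one new shard.
consistent = all(shard_for(d, 4) % 2 == shard_for(d, 2) for d in doc_ids)
print("4 -> 2 mapping is consistent:", consistent)
```

The last check is only the arithmetic intuition behind the "divisor" rule in a plain modulo scheme; how the shrink operation is actually implemented is not covered here.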


(Brijesh) #4

Very well explained. Thanks Jason and Nik.
Just wondering, is there any way to find out the size of the shards?


(Nik Everett) #5

They are directories on disk that you can ls, or you can use the _cat/shards API.
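
As a rough sketch of the _cat/shards route, the snippet below asks for JSON output with sizes in bytes and collects the store size per shard. The helper names fetch_shards and shard_sizes are made up for this example, and the sample rows are hypothetical values, not output from a real cluster.

```python
import json
import urllib.request


def fetch_shards(base_url: str, index: str) -> list:
    """GET _cat/shards for one index as JSON, with store sizes in bytes."""
    url = (
        f"{base_url}/_cat/shards/{index}"
        "?format=json&bytes=b&h=index,shard,prirep,store"
    )
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def shard_sizes(rows: list) -> dict:
    """Map (shard number, 'p' or 'r') to the store size in bytes."""
    sizes = {}
    for row in rows:
        # Unassigned shards report no store size; skip them.
        if row.get("store") is None:
            continue
        sizes[(int(row["shard"]), row["prirep"])] = int(row["store"])
    return sizes


# Hypothetical rows shaped like a _cat/shards JSON response (values are made up).
sample = [
    {"index": "my-index", "shard": "0", "prirep": "p", "store": "1050000000"},
    {"index": "my-index", "shard": "1", "prirep": "p", "store": "1030000000"},
    {"index": "my-index", "shard": "1", "prirep": "r", "store": None},
]
print(shard_sizes(sample))
```

Against a running node, something like shard_sizes(fetch_shards("http://localhost:9200", "my-index")) would do the same with live data; the URL and index name there are assumptions about your setup.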

