Clarification required on shard

vijay_kaali · December 27, 2019, 6:47am

Hi

I have a basic doubt about the shard concept.
My understanding: a shard is a physical data segment across which an index is spread, and multiple shards are used for performance.

My questions are:

1. If a shard is a physical segment, what is the process of assigning shards after a restart?
2. If the index documents are stored in shards, where does the tokenizer get stored?
3. In a rolling restart it takes a long time to assign shards. Is there a way to reduce or avoid this?

Thanks

HenningAndersen · December 27, 2019, 8:25am

Hi @vijay_kaali,

responses to your questions:

1. The assignment of shards is called allocation in Elasticsearch. There are a number of settings to guide how Elasticsearch allocates shards to nodes, as well as harder rules (for instance, replicas of the same shard cannot be assigned to the same node). Elasticsearch automatically tries to balance shards evenly across the available nodes. When a single node is restarted, by default, the allocation to other nodes is delayed for 1 minute to allow the node to re-join the cluster and thereby reuse the on-disk copy of data that it has (recovering any additional operations that were indexed while the node was down).
2. Analyzed data is also stored in the shard. Each shard is a Lucene index that stores the _source as well as the data structures needed to do fast searches on the shard.
3. The rolling restart procedure is described here.
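
To make points 1 and 3 more concrete, here is a minimal sketch of the request bodies involved. It only builds the JSON; it does not talk to a cluster. The helper names and the `5m` value are illustrative assumptions. The setting names `index.unassigned.node_left.delayed_timeout` (sent to `PUT _all/_settings`) and `cluster.routing.allocation.enable` (sent to `PUT _cluster/settings`) are Elasticsearch settings.

```python
import json


def delayed_timeout_body(timeout="5m"):
    """Body for PUT _all/_settings (hypothetical helper).

    Controls how long Elasticsearch waits for a departed node to come
    back before allocating its shards elsewhere. The default is 1m.
    """
    return {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}


def allocation_enable_body(mode):
    """Body for PUT _cluster/settings (hypothetical helper).

    mode="primaries" stops replica allocation before a node is shut down;
    mode=None serializes to null, which resets the setting to its default.
    """
    return {"persistent": {"cluster.routing.allocation.enable": mode}}


if __name__ == "__main__":
    print(json.dumps(delayed_timeout_body(), indent=2))
    print(json.dumps(allocation_enable_body("primaries"), indent=2))
    print(json.dumps(allocation_enable_body(None), indent=2))
```

In a rolling restart the usual pattern is to send the `primaries` body before stopping a node and the `None` (null) body once the node has re-joined, so replicas are not shuffled around while the node is briefly away.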

Note that newer versions of Elasticsearch (7.4+) have improved the peer recovery mechanism so that the on-disk state of restarting nodes can be reused in more situations.

vijay_kaali · December 27, 2019, 9:30am

Thanks a lot for the quick reply.

Points #2 and #3 are clear now.
Regarding point 1: shards are allocated at the creation of the index, and that is fixed.
Why is it assigning shards during a restart if no index creation happens?
Are allocation and assignment different?

Sorry for my ignorance

HenningAndersen · December 27, 2019, 1:37pm

Hi @vijay_kaali,

assuming you have at least one replica of the shard (i.e., 2 copies), when a node is stopped, Elasticsearch will try to allocate the shard somewhere else in order to maintain availability. Using delayed allocation, it will wait a while, hoping that the node that had a copy of the data comes back online. If it does, it will allocate it to that node. But if the delayed allocation timer times out, it will be allocated to another node (typically requiring a full recovery of all data from the primary copy of the shard).
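
The decision described above can be sketched as a toy model. This is not Elasticsearch's actual implementation, only an illustration of the timer logic; the function name and return strings are made up for the example, and the 60-second default mirrors the 1-minute delay mentioned earlier.

```python
def replica_destination(node_returns_after_s, delayed_timeout_s=60.0):
    """Toy model of delayed allocation for one replica copy.

    node_returns_after_s: seconds until the stopped node re-joins,
                          or None if it never comes back.
    delayed_timeout_s:    how long allocation elsewhere is delayed.
    """
    came_back_in_time = (
        node_returns_after_s is not None
        and node_returns_after_s <= delayed_timeout_s
    )
    if came_back_in_time:
        # The on-disk copy is reused; only missed operations are replayed.
        return "same node"
    # Timer expired: the copy is rebuilt on a different node,
    # typically via a full recovery from the primary.
    return "another node"


if __name__ == "__main__":
    print(replica_destination(30))    # node restarts quickly
    print(replica_destination(300))   # node takes too long
    print(replica_destination(None))  # node never returns
```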

We call all of those situations allocation: creating an index allocates an empty primary and 0 or more replica copies; a node leaving the cluster can cause an allocation (if delayed allocation is not in use, the shard is allocated to another node); a node joining can also cause an allocation (if it has a copy and allocation is delayed); and so on. Allocation assigns a shard to a node.
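
As a rough illustration of "allocation assigns a shard to a node", here is a toy allocator. It is a sketch under simplifying assumptions, not Elasticsearch's balancer: it only applies the rule that two copies of the same shard never share a node, then picks the node holding the fewest shards. The function name, node names, and shard label are hypothetical.

```python
def allocate(shard_id, nodes):
    """Assign one copy of shard_id to a node and return the node's name.

    nodes maps node name -> set of shard ids already held there.
    Returns None if no node is eligible (the copy stays unassigned).
    """
    # Hard rule: a node may not hold two copies of the same shard.
    candidates = [name for name, held in nodes.items() if shard_id not in held]
    if not candidates:
        return None
    # Soft goal: balance by choosing the least-loaded node (ties by name).
    chosen = min(candidates, key=lambda name: (len(nodes[name]), name))
    nodes[chosen].add(shard_id)
    return chosen


if __name__ == "__main__":
    cluster = {"node-a": set(), "node-b": set()}
    print(allocate("my-index[0]", cluster))  # primary copy
    print(allocate("my-index[0]", cluster))  # replica goes to the other node
    print(allocate("my-index[0]", cluster))  # no eligible node left
```

The same function is called whether the trigger is index creation, a node leaving, or a node joining, which is why all of these are referred to as allocation.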

system · January 24, 2020, 1:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
