I have a basic doubt about the shard concept.
My understanding: a shard is a physical data segment, and an index is spread across one or more shards. Multiple shards are used for performance.
My questions are:
1. If a shard is a physical segment, what is the process of assigning shards after a restart?
2. If the index documents are stored in shards, where is the tokenizer output stored?
3. During a rolling restart it takes a long time to assign shards. Is there a way to reduce or avoid this?
The assignment of shards is called allocation in Elasticsearch. There are a number of settings to guide how Elasticsearch allocates shards to nodes, as well as harder rules (for instance, that two copies of the same shard, primary or replica, cannot be assigned to the same node). Elasticsearch automatically tries to balance shards evenly across the available nodes. When a single node is restarted, by default, allocation of its shard copies to other nodes is delayed for 1 minute to allow the node to rejoin the cluster and thereby reuse the on-disk copy of the data it has (recovering any additional operations that were indexed while the node was down).
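The 1-minute delay is a per-index setting, `index.unassigned.node_left.delayed_timeout`. Elastic's rolling-restart guide also recommends setting `cluster.routing.allocation.enable` to `primaries` before stopping a node and resetting it (to `null`) once the node has rejoined, so replicas are not shuffled around while the node is briefly away. The minimal Python sketch below only builds and prints those two REST requests; it does not talk to a cluster. The helper names and the `5m` value are illustrative assumptions, not recommendations.

```python
import json

# Hypothetical helpers: they only build (method, path, body) tuples for the
# settings discussed above. Send them with curl, Kibana Dev Tools, or any
# Elasticsearch client; nothing here contacts a cluster.

def delayed_timeout_request(timeout: str = "5m", index: str = "_all"):
    """Change how long replicas of a departed node wait before being reallocated."""
    body = {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}
    return ("PUT", f"/{index}/_settings", body)


def allocation_enable_request(value):
    """Set cluster.routing.allocation.enable; None serializes to null (reset to default)."""
    body = {"persistent": {"cluster.routing.allocation.enable": value}}
    return ("PUT", "/_cluster/settings", body)


if __name__ == "__main__":
    for method, path, body in (
        allocation_enable_request("primaries"),  # before stopping the node
        delayed_timeout_request("5m"),            # optional: longer grace period
        allocation_enable_request(None),          # after the node has rejoined
    ):
        print(method, path)
        print(json.dumps(body, indent=2))
```

Raising the delay trades faster recovery of a restarting node for a longer period during which the cluster runs with fewer copies of those shards.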
Analyzed data is also stored in the shard. Each shard is a Lucene index that stores the _source as well as the data structures used to search the shard quickly.
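To illustrate where the tokenizer's output ends up, here is a deliberately tiny Python model of a shard that keeps both the original document (`_source`) and an inverted index built from analyzed terms. Everything here is hypothetical: `simple_analyzer` and `ToyShard` are not Elasticsearch or Lucene APIs, and a real Lucene index uses far richer structures (term dictionaries, postings lists, stored fields, doc values and more).

```python
import re
from collections import defaultdict

# Toy model only, not how Lucene is implemented. It shows that a shard holds
# both the original document (_source) and the analyzer's output (terms).

def simple_analyzer(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, then split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


class ToyShard:
    def __init__(self):
        self.source = {}                  # doc id -> original document (_source)
        self.postings = defaultdict(set)  # analyzed term -> ids of docs containing it

    def index(self, doc_id: str, doc: dict, field: str) -> None:
        self.source[doc_id] = doc
        for term in simple_analyzer(doc[field]):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of docs containing every analyzed query term."""
        terms = simple_analyzer(query)
        if not terms:
            return set()
        # .get() avoids inserting empty entries into the defaultdict
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("1", {"title": "Rolling restart of a cluster"}, "title")
    shard.index("2", {"title": "Shard allocation after restart"}, "title")
    print(sorted(shard.search("RESTART")))        # ['1', '2']
    print(sorted(shard.search("shard restart")))  # ['2']
```

Because the query goes through the same analyzer as the documents, "RESTART" matches the lowercased term stored at index time.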
Note that newer versions of Elasticsearch (7.4+) have improved the peer recovery mechanism so that the on-disk state of restarting nodes can be reused in more situations.
Points #2 and #3 are clear now.
Regarding point 1: shards are allocated when the index is created, and that is fixed.
Why are shards assigned during a restart if no index is being created?
Are allocation and assignment different?
Assuming you have at least one replica of the shard (i.e., two copies), when a node is stopped, Elasticsearch will try to allocate the shard somewhere else in order to maintain availability. With delayed allocation, it waits a while, hoping that the node that had a copy of the data comes back online. If it does, the shard is allocated to that node. But if the delayed allocation timer expires, the shard is allocated to another node (typically requiring a full recovery of all data from the primary copy of the shard).
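The decision described above can be sketched as a small Python function. This is a toy model of the behaviour in this answer, not Elasticsearch's allocator: the function, the node names and the `rejoin_after_s` parameter are hypothetical, and the default of 60 seconds simply mirrors the 1-minute default delay.

```python
from dataclasses import dataclass
from typing import Optional

# Toy model of delayed allocation for one replica copy; NOT Elasticsearch code.
# Times are in seconds; 60 mirrors the 1-minute default delay.

@dataclass
class ReplicaOutcome:
    target: str          # node that ends up holding the replica copy
    full_recovery: bool  # True if all data must be copied from the primary


def allocate_replica(original_node: str, other_node: str,
                     rejoin_after_s: Optional[float],
                     delayed_timeout_s: float = 60.0) -> ReplicaOutcome:
    """Decide where a replica goes after `original_node` leaves the cluster.

    rejoin_after_s: seconds until the node rejoins, or None if it never does.
    """
    if rejoin_after_s is not None and rejoin_after_s <= delayed_timeout_s:
        # Node came back in time: reuse its on-disk copy and only recover the
        # operations indexed while it was away.
        return ReplicaOutcome(target=original_node, full_recovery=False)
    # Timer expired: allocate elsewhere, typically a full recovery from the primary.
    return ReplicaOutcome(target=other_node, full_recovery=True)
```

For example, `allocate_replica("node-1", "node-2", rejoin_after_s=120)` models a restart that outlasts the default delay, so the replica moves to `node-2` with a full recovery; passing `delayed_timeout_s=300` for the same restart keeps it on `node-1`.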
We call all of these situations allocation: creating an index allocates an empty primary and zero or more replica copies; a node that leaves the cluster can cause an allocation (without delayed allocation, the shard is allocated to another node); a node that joins can also cause an allocation (if it has a copy and allocation was delayed); and so on. In other words, allocation and assignment are the same thing: allocation assigns a shard to a node.