How does Elasticsearch balance data across nodes in a cluster

What algorithm does Elasticsearch use to uniformly distribute data across the different nodes in the cluster? How does it deal with new nodes / dead nodes?

Does it use disk space? Consistent hashing? Any resources you can share would be greatly appreciated. Thank you.

Data is split across shards by hashing the document ID, dividing the hash by the number of shards and taking the remainder.
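As a hedged illustration of that hash-then-modulo routing (not Elasticsearch's actual code: Elasticsearch hashes the document's `_routing` value, which defaults to its `_id`, with Murmur3; the CRC32 call and the shard count below are stand-ins for the sketch):

```python
# Minimal sketch of hash-mod shard routing, NOT Elasticsearch's exact code.
# Elasticsearch hashes the _routing value (defaulting to the document _id)
# with Murmur3; zlib.crc32 is used here purely as a stand-in 32-bit hash.
import zlib

NUM_PRIMARY_SHARDS = 5  # assumed shard count, fixed at index creation time

def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Return the shard number a document routes to."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # 32-bit unsigned hash
    return h % num_shards                   # the remainder picks the shard

for doc_id in ["user-1", "user-2", "user-3"]:
    print(doc_id, "-> shard", shard_for(doc_id))
```

This is also why the number of primary shards is fixed when an index is created: changing it would change every document's remainder and break lookups.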

Shards are allocated to nodes taking a number of factors into account, including disk space. The reference manual has all the details.
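For example, the disk-space factor is governed by the disk watermark settings, which can be adjusted over the REST API. A sketch of doing that from Python (the values shown are the documented defaults, and the localhost URL is an assumption):

```python
# Example of one allocation factor: the disk watermarks. The values shown
# are the documented defaults; the cluster URL is an assumption.
import requests

requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {
        # below "low", shards may be allocated to the node as normal
        "cluster.routing.allocation.disk.watermark.low": "85%",
        # above "high", Elasticsearch tries to relocate shards away
        "cluster.routing.allocation.disk.watermark.high": "90%",
    }},
)
```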

New nodes and dead nodes aren't treated particularly specially by the algorithm. Allocation decisions are made assuming that the membership of the cluster is fixed. When an empty node joins the cluster Elasticsearch will relocate some data onto it so that each node holds roughly the same number of shards. If a node fails then the shards it held are distributed among the remaining nodes, although there's a short delay before doing anything in case it comes back.
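(The short delay for a failed node is the `index.unassigned.node_left.delayed_timeout` setting, which defaults to one minute.) Purely to illustrate the joining-node behaviour, here's a toy Python simulation of evening out shard counts when an empty node arrives; it is not Elasticsearch's allocator, and the node and shard names are made up:

```python
# Toy simulation of the behaviour described above: shards move until each
# node holds roughly the same number. Illustration only, not the real allocator.
from collections import Counter

def rebalance(assignment: dict[str, str], nodes: list[str]) -> dict[str, str]:
    """Greedily move shards from the most-loaded to the least-loaded node."""
    assignment = dict(assignment)
    while True:
        counts = Counter({n: 0 for n in nodes})
        counts.update(assignment.values())
        busiest = max(counts, key=counts.get)
        idlest = min(counts, key=counts.get)
        if counts[busiest] - counts[idlest] <= 1:
            return assignment  # as even as it can get
        # relocate one shard from the busiest node to the idlest one
        shard = next(s for s, n in assignment.items() if n == busiest)
        assignment[shard] = idlest

# six shards on two nodes, then an empty third node joins the cluster
shards = {f"shard-{i}": ("node-a" if i % 2 else "node-b") for i in range(6)}
print(rebalance(shards, ["node-a", "node-b", "node-c"]))  # two shards each
```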

Thank you David.

Which document ID is this? The only document ID I'm familiar with in ES is the actual index document ID. Also, how can you divide the hash? (It has hex letters and numbers.)

How does ES decide which shards to relocate?

Yes, that's the one.

The hash used in Elasticsearch is a 32-bit number. The hashes that you are thinking of, containing digits and letters, are still numbers; they are often written in hexadecimal because it's more compact and because it maps directly onto the binary form that computers actually work with (each hex digit is exactly four bits), whereas decimal (i.e. only using digits) doesn't line up with binary so neatly.
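To make that concrete, here's a quick demonstration (the hash value itself is made up) that a hex string and its decimal form are the same 32-bit number, so taking a remainder works the same either way:

```python
# The same 32-bit number written two ways: hexadecimal is just a more
# compact notation, not a different kind of value. (Example value made up.)
h = 0x9E3779B9                   # hex literal
print(h)                         # 2654435769 -- the same number in decimal
print(h % 5)                     # the remainder is identical either way
print(int("9E3779B9", 16) == h)  # parsing the hex string gives the same int
```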

It considers the shard movements that satisfy the constraints (disk space, allocation filtering, etc.), measures approximately how "balanced" the cluster is, and picks the relocations that improve that balance the most.
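A hedged sketch of that idea in Python: score the cluster's balance (here just the spread of per-node shard counts; the real balancer weighs shard counts, index distribution, disk usage, and more), enumerate the legal moves, and take the one that improves the score most. All names and the `allowed` predicate are illustrative.

```python
# Illustrative greedy relocation chooser, NOT Elasticsearch's allocator.
# "Balance" here is just the spread of per-node shard counts.
from collections import Counter
from itertools import product

def imbalance(assignment: dict[str, str], nodes: list[str]) -> int:
    counts = Counter({n: 0 for n in nodes})
    counts.update(assignment.values())
    return max(counts.values()) - min(counts.values())

def best_move(assignment, nodes, allowed):
    """Pick the (shard, target) move that most reduces imbalance.
    `allowed` models constraints like disk space or allocation filtering."""
    best, best_score = None, imbalance(assignment, nodes)
    for shard, target in product(assignment, nodes):
        if target == assignment[shard] or not allowed(shard, target):
            continue
        trial = {**assignment, shard: target}
        score = imbalance(trial, nodes)
        if score < best_score:
            best, best_score = (shard, target), score
    return best  # None means no legal move improves the balance

move = best_move(
    {"s0": "node-a", "s1": "node-a", "s2": "node-a"},
    ["node-a", "node-b"],
    allowed=lambda shard, node: True,  # pretend every node has room
)
print(move)  # e.g. ('s0', 'node-b')
```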
