We have 20 indexes with 490 shards. Shard sizes vary from 35GB to 180GB, and the cluster has 35 data nodes. Each data node has 5 disks, each 500GB in size. While indexing, we noticed that shards started relocating. The ES version is 9.0 and the total size of these 20 indexes is 55TB. All other ES settings are the defaults.
My questions are:
Why was shard reallocation performed during indexing? Is it due to rebalancing for disk space?
Could data be lost if we index while shards are relocating?
How can we avoid shard reallocation while indexing data?
There was a somewhat similar recent thread here which contains some useful diagnostic tips/info, as well as links to other threads.
As in that thread, it might be useful to share the output of
GET /_cat/nodes?v&h=name,role,disk.used_percent,disk.used,disk.avail&s=role
If at any given instant there is no indexing ongoing, how can the cluster know there will be no indexing in the next instant/second/minute/whatever? I guess if your indexing only happens at very specific and predictable times you could code something, but... that seems wrong. Is the reallocation actually causing you or your clients a real issue, or is it just that it's unexpected? And is your cluster stable, i.e. nodes are not semi-frequently leaving and rejoining the cluster?
No.
There's no need to avoid this.
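That said, if you ever had a genuine reason to pause shard movement during a heavy bulk load, rebalancing can be toggled via a cluster setting. A minimal sketch, not a recommendation for this case:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.rebalance.enable": "none"
  }
}

and afterwards restore the default with:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.rebalance.enable": null
  }
}

Note this only pauses rebalancing; allocation of new and recovering shards (cluster.routing.allocation.enable) is a separate setting and should normally be left alone.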
Thanks for the response.
We also observed a disk watermark issue along with the shard reallocation. So, if a node is in read-only mode (e.g., due to a disk watermark), will writes fail silently, without any notification, while we are indexing?
Are you using the default watermark settings or have you customised this in any way?
Using default watermark settings.
What did you observe exactly? Did a node exceed the high watermark briefly? If so, this is described in the docs as normal behaviour:
It is normal for nodes to temporarily exceed the high watermark from time to time.
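For what it's worth, a watermark-induced block does not fail silently: when the flood-stage watermark is reached, the affected indices get an index.blocks.read_only_allow_delete block and writes are rejected with a 429 cluster_block_exception that your client will see. The block is released automatically once disk usage drops back below the high watermark, but you can also inspect and clear it by hand; a sketch (the index name is illustrative):

GET /_all/_settings?filter_path=*.settings.index.blocks

PUT /my-index/_settings
{
  "index.blocks.read_only_allow_delete": null
}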
Each node in our cluster is configured with five disks, each sized at 500GB, giving a total of 2.5TB per node. With Elasticsearch's default disk watermarks enabled, does the system evaluate disk usage individually per disk, or does it consider the combined disk space across the node when triggering watermark-related errors?
So the 5 disks' partitions are each mounted at a different mount point, and you are using Multiple Data Paths? i.e. you have an entry like
path.data: /mnt/data1,/mnt/data2,/mnt/data3,/mnt/data4,/mnt/data5
in your nodes' elasticsearch.yml ?
Can you also share output of
GET /_cat/nodes?v&h=name,role,disk.used_percent,disk.used,disk.avail&s=role
and
GET /_cluster/settings?include_defaults=true&filter_path=**.disk.watermark.**
Which version of Elasticsearch are you using?
Yes, it is like path.data: /mnt/data1,/mnt/data2,/mnt/data3,/mnt/data4,/mnt/data5
The cluster has since been terminated, so no other _cluster or _cat data is available.
Which version of Elasticsearch are you using?
9.0.0
According to the docs, specifying multiple data paths the way you have done is deprecated:
Elasticsearch offers a deprecated setting that allows you to specify multiple paths in path.data. To learn about this setting, and how to migrate away from it, refer to Multiple data paths.
As far as I know Elasticsearch is unable to move shards between data paths on the same host, which may complicate reallocation. I would recommend changing this as outlined in the docs I linked to.
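The docs' migration path is essentially a rolling one: drain each node with an allocation filter, wait for its shards to move off, stop the node, point path.data at a single path, and restart. A rough sketch of the drain step (the node name is illustrative):

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "data-node-01"
  }
}

Watch GET /_cat/shards?v&h=index,shard,prirep,state,node&s=node until nothing remains on that node, then clear the filter (set it to null) once the node has rejoined with its new configuration.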
OK, thank you. That mechanism is being deprecated. See the docs.
For a bunch of reasons you would likely be better served by using LVM tools to collect your 5 disks per node into a single filesystem and let the operating system/filesystem manage the space. If your system was working, this would be a lengthy process across 35 nodes, but doable. Personally, old school, but 5x35 = 175 disks is a bit too many for me, so I'd try some sort of RAID, software RAID if necessary.
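For reference, collapsing the five disks into one striped logical volume is only a few commands per node. A sketch, assuming the disks are /dev/sdb through /dev/sdf and the node has already been drained of its shards (these commands destroy whatever is on those disks):

pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
vgcreate es_data /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
lvcreate -i 5 -l 100%FREE -n data es_data    # stripe the volume across all 5 disks
mkfs.xfs /dev/es_data/data
mount /dev/es_data/data /var/lib/elasticsearch

path.data then becomes a single entry pointing at that mount.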
Err, not sure how to interpret "terminated". Do you mean crashed, not currently working, unable to get working...? Are you looking for assistance to get it working again?
A shard size of 180GB? Isn't that way too large?
In addition, your disks are only 500GB each. One such shard would already occupy 36% of a single disk, which would probably throw off any rebalancing algorithm.
I believe the recommendation is to keep shards below 30GB. A shard equates to a file, and a 30GB file is kind of big already.
Large shards also make relocation and recovery take longer.
I have not experienced any data loss due to rebalancing yet. If your concern is theoretical, then I would probably not worry about it.
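If you want to see how far off the shards are, sorting the shards cat API by store size makes the outliers obvious:

GET /_cat/shards?v&h=index,shard,prirep,store&s=store:desc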