We have 20 indexes with 490 shards. Shard sizes vary from 35GB to 180GB, and the cluster has 35 data nodes. Each data node has 5 disks, each 500GB in size. While indexing, we noticed that shards started relocating. The ES version is 9.0 and the total size of these 20 indexes is 55TB. All other ES settings are the defaults.
My questions are:
Why was shard reallocation performed during indexing? Is it due to rebalancing for disk space?
Could data be lost if we index while shards are relocating?
How can we avoid shard reallocation while indexing data?
There was a somewhat similar recent thread here which contains some useful diagnostic tips/info, as well as links to other threads.
As in that thread, it might be useful to share the output of
GET /_cat/nodes?v&h=name,role,disk.used_percent,disk.used,disk.avail&s=role
If at any given instant there is no indexing ongoing, how can the cluster know there will be no indexing in the next instant/second/minute/whatever? I guess if your indexing only happens at very specific and predictable times you could code something, but... that seems wrong. Is the reallocation actually causing you or your clients a real issue, or is it just that it's unexpected? And is your cluster stable, i.e. nodes are not semi-frequently leaving and rejoining the cluster?
No.
There's no need to avoid this.
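That said, if you ever had a genuine reason to pause shard movement during a heavy bulk load, rebalancing can be toggled via a cluster setting. A minimal sketch, not a recommendation for this case:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.rebalance.enable": "none"
  }
}

and afterwards restore the default with:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.rebalance.enable": null
  }
}

Note this only pauses rebalancing; allocation of new and recovering shards (cluster.routing.allocation.enable) is a separate setting and should normally be left alone.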
Thanks for the response.
We also observed a disk watermark issue along with the shard reallocation. So, if a node is in read-only mode (e.g., due to a disk watermark), will writes fail silently, without any notification, while we are indexing?
Are you using the default watermark settings or have you customised this in any way?
Using default watermark settings.
What did you observe exactly? Did a node exceed the high watermark briefly? If so, this is described in the docs as normal behaviour:
It is normal for nodes to temporarily exceed the high watermark from time to time.
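For what it's worth, a watermark-induced block does not fail silently: when the flood-stage watermark is reached, the affected indices get an index.blocks.read_only_allow_delete block and writes are rejected with a 429 cluster_block_exception that your client will see. The block is released automatically once disk usage drops back below the high watermark, but you can also inspect and clear it by hand; a sketch (the index name is illustrative):

GET /_all/_settings?filter_path=*.settings.index.blocks

PUT /my-index/_settings
{
  "index.blocks.read_only_allow_delete": null
}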
Each node in our cluster is configured with five disks, each sized at 500GB, giving a total of 2.5TB per node. With Elasticsearch's default disk watermarks enabled, does the system evaluate disk usage individually per disk, or does it consider the combined disk space across the node when triggering watermark-related errors?
So the 5 disks' partitions are each mounted at a different mount point, and you are using Multiple Data Paths? i.e. you have an entry like
path.data: /mnt/data1,/mnt/data2,/mnt/data3,/mnt/data4,/mnt/data5
in your nodes' elasticsearch.yml ?
Can you also share output of
GET /_cat/nodes?v&h=name,role,disk.used_percent,disk.used,disk.avail&s=role
and
GET /_cluster/settings?include_defaults=true&filter_path=**.disk.watermark.**
Which version of Elasticsearch are you using?
Yes, it is like path.data: /mnt/data1,/mnt/data2,/mnt/data3,/mnt/data4,/mnt/data5
The cluster has since been terminated, so no other _cluster or _cat data is available.
Which version of Elasticsearch are you using?
9.0.0
According to the docs, specifying multiple data paths the way you have done is deprecated:
Elasticsearch offers a deprecated setting that allows you to specify multiple paths in path.data. To learn about this setting, and how to migrate away from it, refer to Multiple data paths.
As far as I know Elasticsearch is unable to move shards between data paths on the same host, which may complicate reallocation. I would recommend changing this as outlined in the docs I linked to.
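The docs' migration path is essentially a rolling one: drain each node with an allocation filter, wait for its shards to move off, stop the node, point path.data at a single path, and restart. A rough sketch of the drain step (the node name is illustrative):

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "data-node-01"
  }
}

Watch GET /_cat/shards?v&h=index,shard,prirep,state,node&s=node until nothing remains on that node, then clear the filter (set it to null) once the node has rejoined with its new configuration.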
OK, thank you. That mechanism is being deprecated. See the docs.
For a bunch of reasons you would likely be better served by using LVM tools to collect your 5 disks per node into a single filesystem and let the operating system/filesystem manage the space. If your system was working, this would be a lengthy process across 35 nodes, but doable. Personally, old school, but 5x35 = 175 disks is a bit too many for me, so I'd try some sort of RAID, software RAID if necessary.
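For reference, collapsing the five disks into one striped logical volume is only a few commands per node. A sketch, assuming the disks are /dev/sdb through /dev/sdf and the node has already been drained of its shards (these commands destroy whatever is on those disks):

pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
vgcreate es_data /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
lvcreate -i 5 -l 100%FREE -n data es_data    # stripe the volume across all 5 disks
mkfs.xfs /dev/es_data/data
mount /dev/es_data/data /var/lib/elasticsearch

path.data then becomes a single entry pointing at that mount.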
Err, not sure how to interpret "terminated". Do you mean crashed, not currently working, unable to get working...? Are you looking for assistance to get it working again?
A shard size of 180GB? Isn't that way too large?
In addition, your disks are only 500GB each. One such shard would already occupy 36% of a single disk, which would probably throw off any rebalancing algorithm.
I believe the recommendation is to keep shards below 30GB. A shard equates to a file, and a 30GB file is kind of big already.
Large shards also make relocation and recovery take longer.
I have not experienced any data loss due to rebalancing yet. If your concern is theoretical, then I would probably not worry about it.
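If you want to see how far off the shards are, sorting the shards cat API by store size makes the outliers obvious:

GET /_cat/shards?v&h=index,shard,prirep,store&s=store:desc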