We have upgraded Elasticsearch to 8.12 and changed the index-level shards-per-node setting from 2 to 4. Since then some shard recovery always seems to be in progress and shards keep moving from one node to another, although the cluster stays in Green status.
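Concretely, the index-level change was of this form (a sketch; the index name is a placeholder, and I'm assuming the setting involved is index.routing.allocation.total_shards_per_node):

PUT my-index/_settings
{
  "index.routing.allocation.total_shards_per_node": 4
}

The current cluster-wide settings are: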
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "node_concurrent_recoveries": "6",
          "exclude": {
            "_host": "atl1s11cjbesd36.xt.local,"
          }
        }
      }
    },
    "indices": {
      "breaker": {
        "total": {
          "limit": "80%"
        }
      },
      "recovery": {
        "max_bytes_per_sec": "200mb",
        "max_concurrent_file_chunks": "4"
      }
    },
    "logger": {
      "org": {
        "elasticsearch": "WARN"
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "node_concurrent_recoveries": "8"
        }
      }
    }
  }
}
See the docs for this setting:
Increasing this setting may cause shard movements to have a performance impact on other activity in your cluster, but may not make shard movements complete noticeably sooner. We do not recommend adjusting this setting from its default of 2.
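If you decide to follow that recommendation, something like the following should clear both the persistent and transient overrides shown above (a sketch; setting a cluster setting to null removes the override and restores the default):

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": null
  },
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": null
  }
}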
If you've only just upgraded then you will need to allow some time for your cluster to adjust itself to the new balancing heuristics. Depending on cluster size this could be several days of work (or more).
See also this troubleshooting guide.
Thanks @DavidTurner !
I ran the GET _internal/desired_balance API and got the following counts in the response for the cluster:
"node_is_desired": false - 3322
"node_is_desired": true - 6684
Does it look OK?
Also, is the rebalancing logic introduced in 8.6 going to impact cluster performance in a big way, or has there been some benchmarking on this?
I expect that to improve over time.
Yes, generally it will improve performance.
Thanks @DavidTurner !
Is there any metric change which suggests things are stabilising now? For example, will the "node_is_desired": false count decrease over time?
Yes, the node_is_desired flags should all end up true.
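One way to track that is to pull just those flags and watch the false entries disappear (a sketch; _internal/desired_balance is an internal API and its response format may change between versions, so the filter_path below is an assumption based on the 8.x output shape):

GET _internal/desired_balance?filter_path=routing_table.*.*.current.node_is_desired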
Thanks @DavidTurner !
I see that the shard recoveries after the version upgrade impact the indexing rate, especially when bulk indexing targets shards on different nodes. Will "cluster.routing.allocation.balance.threshold" help in this case?
Probably, but I think the more likely problem is that you have set indices.recovery.max_bytes_per_sec: 200mb but your hardware can't cope with that. See these docs for more info:
If this limit is too high, ongoing recoveries may consume an excess of bandwidth and other resources, which can have a performance impact on your cluster and in extreme cases may destabilize it.
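If that turns out to be the case, removing the override puts the recovery bandwidth limit back at its default (a sketch; null clears the setting, and the default is 40mb on most node types):

PUT _cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": null
  }
}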