Elasticsearch 6.3: snapshot rate limiting not working

I have a 23-node Elasticsearch cluster running 6.3.2.

I have 20 data-only nodes and 3 master-only nodes. All the hosts are running CentOS 7. A dedicated network VLAN has been set up for the Elasticsearch cluster traffic (i.e. port 9300 traffic).

I'm trying to snapshot an Elasticsearch index, and Elasticsearch seems to be ignoring the max_snapshot_bytes_per_sec setting. I'm backing up to a shared NFS repository that is mounted on all of the Elasticsearch nodes. I should be getting the default limit of 40 MB/s. However, the snapshot is maxing out my 1 Gbit/s network link at about 120 MB/s. If I explicitly set the limit to 40 MB/s, the setting is ignored and the snapshot continues at 120 MB/s. As a result, the snapshot jobs overload the cluster and hang the index: its shards get stuck in an initializing state until every node that holds shards for the index has been restarted.
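For reference, a minimal sketch of how the limit can be set explicitly on the repository and then read back to confirm it was stored. The repository name zfsnas comes from the snapshot command in this post; the mount path /mnt/es_backups and the helper names repo_body and apply_throttle are assumptions for illustration only. The location also has to be listed under path.repo in elasticsearch.yml on every node.

```shell
# Hypothetical values: adjust to your environment.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="zfsnas"                      # repository name from the snapshot command
REPO_LOCATION="/mnt/es_backups"    # assumed NFS mount path, not from the post

# Build the JSON body for an fs repository with an explicit throttle.
# $1 = repository location, $2 = rate limit (e.g. 40mb)
repo_body() {
  printf '{"type":"fs","settings":{"location":"%s","max_snapshot_bytes_per_sec":"%s"}}' "$1" "$2"
}

# Re-register the repository with the throttle, then print the stored settings.
apply_throttle() {
  curl -s -XPUT "$ES_URL/_snapshot/$REPO" \
    -H 'Content-Type: application/json' \
    -d "$(repo_body "$REPO_LOCATION" "40mb")"
  curl -s "$ES_URL/_snapshot/$REPO?pretty"
}
```

Running apply_throttle should echo the repository definition back, which makes it easy to check whether max_snapshot_bytes_per_sec is actually present in the stored settings. The Elasticsearch documentation describes this setting as a per-node rate, so it is worth keeping in mind when comparing it against the total traffic seen on the NFS server.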

The curl command I'm using to initiate the snapshot job:

curl -s -XPUT 'http://localhost:9200/_snapshot/zfsnas/snapshot_1?wait_for_completion=true' -H 'Content-Type: application/json' -d '{"indices": "index_1,index_2"}'
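Because wait_for_completion=true blocks the calling shell, progress has to be checked from a second terminal. A small sketch of doing that, and of canceling a running snapshot by deleting it, using the repository and snapshot names from the command above. The helper names snapshot_url, snapshot_status and snapshot_cancel are made up for this example.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the snapshot endpoint URL. $1 = repository, $2 = snapshot name
snapshot_url() {
  printf '%s/_snapshot/%s/%s' "$ES_URL" "$1" "$2"
}

# Show per-shard progress of a running snapshot.
snapshot_status() {
  curl -s "$(snapshot_url "$1" "$2")/_status?pretty"
}

# Deleting a snapshot that is still in progress aborts it.
snapshot_cancel() {
  curl -s -XDELETE "$(snapshot_url "$1" "$2")"
}
```

For example, snapshot_status zfsnas snapshot_1 prints the current state, and snapshot_cancel zfsnas snapshot_1 stops the job.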

If I watch the du output on the repository directory, I see about 6-7 GB/min of traffic. After about 2-3 minutes the index has unassigned shards, and the nodes on which those shards were running are no longer able to initialize them. I have to restart the Elasticsearch service to fix it (after canceling the snapshot job).
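To put the du numbers in the same units as the throttle, here is a rough sketch that samples the repository size twice and converts the growth to MB/s. It assumes GNU du (for the -b flag, as shipped with CentOS 7); the function names and the directory argument are placeholders, not anything from the cluster itself.

```shell
# Convert growth between two size samples into whole MB/s (decimal megabytes).
# $1 = bytes at start, $2 = bytes at end, $3 = elapsed seconds
rate_mb_per_s() {
  echo $(( ($2 - $1) / $3 / 1000000 ))
}

# Sample a directory twice and print the average write rate.
# $1 = repository directory, $2 = interval in seconds (default 60)
sample_repo_rate() {
  dir="$1"
  interval="${2:-60}"
  start=$(du -sb "$dir" | cut -f1)
  sleep "$interval"
  end=$(du -sb "$dir" | cut -f1)
  rate_mb_per_s "$start" "$end" "$interval"
}
```

With these units, 6 GB/min works out to 100 MB/s and 7 GB/min to about 116 MB/s, which matches the roughly 120 MB/s seen on the network link.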
