Unable to acquire permit to use snapshot files during recovery

Hello there,

I want to ask about something. I haven't made any changes; I was just checking my cluster when suddenly one of my data nodes went offline and my cluster state went from green to yellow. I found many log entries like this in the offline data node's log file:

[2022-09-08T07:19:54,303][WARN ][o.e.i.r.RecoverySettings ] [data-17] Unable to acquire permit to use snapshot files during recovery, this recovery will recover index files from the source node. Ensure snapshot files can be used during recovery by setting [indices.recovery.max_concurrent_snapshot_file_downloads] to be no greater than [25]

and I also found these:

  • [2022-09-08T07:09:54,161][WARN ][o.e.g.PersistedClusterStateService] [data-17] writing cluster state took [279543ms] which is above the warn threshold of [10s]; wrote global metadata [false] and metadata for [1] indices and skipped [1200] unchanged indices

  • [2022-09-08T07:10:59,305][WARN ][o.e.m.f.FsHealthService ] [data-17] health check of [/elasticsearch/elasticsearch-7.17.0/nodes/0] took [82137ms] which is above the warn threshold of [5s]

  • [2022-09-08T07:11:54,412][INFO ][o.e.c.c.Coordinator ] [data-17] [3] consecutive checks of the master node [{master-1}{_MS9jkxdTv2wCscd0gFmyw}{ogzG8SjVQ0iz65K0VbSPvA}{10.37.187.31}{10.37.187.31:9300}{imrt}] were unsuccessful ([3] rejected, [0] timed out), restarting discovery; more details may be available in the master node logs [last unsuccessful check: rejecting check since [{data-17}{6rXNZDZgRiiu3TPBD1jncQ}{YTHMD4i4S3eYkjaTzPDRdg}{10.37.187.50}{10.37.187.50:9300}{dilrt}] has been removed from the cluster]

When I took a look at one of my master nodes' logs, I found these entries related to the offline data node:

[2022-09-08T07:05:13,915][WARN ][o.e.c.InternalClusterInfoService] [master-1] failed to retrieve stats for node [6rXNZDZgRiiu3TPBD1jncQ]: [data-17][10.37.187.50:9300][cluster:monitor/nodes/stats[n]] request_id [109962722] timed out after [15008ms]

[2022-09-08T07:05:13,927][WARN ][o.e.c.InternalClusterInfoService] [master-1] failed to retrieve shard stats from node [6rXNZDZgRiiu3TPBD1jncQ]: [data-17][10.37.187.50:9300][indices:monitor/stats[n]] request_id [109962729] timed out after [15008ms]

Do you think this is caused by a network issue? Or could it be something else, such as the node being overloaded?

Your response will be very helpful. Thanks!

Are you able to please post a bit more of the logs? The context around these entries might be helpful.

I found this type of log:

> [2022-09-08T07:13:34,279][WARN ][o.e.a.b.TransportShardBulkAction] [data-17] [[metrics.ocp4-project.prod-esb-2022.09.07][0]] failed to perform indices:data/write/bulk[s] on replica [metrics.ocp4-project.prod-esb-2022.09.07][0], node[XLPmqmRLS8ePzj5cyeGoZQ], [R], s[STARTED], a[id=_vNoR2e2TCaSCPapp_qwrg]
> [2022-09-08T07:13:34,303][WARN ][o.e.a.b.TransportShardBulkAction] [data-17] [[metrics.ocp4-project.prod-esb-2022.09.07][0]] failed to perform indices:data/write/bulk[s] on replica [metrics.ocp4-project.prod-esb-2022.09.07][0], node[XLPmqmRLS8ePzj5cyeGoZQ], [R], s[STARTED], a[id=_vNoR2e2TCaSCPapp_qwrg]

and some entries like this:

[2022-09-08T07:13:14,446][WARN ][o.e.c.c.ClusterFormationFailureHelper] [data-17] master not discovered yet: have discovered [{data-17}{6rXNZDZgRiiu3TPBD1jncQ}{YTHMD4i4S3eYkjaTzPDRdg}{10.37.187.50}{10.37.187.50:9300}{dilrt}, {master-3}{Ax3huB15R_qNFDvGp-7Jzg}{E4NV2dFoQ-a8iiknwcodbw}{10.37.187.33}{10.37.187.33:9300}{imrt}, {master-1}{_MS9jkxdTv2wCscd0gFmyw}{ogzG8SjVQ0iz65K0VbSPvA}{10.37.187.31}{10.37.187.31:9300}{imrt}, {master-2}{eZwiy4LjSm6-C62fEplTyg}{O4ewI_ytRuu3j1a_1W0YBQ}{10.37.187.32}{10.37.187.32:9300}{imrt}]; discovery will continue using [10.37.187.31:9300, 10.37.187.32:9300, 10.37.187.33:9300, 10.37.187.34:9300, 10.37.187.35:9300, 10.37.187.36:9300, 10.37.187.37:9300, 10.37.187.38:9300, 10.37.187.39:9300, 10.37.187.40:9300, 10.37.187.41:9300, 10.37.187.42:9300, 10.37.187.43:9300, 10.37.187.44:9300, 10.37.187.45:9300, 10.37.187.46:9300, 10.37.187.47:9300, 10.37.187.48:9300, 10.37.187.49:9300, 10.37.187.51:9300, 10.37.187.52:9300, 10.37.187.53:9300, 10.37.187.64:9300, 10.37.187.65:9300] from hosts providers and [{master-3}{Ax3huB15R_qNFDvGp-7Jzg}{E4NV2dFoQ-a8iiknwcodbw}{10.37.187.33}{10.37.187.33:9300}{imrt}, {master-1}{_MS9jkxdTv2wCscd0gFmyw}{ogzG8SjVQ0iz65K0VbSPvA}{10.37.187.31}{10.37.187.31:9300}{imrt}, {master-2}{eZwiy4LjSm6-C62fEplTyg}{O4ewI_ytRuu3j1a_1W0YBQ}{10.37.187.32}{10.37.187.32:9300}{imrt}] from last-known cluster state; node term 35, last-accepted version 762458 in term 35

That would be why.

Again, we need to see more of your logs and not just snippets please.
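As an aside on the very first warning you posted: `indices.recovery.max_concurrent_snapshot_file_downloads` is a dynamic cluster setting, so it can be changed without a restart. This is only a sketch (it assumes a node reachable on `localhost:9200`; adjust host, port, and the value for your cluster), and it won't fix the underlying slow-disk symptoms, but it shows how the setting named in the warning can be adjusted, and how to sanity-check disk I/O on the affected node:

```shell
# Lower the number of concurrent snapshot file downloads per recovery
# (dynamic setting, no restart needed; 5 is an illustrative value):
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "persistent": {
    "indices.recovery.max_concurrent_snapshot_file_downloads": 5
  }
}'

# The FsHealthService / PersistedClusterStateService warnings point at slow
# disk I/O; the fs section of node stats is a quick place to look:
curl -s "localhost:9200/_nodes/data-17/stats/fs?pretty"
```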

Here are the full logs from when the issue occurred:

What do you think?

400 Link does not exist

I don't know what to think about it...
Please use the built-in </>.

You can download it Here

Nobody will download it. Please use our built-in </>.

I can't post it because it's too large. I have used the built-in </>.

Can anyone help me? Just download the file, it's really safe. I didn't put a virus in it. Trust me, I just need help.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.