ILM makes the cluster unstable. Heap overflow

Emiliano_Baum · October 25, 2019, 11:15am

When I try to execute:

curl -X POST "xx.xx.xxx.xx:9200/_ilm/stop?pretty"

the result is:
"type" : "process_cluster_event_timeout_exception"

I think the scheduled ILM operation is causing a heap overflow error:

[2019-10-25T00:53:10,313][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [CGSS-CLUSTER01-MASTER-1] uncaught exception in thread [elasticsearch[CGSS-CLUSTER01-MASTER-1][management][T#1]]
java.lang.IllegalStateException: expected index [quorum01-20180202] with policy [time_close] to have current step consistent with provided step key (null) but it was {"phase":"new","action":"init","name":"init"}
at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleRunner.maybeRunAsyncAction(IndexLifecycleRunner.java:167) ~[?:?]
at org.elasticsearch.xpack.indexlifecycle.IndexLifecycleService.onMaster(IndexLifecycleService.java:123) ~[?:?]
at org.elasticsearch.cluster.service.ClusterApplierService$OnMasterRunnable.run(ClusterApplierService.java:616) ~[elasticsearch-6.6.0.jar:6.6.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:660) ~[elasticsearch-6.6.0.jar:6.6.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_31]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_31]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_31]
[2019-10-25T00:53:10,754][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [CGSS-CLUSTER01-MASTER-1] timed out while retrying [indices:admin/create] after failure (timeout [30s])
[2019-10-25T00:53:10,755][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [CGSS-CLUSTER01-MASTER-1] timed out while retrying [indices:admin/create] after failure (timeout [30s])

Any recommendations? Before I configured ILM, I used to run my own Python program to close old indices using a time mask, and I never had any problems with the cluster.
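For readers wondering what such a script might look like, here is a minimal sketch of the selection step only. It is not the poster's actual program: the function names, the assumption that the date is an 8-digit YYYYMMDD suffix after the last hyphen (as in quorum01-20180202), and the 30-day default are all illustrative.

```python
from datetime import date, datetime, timedelta


def index_date(name):
    """Return the date encoded as a YYYYMMDD suffix after the last '-', or None.

    Assumption: index names look like 'quorum01-20180202'.
    """
    suffix = name.rsplit("-", 1)[-1]
    # Require exactly 8 digits so that short numeric suffixes are not misparsed.
    if len(suffix) != 8 or not suffix.isdigit():
        return None
    try:
        return datetime.strptime(suffix, "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a valid calendar date.
        return None


def indices_to_close(names, today, max_age_days=30):
    """Return the sorted index names whose date suffix is older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    old = []
    for name in names:
        d = index_date(name)
        if d is not None and d < cutoff:
            old.append(name)
    return sorted(old)


if __name__ == "__main__":
    # Hypothetical index names, for illustration only.
    sample = ["quorum01-20180202", "quorum01-20191020", ".kibana"]
    print(indices_to_close(sample, today=date(2019, 10, 25)))
```

Each returned name could then be closed with a POST to /<index>/_close. Names without a parseable date suffix are skipped rather than guessed at.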

Emiliano_Baum · October 29, 2019, 2:28pm

In the end I modified the index settings using curl:

curl -X PUT "**.**.***.***:9200/****-20191025/_settings?pretty" -H 'Content-Type: application/json' -d'{"index":{"lifecycle":{"name":""}}}'

Then I enabled allocation for primaries, and then for all shards.
The index lifecycle management policy, which only had a cold phase configured to close indices older than 30 days, caused a heap overflow. On the master nodes I saw that heap usage was not balanced; the ILM process caused the overflow and killed master nodes until the limit set in the .yml config file was reached.

It is probably not a very "clean" solution, but it works: the cluster is now fully operational.
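The recovery sequence described in this post can be sketched as a small request builder. This is a hedged illustration, not a record of the exact commands that were run: the helper names and the example index name are hypothetical, and using the transient scope for cluster.routing.allocation.enable is an assumption (the post does not say whether transient or persistent settings were used). The code only builds the requests and does not contact a cluster.

```python
import json


def remove_policy_request(index):
    """Blank out the ILM policy name on one index, as in the curl command in this post."""
    body = {"index": {"lifecycle": {"name": ""}}}
    return ("PUT", "/" + index + "/_settings", body)


def allocation_request(mode):
    """Set cluster.routing.allocation.enable (transient scope is an assumption)."""
    if mode not in ("primaries", "all"):
        raise ValueError("this sketch only handles 'primaries' and 'all'")
    body = {"transient": {"cluster.routing.allocation.enable": mode}}
    return ("PUT", "/_cluster/settings", body)


def recovery_plan(index):
    """Order of operations from the post: detach policy, then primaries, then all."""
    return [
        remove_policy_request(index),
        allocation_request("primaries"),
        allocation_request("all"),
    ]


def as_curl(host, request):
    """Render one request tuple as a curl command line for review before running."""
    method, path, body = request
    return (
        'curl -X ' + method + ' "' + host + path + '?pretty" '
        "-H 'Content-Type: application/json' -d'" + json.dumps(body) + "'"
    )


if __name__ == "__main__":
    # 'myindex-20191025' and 'localhost:9200' are placeholders.
    for req in recovery_plan("myindex-20191025"):
        print(as_curl("localhost:9200", req))
```

Printing the commands first, rather than sending them, leaves room to check each step against the cluster state before applying it.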

system · November 26, 2019, 2:28pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
