Language identification making Elasticsearch cluster unresponsive?

We're running a dockerized Elasticsearch 7.16.2 server. Endpoints such as /_cat/indices and /_cat/nodes hang and never return a response.
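
For completeness, this is roughly how we call those endpoints (Kibana Dev Tools syntax; host and credentials omitted):

GET /_cat/indices?v
GET /_cat/nodes?v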

In the logs we see warnings such as:

{"type": "server", "timestamp": "2022-01-14T11:24:00,373Z", "level": "WARN", "component": "o.e.c.InternalClusterInfoService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "failed to retrieve shard stats from node [HfisA0IYRRqHqvjD0SRseQ]: [node01][127.1.2.3:9300][indices:monitor/stats[n]] request_id [36754] timed out after [15022ms]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }
{"type": "server", "timestamp": "2022-01-14T11:24:45,378Z", "level": "WARN", "component": "o.e.c.InternalClusterInfoService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "failed to retrieve stats for node [HfisA0IYRRqHqvjD0SRseQ]: [node01][127.1.2.3:9300][cluster:monitor/nodes/stats[n]] request_id [36775] timed out after [15013ms]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }
{"type": "server", "timestamp": "2022-01-14T11:24:45,380Z", "level": "WARN", "component": "o.e.c.InternalClusterInfoService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "failed to retrieve shard stats from node [HfisA0IYRRqHqvjD0SRseQ]: [node01][127.1.2.3:9300][indices:monitor/stats[n]] request_id [36776] timed out after [15013ms]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }
{"type": "server", "timestamp": "2022-01-14T11:24:51,836Z", "level": "WARN", "component": "o.e.t.TransportService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "Received response for a request that has timed out, sent [12.3m/741466ms] ago, timed out [12.1m/726453ms] ago, action [cluster:monitor/nodes/stats[n]], node [{node01}{HfisA0IYRRqHqvjD0SRseQ}{1qR01b4QQwSpwy8CFO9WQw}{127.1.2.3}{127.1.2.3:9300}{cdfhilmrstw}{ml.machine_memory=16395165696, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=7516192768}], id [15653]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }
{"type": "server", "timestamp": "2022-01-14T11:24:51,840Z", "level": "WARN", "component": "o.e.t.TransportService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "Received response for a request that has timed out, sent [12.3m/741466ms] ago, timed out [12.1m/726453ms] ago, action [indices:monitor/stats[n]], node [{node01}{HfisA0IYRRqHqvjD0SRseQ}{1qR01b4QQwSpwy8CFO9WQw}{127.1.2.3}{127.1.2.3:9300}{cdfhilmrstw}{ml.machine_memory=16395165696, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=7516192768}], id [15654]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }

And the hot threads output shows:

::: {node01}{HfisA0IYRRqHqvjD0SRseQ}{1qR01b4QQwSpwy8CFO9WQw}{127.1.2.3}{127.1.2.3:9300}{cdfhilmrstw}{ml.machine_memory=16395165696, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=7516192768}
   Hot threads at 2022-01-14T11:30:22.710Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   47.5% [cpu=47.5%, other=0.0%] (237.2ms out of 500ms) cpu usage by thread 'elasticsearch[node01][management][T#3]'
     10/10 snapshots sharing following 35 elements
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.loadModelIfNecessary(ModelLoadingService.java:276)
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.getModel(ModelLoadingService.java:253)
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.getModelForPipeline(ModelLoadingService.java:188)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.lambda$doExecute$7(TransportInternalInferModelAction.java:92)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction$$Lambda$6906/0x00000008019bd250.accept(Unknown Source)
       app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136)
       org.elasticsearch.xpack.ml.inference.persistence.TrainedModelProvider.getTrainedModel(TrainedModelProvider.java:532)
       org.elasticsearch.xpack.ml.inference.persistence.TrainedModelProvider.getTrainedModel(TrainedModelProvider.java:520)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.doExecute(TransportInternalInferModelAction.java:86)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.doExecute(TransportInternalInferModelAction.java:31)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:179)
       app//org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:53)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177)
       org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:145)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177)
       app//org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:154)
       app//org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:82)
       app//org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:95)
       app//org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:73)
       app//org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:407)
       org.elasticsearch.xpack.core.ClientHelper.executeAsyncWithOrigin(ClientHelper.java:148)
       org.elasticsearch.xpack.ml.inference.ingest.InferenceProcessor.execute(InferenceProcessor.java:113)
       app//org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136)
       app//org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:122)
       app//org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:117)
       app//org.elasticsearch.ingest.IngestDocument.executePipeline(IngestDocument.java:823)
       app//org.elasticsearch.action.ingest.SimulateExecutionService.executeDocument(SimulateExecutionService.java:56)
       app//org.elasticsearch.action.ingest.SimulateExecutionService.lambda$execute$3(SimulateExecutionService.java:81)
       app//org.elasticsearch.action.ingest.SimulateExecutionService$$Lambda$6895/0x0000000801b668a0.accept(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@17.0.1/java.lang.Thread.run(Thread.java:833)
   
   46.0% [cpu=46.0%, other=0.0%] (229.8ms out of 500ms) cpu usage by thread 'elasticsearch[node01][management][T#4]'
     10/10 snapshots sharing following 35 elements
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.loadModelIfNecessary(ModelLoadingService.java:276)
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.getModel(ModelLoadingService.java:253)
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.getModelForPipeline(ModelLoadingService.java:188)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.lambda$doExecute$7(TransportInternalInferModelAction.java:92)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction$$Lambda$6906/0x00000008019bd250.accept(Unknown Source)
       app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136)
       org.elasticsearch.xpack.ml.inference.persistence.TrainedModelProvider.getTrainedModel(TrainedModelProvider.java:532)
       org.elasticsearch.xpack.ml.inference.persistence.TrainedModelProvider.getTrainedModel(TrainedModelProvider.java:520)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.doExecute(TransportInternalInferModelAction.java:86)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.doExecute(TransportInternalInferModelAction.java:31)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:179)
       app//org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:53)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177)
       org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:145)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177)
       app//org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:154)
       app//org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:82)
       app//org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:95)
       app//org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:73)
       app//org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:407)
       org.elasticsearch.xpack.core.ClientHelper.executeAsyncWithOrigin(ClientHelper.java:148)
       org.elasticsearch.xpack.ml.inference.ingest.InferenceProcessor.execute(InferenceProcessor.java:113)
       app//org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136)
       app//org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:122)
       app//org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:117)
       app//org.elasticsearch.ingest.IngestDocument.executePipeline(IngestDocument.java:823)
       app//org.elasticsearch.action.ingest.SimulateExecutionService.executeDocument(SimulateExecutionService.java:56)
       app//org.elasticsearch.action.ingest.SimulateExecutionService.lambda$execute$3(SimulateExecutionService.java:81)
       app//org.elasticsearch.action.ingest.SimulateExecutionService$$Lambda$6895/0x0000000801b668a0.accept(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@17.0.1/java.lang.Thread.run(Thread.java:833)
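
For reference, the hot threads dump above was captured with the standard nodes hot threads API, something like:

GET /_nodes/hot_threads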

The hot threads mention ML models, and we use the /_ingest/pipeline/_simulate endpoint quite heavily for language identification (Language identification | Machine Learning in the Elastic Stack [7.16] | Elastic); a sketch of a typical request follows below. Could the unresponsiveness be caused by the language identification, or what exactly is going on here? Any ideas?
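
For context, our simulate requests look roughly like the language identification example from those docs; the field names and sample text here are illustrative, not our real pipeline:

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "inference": {
          "model_id": "lang_ident_model_1",
          "inference_config": { "classification": { "num_top_classes": 3 } },
          "field_map": { "contents": "text" },
          "target_field": "_ml.lang_ident"
        }
      }
    ]
  },
  "docs": [
    { "_source": { "contents": "Das Leben ist kein Ponyhof" } }
  ]
}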
