Language identification making Elasticsearch cluster unresponsive?

We're running a dockerized Elasticsearch 7.16.2 server. Endpoints such as /_cat/indices and /_cat/nodes hang and never return a response.
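
For completeness, this is roughly how we call those endpoints (Kibana Dev Tools syntax; host and credentials omitted):

GET /_cat/indices?v
GET /_cat/nodes?v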

In the logs we see warnings such as:

{"type": "server", "timestamp": "2022-01-14T11:24:00,373Z", "level": "WARN", "component": "o.e.c.InternalClusterInfoService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "failed to retrieve shard stats from node [HfisA0IYRRqHqvjD0SRseQ]: [node01][127.1.2.3:9300][indices:monitor/stats[n]] request_id [36754] timed out after [15022ms]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }
{"type": "server", "timestamp": "2022-01-14T11:24:45,378Z", "level": "WARN", "component": "o.e.c.InternalClusterInfoService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "failed to retrieve stats for node [HfisA0IYRRqHqvjD0SRseQ]: [node01][127.1.2.3:9300][cluster:monitor/nodes/stats[n]] request_id [36775] timed out after [15013ms]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }
{"type": "server", "timestamp": "2022-01-14T11:24:45,380Z", "level": "WARN", "component": "o.e.c.InternalClusterInfoService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "failed to retrieve shard stats from node [HfisA0IYRRqHqvjD0SRseQ]: [node01][127.1.2.3:9300][indices:monitor/stats[n]] request_id [36776] timed out after [15013ms]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }
{"type": "server", "timestamp": "2022-01-14T11:24:51,836Z", "level": "WARN", "component": "o.e.t.TransportService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "Received response for a request that has timed out, sent [12.3m/741466ms] ago, timed out [12.1m/726453ms] ago, action [cluster:monitor/nodes/stats[n]], node [{node01}{HfisA0IYRRqHqvjD0SRseQ}{1qR01b4QQwSpwy8CFO9WQw}{127.1.2.3}{127.1.2.3:9300}{cdfhilmrstw}{ml.machine_memory=16395165696, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=7516192768}], id [15653]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }
{"type": "server", "timestamp": "2022-01-14T11:24:51,840Z", "level": "WARN", "component": "o.e.t.TransportService", "cluster.name": "docker-cluster", "node.name": "node01", "message": "Received response for a request that has timed out, sent [12.3m/741466ms] ago, timed out [12.1m/726453ms] ago, action [indices:monitor/stats[n]], node [{node01}{HfisA0IYRRqHqvjD0SRseQ}{1qR01b4QQwSpwy8CFO9WQw}{127.1.2.3}{127.1.2.3:9300}{cdfhilmrstw}{ml.machine_memory=16395165696, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=7516192768}], id [15654]", "cluster.uuid": "wEZgaKblQeWmXf2ruV99tQ", "node.id": "HfisA0IYRRqHqvjD0SRseQ"  }

And the hot threads output shows:

::: {node01}{HfisA0IYRRqHqvjD0SRseQ}{1qR01b4QQwSpwy8CFO9WQw}{127.1.2.3}{127.1.2.3:9300}{cdfhilmrstw}{ml.machine_memory=16395165696, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=7516192768}
   Hot threads at 2022-01-14T11:30:22.710Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   47.5% [cpu=47.5%, other=0.0%] (237.2ms out of 500ms) cpu usage by thread 'elasticsearch[node01][management][T#3]'
     10/10 snapshots sharing following 35 elements
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.loadModelIfNecessary(ModelLoadingService.java:276)
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.getModel(ModelLoadingService.java:253)
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.getModelForPipeline(ModelLoadingService.java:188)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.lambda$doExecute$7(TransportInternalInferModelAction.java:92)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction$$Lambda$6906/0x00000008019bd250.accept(Unknown Source)
       app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136)
       org.elasticsearch.xpack.ml.inference.persistence.TrainedModelProvider.getTrainedModel(TrainedModelProvider.java:532)
       org.elasticsearch.xpack.ml.inference.persistence.TrainedModelProvider.getTrainedModel(TrainedModelProvider.java:520)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.doExecute(TransportInternalInferModelAction.java:86)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.doExecute(TransportInternalInferModelAction.java:31)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:179)
       app//org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:53)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177)
       org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:145)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177)
       app//org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:154)
       app//org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:82)
       app//org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:95)
       app//org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:73)
       app//org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:407)
       org.elasticsearch.xpack.core.ClientHelper.executeAsyncWithOrigin(ClientHelper.java:148)
       org.elasticsearch.xpack.ml.inference.ingest.InferenceProcessor.execute(InferenceProcessor.java:113)
       app//org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136)
       app//org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:122)
       app//org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:117)
       app//org.elasticsearch.ingest.IngestDocument.executePipeline(IngestDocument.java:823)
       app//org.elasticsearch.action.ingest.SimulateExecutionService.executeDocument(SimulateExecutionService.java:56)
       app//org.elasticsearch.action.ingest.SimulateExecutionService.lambda$execute$3(SimulateExecutionService.java:81)
       app//org.elasticsearch.action.ingest.SimulateExecutionService$$Lambda$6895/0x0000000801b668a0.accept(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@17.0.1/java.lang.Thread.run(Thread.java:833)
   
   46.0% [cpu=46.0%, other=0.0%] (229.8ms out of 500ms) cpu usage by thread 'elasticsearch[node01][management][T#4]'
     10/10 snapshots sharing following 35 elements
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.loadModelIfNecessary(ModelLoadingService.java:276)
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.getModel(ModelLoadingService.java:253)
       org.elasticsearch.xpack.ml.inference.loadingservice.ModelLoadingService.getModelForPipeline(ModelLoadingService.java:188)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.lambda$doExecute$7(TransportInternalInferModelAction.java:92)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction$$Lambda$6906/0x00000008019bd250.accept(Unknown Source)
       app//org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:136)
       org.elasticsearch.xpack.ml.inference.persistence.TrainedModelProvider.getTrainedModel(TrainedModelProvider.java:532)
       org.elasticsearch.xpack.ml.inference.persistence.TrainedModelProvider.getTrainedModel(TrainedModelProvider.java:520)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.doExecute(TransportInternalInferModelAction.java:86)
       org.elasticsearch.xpack.ml.action.TransportInternalInferModelAction.doExecute(TransportInternalInferModelAction.java:31)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:179)
       app//org.elasticsearch.action.support.ActionFilter$Simple.apply(ActionFilter.java:53)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177)
       org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.apply(SecurityActionFilter.java:145)
       app//org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:177)
       app//org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:154)
       app//org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:82)
       app//org.elasticsearch.client.node.NodeClient.executeLocally(NodeClient.java:95)
       app//org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:73)
       app//org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:407)
       org.elasticsearch.xpack.core.ClientHelper.executeAsyncWithOrigin(ClientHelper.java:148)
       org.elasticsearch.xpack.ml.inference.ingest.InferenceProcessor.execute(InferenceProcessor.java:113)
       app//org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:136)
       app//org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:122)
       app//org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:117)
       app//org.elasticsearch.ingest.IngestDocument.executePipeline(IngestDocument.java:823)
       app//org.elasticsearch.action.ingest.SimulateExecutionService.executeDocument(SimulateExecutionService.java:56)
       app//org.elasticsearch.action.ingest.SimulateExecutionService.lambda$execute$3(SimulateExecutionService.java:81)
       app//org.elasticsearch.action.ingest.SimulateExecutionService$$Lambda$6895/0x0000000801b668a0.accept(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@17.0.1/java.lang.Thread.run(Thread.java:833)
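
For reference, the hot threads dump above was captured with the standard nodes hot threads API, something like:

GET /_nodes/hot_threads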

The hot threads mention ML models, and we use the /_ingest/pipeline/_simulate endpoint quite heavily for language identification (Language identification | Machine Learning in the Elastic Stack [7.16] | Elastic); a sketch of a typical request follows below. Could the unresponsiveness be caused by the language identification, or what exactly is going on here? Any ideas?
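
For context, our simulate requests look roughly like the language identification example from those docs; the field names and sample text here are illustrative, not our real pipeline:

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "inference": {
          "model_id": "lang_ident_model_1",
          "inference_config": { "classification": { "num_top_classes": 3 } },
          "field_map": { "contents": "text" },
          "target_field": "_ml.lang_ident"
        }
      }
    ]
  },
  "docs": [
    { "_source": { "contents": "Das Leben ist kein Ponyhof" } }
  ]
}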
