Elasticsearch node crashed

Hello,

I am new to Elasticsearch and I'm trying to understand why our Elasticsearch cluster sometimes breaks.

We have 3 nodes in the cluster, running on Ubuntu 20.04 LTS with 8 CPU cores and 32 GB of RAM each, of which 15 GB is allocated to the Elasticsearch heap (-Xms15g -Xmx15g).
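
For reference, the heap is set in the JVM options file; assuming the default deb/rpm package layout, that is /etc/elasticsearch/jvm.options:

# /etc/elasticsearch/jvm.options (excerpt)
-Xms15g
-Xmx15g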

elasticsearch.yml config file:

---
discovery.zen.hosts_provider: file
discovery.zen.minimum_master_nodes: 2
http.port: "9200-9300"
network.host: "10.0.6.21"
node.attr.client: "true"
node.data: "true"
node.master: "true"
node.name: es-server-01
path.data: /var/lib/elasticsearch/
path.logs: /var/log/elasticsearch/
transport.tcp.port: "9300-9400"
http.cors.enabled: true
http.cors.allow-origin: "*"
cluster.name: cluster-prod
indices.query.bool.max_clause_count: 10000

Since the log is huge, I'll trim it down and summarize it.

First, there are some warnings about the garbage collector:

[2022-07-04T14:17:31,094][WARN ][o.e.m.j.JvmGcMonitorService] [es-server-01] [gc][542872] overhead, spent [4.4s] collecting in the last [4.6s]
[2022-07-04T16:28:23,763][WARN ][o.e.m.j.JvmGcMonitorService] [es-server-01] [gc][550716] overhead, spent [4.8s] collecting in the last [5.1s]
[2022-07-04T17:11:06,206][WARN ][o.e.m.j.JvmGcMonitorService] [es-server-01] [gc][553272] overhead, spent [4.6s] collecting in the last [5s]
...
[2022-07-04T18:03:35,546][INFO ][o.e.m.j.JvmGcMonitorService] [es-server-01] [gc][old][554244][2459] duration [5s], collections [1]/[5.1s], total [5s]/[45.8m], memory [14.6gb]->[14.6gb]/[14.9gb], all_pools {[young] [330.4mb]->[354.6mb]/[532.5mb]}{[survivor] [0b]->[0b]/[66.5mb]}{[old] [14.3gb]->[14.3gb]/[14.3gb]}
[2022-07-04T18:03:35,546][WARN ][o.e.m.j.JvmGcMonitorService] [es-server-01] [gc][554244] overhead, spent [5s] collecting in the last [5.1s]
...

Then it complains about the search thread pool queue being full, right? See the note after the stack trace below. Here is the relevant excerpt:

[2022-07-04T18:04:17,848][WARN ][r.suppressed             ] [es-server-01] path: /live_api_device/device/_search, params: {index=live_api_device, type=device}
org.elasticsearch.action.search.SearchPhaseExecutionException: 
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:296) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:91) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.onRejection(AbstractRunnable.java:63) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.onRejection(TimedRunnable.java:50) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onRejection(ThreadContext.java:768) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:104) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.execute(AbstractSearchAsyncAction.java:311) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.FetchSearchPhase.run(FetchSearchPhase.java:80) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:165) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:159) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:259) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.InitialSearchPhase.successfulShardExecution(InitialSearchPhase.java:254) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.InitialSearchPhase.onShardResult(InitialSearchPhase.java:242) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.InitialSearchPhase.access$200(InitialSearchPhase.java:48) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.InitialSearchPhase$2.lambda$innerOnResponse$0(InitialSearchPhase.java:215) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.InitialSearchPhase.maybeFork(InitialSearchPhase.java:174) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.InitialSearchPhase.access$000(InitialSearchPhase.java:48) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.InitialSearchPhase$2.innerOnResponse(InitialSearchPhase.java:215) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:45) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:29) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:68) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:36) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:54) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:454) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1116) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1197) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1177) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:47) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:30) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:360) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.search.SearchService$2.onResponse(SearchService.java:356) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.search.SearchService$4.doRun(SearchService.java:1129) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:41) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:778) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.21.jar:6.8.21]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_312]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_312]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@543e8109 on QueueResizingEsThreadPoolExecutor[name = es-server-01/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 86.6ms, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@57e6901c[Running, pool size = 13, active threads = 13, queued tasks = 1000, completed tasks = 90000591]]
        at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.8.21.jar:6.8.21]
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_312]
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_312]
        at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:98) ~[elasticsearch-6.8.21.jar:6.8.21]
        ... 34 more
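
(Side note, referenced above: if I understand correctly, the per-node rejection counts for the search thread pool can be checked with the _cat thread pool API; the host below is just this node's address from elasticsearch.yml.)

curl -s 'http://10.0.6.21:9200/_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected'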

The above log goes on and on for a couple of hours and then ends in an OutOfMemoryError:

[2022-07-04T20:49:23,239][WARN ][o.e.t.ThreadPool         ] [es-server-01] failed to run scheduled task [org.elasticsearch.indices.IndexingMemoryController$ShardsIndicesStatusChecker@61b6c8bf] on thread pool [same]
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:680) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:694) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.IndexWriter.getFlushingBytes(IndexWriter.java:578) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.elasticsearch.index.engine.InternalEngine.getWritingBytes(InternalEngine.java:574) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.shard.IndexShard.getWritingBytes(IndexShard.java:980) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.indices.IndexingMemoryController.getShardWritingBytes(IndexingMemoryController.java:182) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.indices.IndexingMemoryController$ShardsIndicesStatusChecker.runUnlocked(IndexingMemoryController.java:310) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.indices.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:290) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.threadpool.Scheduler$ReschedulingRunnable.doRun(Scheduler.java:247) [elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.8.21.jar:6.8.21]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_312]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_312]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_312]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_312]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_312]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_312]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_312]
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.packed.PackedLongValues$Builder.<init>(PackedLongValues.java:183) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.util.packed.DeltaPackedLongValues$Builder.<init>(DeltaPackedLongValues.java:58) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.util.packed.PackedLongValues.deltaPackedBuilder(PackedLongValues.java:53) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.util.packed.PackedLongValues.deltaPackedBuilder(PackedLongValues.java:58) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.NormValuesWriter.<init>(NormValuesWriter.java:42) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain$PerField.setInvertState(DefaultIndexingChain.java:734) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain$PerField.<init>(DefaultIndexingChain.java:724) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:662) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1616) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1608) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.elasticsearch.index.engine.InternalEngine.updateDocs(InternalEngine.java:1308) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1120) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:935) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:826) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:793) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:746) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$executeIndexRequestOnPrimary$3(TransportShardBulkAction.java:458) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction$$Lambda$3677/1902326889.get(Unknown Source) ~[?:?]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeOnPrimaryWhileHandlingMappingUpdates(TransportShardBulkAction.java:481) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:456) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:220) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:164) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:156) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:143) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:82) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1059) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1037) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:104) ~[elasticsearch-6.8.21.jar:6.8.21]
[2022-07-04T20:49:23,260][WARN ][o.e.t.OutboundHandler    ] [es-server-01] send message failed [channel: Netty4TcpChannel{localAddress=/10.0.6.21:9300, remoteAddress=/10.0.6.13:41472}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2022-07-04T20:49:28,353][INFO ][o.e.m.j.JvmGcMonitorService] [es-server-01] [gc][557732] overhead, spent [1.2m] collecting in the last [3.9m]
[2022-07-04T20:49:23,258][WARN ][o.e.t.OutboundHandler    ] [es-server-01] send message failed [channel: Netty4TcpChannel{localAddress=/10.0.6.21:9300, remoteAddress=/10.0.6.13:41470}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2022-07-04T20:49:23,250][WARN ][o.e.t.OutboundHandler    ] [es-server-01] send message failed [channel: Netty4TcpChannel{localAddress=/10.0.6.21:9300, remoteAddress=/10.0.6.13:41474}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2022-07-04T20:49:28,369][WARN ][o.e.t.OutboundHandler    ] [es-server-01] send message failed [channel: Netty4TcpChannel{localAddress=/10.0.6.21:9300, remoteAddress=/10.0.6.13:41470}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2022-07-04T20:49:28,286][WARN ][o.e.t.OutboundHandler    ] [es-server-01] send message failed [channel: Netty4TcpChannel{localAddress=0.0.0.0/0.0.0.0:9300, remoteAddress=/10.0.6.13:41450}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2022-07-04T20:49:28,351][WARN ][o.e.t.OutboundHandler    ] [es-server-01] send message failed [channel: Netty4TcpChannel{localAddress=/10.0.6.21:9300, remoteAddress=/10.0.6.13:41472}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2022-07-04T20:49:28,347][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [es-server-01] fatal error in thread [elasticsearch[es-server-01][write][T#1]], exiting
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.packed.PackedLongValues$Builder.<init>(PackedLongValues.java:183) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.util.packed.DeltaPackedLongValues$Builder.<init>(DeltaPackedLongValues.java:58) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.util.packed.PackedLongValues.deltaPackedBuilder(PackedLongValues.java:53) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.util.packed.PackedLongValues.deltaPackedBuilder(PackedLongValues.java:58) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.NormValuesWriter.<init>(NormValuesWriter.java:42) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain$PerField.setInvertState(DefaultIndexingChain.java:734) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain$PerField.<init>(DefaultIndexingChain.java:724) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain.getOrAddField(DefaultIndexingChain.java:662) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:394) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1616) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1608) ~[lucene-core-7.7.3.jar:7.7.3 1a0d2a901dfec93676b0fe8be425101ceb754b85 - noble - 2020-04-21 10:31:55]
        at org.elasticsearch.index.engine.InternalEngine.updateDocs(InternalEngine.java:1308) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1120) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:935) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:826) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:793) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:746) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.lambda$executeIndexRequestOnPrimary$3(TransportShardBulkAction.java:458) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction$$Lambda$3677/1902326889.get(Unknown Source) ~[?:?]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeOnPrimaryWhileHandlingMappingUpdates(TransportShardBulkAction.java:481) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeIndexRequestOnPrimary(TransportShardBulkAction.java:456) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:220) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:164) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:156) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:143) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:82) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1059) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryShardReference.perform(TransportReplicationAction.java:1037) ~[elasticsearch-6.8.21.jar:6.8.21]
        at org.elasticsearch.action.support.replication.ReplicationOperation.execute(ReplicationOperation.java:104) ~[elasticsearch-6.8.21.jar:6.8.21]
[2022-07-04T20:49:23,259][WARN ][o.e.t.OutboundHandler    ] [es-server-01] send message failed [channel: Netty4TcpChannel{localAddress=/10.0.6.21:9300, remoteAddress=/10.0.6.13:41464}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]

I am monitoring the servers with Nagios and Grafana, and there were no problems with the servers' resources: no CPU or memory spikes, nothing.
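
(Side note: heap usage as seen by Elasticsearch itself can, I believe, be checked per node with the _cat nodes API, for example:)

curl -s 'http://10.0.6.21:9200/_cat/nodes?v&h=name,heap.percent,heap.current,heap.max,ram.percent,cpu'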

Thanks! Any help will be appreciated.

It looks like you are suffering from heap pressure and long, slow GC. What is the full output of the cluster stats API?
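
Something along these lines should do it (point it at any of your nodes):

curl -s 'http://10.0.6.21:9200/_cluster/stats?human&pretty'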

Calling the stats API now outputs the following:

{
  "_nodes" : {
    "total" : 3,
    "successful" : 3,
    "failed" : 0
  },
  "cluster_name" : "cluster-prod",
  "cluster_uuid" : "D11RC5<redacted>Bw",
  "timestamp" : 1657015784259,
  "status" : "green",
  "indices" : {
    "count" : 42,
    "shards" : {
      "total" : 267,
      "primaries" : 142,
      "replication" : 0.8802816901408451,
      "index" : {
        "shards" : {
          "min" : 1,
          "max" : 10,
          "avg" : 6.357142857142857
        },
        "primaries" : {
          "min" : 1,
          "max" : 5,
          "avg" : 3.380952380952381
        },
        "replication" : {
          "min" : 0.0,
          "max" : 1.0,
          "avg" : 0.5952380952380952
        }
      }
    },
    "docs" : {
      "count" : 34969506,
      "deleted" : 652092
    },
    "store" : {
      "size_in_bytes" : 32510407458
    },
    "fielddata" : {
      "memory_size_in_bytes" : 8901884,
      "evictions" : 0
    },
    "query_cache" : {
      "memory_size_in_bytes" : 693628847,
      "total_count" : 703496525,
      "hit_count" : 54393726,
      "miss_count" : 649102799,
      "cache_size" : 332419,
      "cache_count" : 798061,
      "evictions" : 465642
    },
    "completion" : {
      "size_in_bytes" : 884252429
    },
    "segments" : {
      "count" : 1903,
      "memory_in_bytes" : 1087464425,
      "terms_memory_in_bytes" : 1051123719,
      "stored_fields_memory_in_bytes" : 11981304,
      "term_vectors_memory_in_bytes" : 0,
      "norms_memory_in_bytes" : 12152448,
      "points_memory_in_bytes" : 2561246,
      "doc_values_memory_in_bytes" : 9645708,
      "index_writer_memory_in_bytes" : 23989264,
      "version_map_memory_in_bytes" : 3404,
      "fixed_bit_set_memory_in_bytes" : 0,
      "max_unsafe_auto_id_timestamp" : 1643681532685,
      "file_sizes" : { }
    }
  },
  "nodes" : {
    "count" : {
      "total" : 3,
      "data" : 3,
      "coordinating_only" : 0,
      "master" : 3,
      "ingest" : 3
    },
    "versions" : [
      "6.8.21"
    ],
    "os" : {
      "available_processors" : 24,
      "allocated_processors" : 24,
      "names" : [
        {
          "name" : "Linux",
          "count" : 3
        }
      ],
      "pretty_names" : [
        {
          "pretty_name" : "Ubuntu 20.04.2 LTS",
          "count" : 3
        }
      ],
      "mem" : {
        "total_in_bytes" : 101031100416,
        "free_in_bytes" : 9266618368,
        "used_in_bytes" : 91764482048,
        "free_percent" : 9,
        "used_percent" : 91
      }
    },
    "process" : {
      "cpu" : {
        "percent" : 21
      },
      "open_file_descriptors" : {
        "min" : 731,
        "max" : 1020,
        "avg" : 917
      }
    },
    "jvm" : {
      "max_uptime_in_millis" : 15101143605,
      "versions" : [
        {
          "version" : "1.8.0_312",
          "vm_name" : "OpenJDK 64-Bit Server VM",
          "vm_version" : "25.312-b07",
          "vm_vendor" : "Private Build",
          "count" : 3
        }
      ],
      "mem" : {
        "heap_used_in_bytes" : 22552247912,
        "heap_max_in_bytes" : 48109191168
      },
      "threads" : 361
    },
    "fs" : {
      "total_in_bytes" : 404182020096,
      "free_in_bytes" : 304499580928,
      "available_in_bytes" : 283836080128
    },
    "plugins" : [ ],
    "network_types" : {
      "transport_types" : {
        "security4" : 3
      },
      "http_types" : {
        "security4" : 3
      }
    }
  }
}

Reminder, please?

You're using 6.8, which is very old, long past EOL, and no longer supported. I don't even have a development environment that works with this version any more. I suggest you upgrade to a supported version first, and if the problems persist after the upgrade we'll be in a better position to help you out.
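
When you do upgrade, the usual first step of a rolling upgrade is to stop replica allocation before taking each node down, roughly like this (and set the value back to null once the upgraded node has rejoined):

curl -s -X PUT 'http://10.0.6.21:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}'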

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.