Hello everyone,
I have a problem and would really appreciate it if someone could help me.
For several weeks I have noticed that my Elasticsearch cluster becomes unavailable for about an hour, almost always at the same time: Sunday between 1am and 2am. I investigated and concluded that I had too many shards for the size of my cluster, so I reindexed a good part of the smaller indices to solve the problem.
I now follow the best practice of at most 20 shards per gigabyte of Elasticsearch heap memory. However, I had the same problem again last Sunday and I don't understand the cause. I can attach the settings and stats of my cluster if needed.
I'm using Elastic Stack 8.3.2.
4 nodes:
- 3 HOT / WARM
- 1 COLD
Cluster:
Primary shards: 80
Replica shards: 74
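As a sanity check on the shard count, here is a rough sketch of the arithmetic behind the 20-shards-per-gigabyte guideline mentioned above. It assumes all four nodes have the same 3 GiB heap that the logs below report for es02 (ml.max_jvm_size=3221225472); that is an assumption, and the function name shard_budget is only illustrative.

```python
# Rough shard-budget check against the "20 shards per GiB of heap" guideline.
# ASSUMPTION: every node has the same 3 GiB heap that the logs report for es02.
HEAP_BYTES_PER_NODE = 3221225472  # ml.max_jvm_size taken from the es02 log lines
NODE_COUNT = 4                    # 3 hot/warm + 1 cold
SHARDS_PER_GIB = 20               # guideline value quoted in the post

primary_shards = 80
replica_shards = 74


def shard_budget(heap_bytes, nodes, shards_per_gib=SHARDS_PER_GIB):
    """Return the maximum shard count suggested by the guideline (hypothetical helper)."""
    heap_gib = heap_bytes / 2**30
    return int(heap_gib * shards_per_gib * nodes)


total_shards = primary_shards + replica_shards
budget = shard_budget(HEAP_BYTES_PER_NODE, NODE_COUNT)
print(f"total shards: {total_shards}, budget: {budget}, "
      f"within budget: {total_shards <= budget}")
```

Under that assumption the cluster holds 154 shards against a budget of 240, so the shard count alone would not seem to explain the outage.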
{"log":"{\"@timestamp\":\"2022-11-27T01:00:27.627Z\", \"log.level\":\"ERROR\", \"message\":\"collector [cluster_stats] timed out when collecting data: node [AMjWTf-GQYyqgoZb2q4wKg] did not respond within [10s]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][generic][T#1126]\",\"log.logger\":\"org.elasticsearch.xpack.monitoring.collector.cluster.ClusterStatsCollector\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\"}\n","stream":"stdout","time":"2022-11-27T01:00:27.627731148Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:32.572Z\", \"log.level\": \"WARN\", \"message\":\"failed to retrieve stats for node [AMjWTf-GQYyqgoZb2q4wKg]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][generic][T#1156]\",\"log.logger\":\"org.elasticsearch.cluster.InternalClusterInfoService\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\",\"error.type\":\"org.elasticsearch.transport.ReceiveTimeoutTransportException\",\"error.message\":\"[es02][10.0.30.12:9500][cluster:monitor/nodes/stats[n]] request_id [239966419] timed out after [15010ms]\",\"error.stack_trace\":\"org.elasticsearch.transport.ReceiveTimeoutTransportException: [es02][10.0.30.12:9500][cluster:monitor/nodes/stats[n]] request_id [239966419] timed out after [15010ms]\\n\"}\n","stream":"stdout","time":"2022-11-27T01:00:32.573130191Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:32.575Z\", \"log.level\": \"WARN\", \"message\":\"failed to retrieve shard stats from node [AMjWTf-GQYyqgoZb2q4wKg]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][generic][T#1142]\",\"log.logger\":\"org.elasticsearch.cluster.InternalClusterInfoService\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\",\"error.type\":\"org.elasticsearch.transport.ReceiveTimeoutTransportException\",\"error.message\":\"[es02][10.0.30.12:9500][indices:monitor/stats[n]] request_id [239966424] timed out after [15010ms]\",\"error.stack_trace\":\"org.elasticsearch.transport.ReceiveTimeoutTransportException: [es02][10.0.30.12:9500][indices:monitor/stats[n]] request_id [239966424] timed out after [15010ms]\\n\"}\n","stream":"stdout","time":"2022-11-27T01:00:32.575670636Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:37.628Z\", \"log.level\":\"ERROR\", \"message\":\"collector [enrich_coordinator_stats] timed out when collecting data: java.util.concurrent.TimeoutException: Timeout waiting for task.\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][generic][T#1126]\",\"log.logger\":\"org.elasticsearch.xpack.monitoring.collector.enrich.EnrichStatsCollector\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\"}\n","stream":"stdout","time":"2022-11-27T01:00:37.628367826Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:44.583Z\", \"log.level\": \"WARN\", \"message\":\"Received response for a request that has timed out, sent [27s/27018ms] ago, timed out [17s/17012ms] ago, action [internal:coordination/fault_detection/follower_check], node [{es02}{AMjWTf-GQYyqgoZb2q4wKg}{tYRjQaIFQi6GY_jmFXQspA}{es02}{10.0.30.12}{10.0.30.12:9500}{dhilmstw}{rack=r0, ml.machine_memory=6442450944, xpack.installed=true, ml.max_jvm_size=3221225472}], id [239966413]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][transport_worker][T#2]\",\"log.logger\":\"org.elasticsearch.transport.TransportService\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\"}\n","stream":"stdout","time":"2022-11-27T01:00:44.583571218Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:44.583Z\", \"log.level\": \"WARN\", \"message\":\"Received response for a request that has timed out, sent [16s/16011ms] ago, timed out [6s/6003ms] ago, action [internal:coordination/fault_detection/follower_check], node [{es02}{AMjWTf-GQYyqgoZb2q4wKg}{tYRjQaIFQi6GY_jmFXQspA}{es02}{10.0.30.12}{10.0.30.12:9500}{dhilmstw}{rack=r0, ml.machine_memory=6442450944, xpack.installed=true, ml.max_jvm_size=3221225472}], id [239966484]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][transport_worker][T#2]\",\"log.logger\":\"org.elasticsearch.transport.TransportService\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\"}\n","stream":"stdout","time":"2022-11-27T01:00:47.525257451Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:44.640Z\", \"log.level\": \"WARN\", \"message\":\"Received response for a request that has timed out, sent [27s/27018ms] ago, timed out [17s/17012ms] ago, action [cluster:monitor/stats[n]], node [{es02}{AMjWTf-GQYyqgoZb2q4wKg}{tYRjQaIFQi6GY_jmFXQspA}{es02}{10.0.30.12}{10.0.30.12:9500}{dhilmstw}{rack=r0, ml.machine_memory=6442450944, xpack.installed=true, ml.max_jvm_size=3221225472}], id [239966432]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][transport_worker][T#1]\",\"log.logger\":\"org.elasticsearch.transport.TransportService\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\"}\n","stream":"stdout","time":"2022-11-27T01:00:47.525282252Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:44.640Z\", \"log.level\": \"WARN\", \"message\":\"Received response for a request that has timed out, sent [27s/27018ms] ago, timed out [12s/12008ms] ago, action [cluster:monitor/nodes/stats[n]], node [{es02}{AMjWTf-GQYyqgoZb2q4wKg}{tYRjQaIFQi6GY_jmFXQspA}{es02}{10.0.30.12}{10.0.30.12:9500}{dhilmstw}{rack=r0, ml.machine_memory=6442450944, xpack.installed=true, ml.max_jvm_size=3221225472}], id [239966419]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][transport_worker][T#1]\",\"log.logger\":\"org.elasticsearch.transport.TransportService\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\"}\n","stream":"stdout","time":"2022-11-27T01:00:47.525290152Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:44.641Z\", \"log.level\": \"WARN\", \"message\":\"Received response for a request that has timed out, sent [27s/27018ms] ago, timed out [12s/12008ms] ago, action [indices:monitor/stats[n]], node [{es02}{AMjWTf-GQYyqgoZb2q4wKg}{tYRjQaIFQi6GY_jmFXQspA}{es02}{10.0.30.12}{10.0.30.12:9500}{dhilmstw}{rack=r0, ml.machine_memory=6442450944, xpack.installed=true, ml.max_jvm_size=3221225472}], id [239966424]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][transport_worker][T#2]\",\"log.logger\":\"org.elasticsearch.transport.TransportService\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\"}\n","stream":"stdout","time":"2022-11-27T01:00:47.525296052Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:47.630Z\", \"log.level\":\"ERROR\", \"message\":\"collector [index_recovery] timed out when collecting data: node [AMjWTf-GQYyqgoZb2q4wKg] did not respond within [10s]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][generic][T#1126]\",\"log.logger\":\"org.elasticsearch.xpack.monitoring.collector.indices.IndexRecoveryCollector\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\"}\n","stream":"stdout","time":"2022-11-27T01:00:47.631317845Z"}
{"log":"{\"@timestamp\":\"2022-11-27T01:00:57.632Z\", \"log.level\":\"ERROR\", \"message\":\"collector [index-stats] timed out when collecting data: node [AMjWTf-GQYyqgoZb2q4wKg] did not respond within [10s]\", \"ecs.version\": \"1.2.0\",\"service.name\":\"ES_ECS\",\"event.dataset\":\"elasticsearch.server\",\"process.thread.name\":\"elasticsearch[es03][generic][T#1126]\",\"log.logger\":\"org.elasticsearch.xpack.monitoring.collector.indices.IndexStatsCollector\",\"elasticsearch.cluster.uuid\":\"o3uSGc7TR0O503-YKA9kLQ\",\"elasticsearch.node.id\":\"ElsMpBDUTa2m8LnuH-Boyg\",\"elasticsearch.node.name\":\"es03\",\"elasticsearch.cluster.name\":\"ds-monitoring-prod\"}\n","stream":"stdout","time":"2022-11-27T01:00:57.632420158Z"}
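In case it helps anyone read the log lines above, a minimal Python sketch like the following could unwrap the Docker JSON wrapper and pull out the timestamp, level, logger, and message. The function name parse_docker_es_line is hypothetical, and the sample record is shortened from the first log line above (only four of its fields are kept).

```python
import json


def parse_docker_es_line(raw_line):
    """Unwrap one Docker json-file log line and return a few inner ES fields."""
    outer = json.loads(raw_line)      # {"log": "...", "stream": ..., "time": ...}
    inner = json.loads(outer["log"])  # the Elasticsearch ECS JSON record
    return {
        "timestamp": inner.get("@timestamp"),
        "level": inner.get("log.level"),
        # Keep only the class name, not the full package path.
        "logger": inner.get("log.logger", "").rsplit(".", 1)[-1],
        "message": inner.get("message"),
    }


# Sample rebuilt from the first log line above, with most fields omitted.
inner_record = {
    "@timestamp": "2022-11-27T01:00:27.627Z",
    "log.level": "ERROR",
    "message": "collector [cluster_stats] timed out when collecting data: "
               "node [AMjWTf-GQYyqgoZb2q4wKg] did not respond within [10s]",
    "log.logger": "org.elasticsearch.xpack.monitoring.collector.cluster.ClusterStatsCollector",
}
sample_line = json.dumps({
    "log": json.dumps(inner_record) + "\n",
    "stream": "stdout",
    "time": "2022-11-27T01:00:27.627731148Z",
})

parsed = parse_docker_es_line(sample_line)
print(parsed["timestamp"], parsed["level"], parsed["logger"])
print(parsed["message"])
```

Running this over the whole file, one line at a time, should give a compact timeline of which collectors and transport actions timed out against es02.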
Thanks in advance