Hi
Can You check together with me what could be the root cause
What I have already check from this log. I didn't observe also any CPU throttling on that node it was used 3 core from 13 available. Node consists 26 heap of mem.
L og entries further highlight consistent issues related to high garbage collection (GC) overhead and performance bottlenecks in the node elasticsearch-data-node-1. The Network settings on cluster are default.
- High GC Pauses:
- Multiple warnings from
JvmGcMonitorService
show significant GC pauses, such as durations of2.1s
,1.5s
, and766ms
, which point to potential memory pressure. For instance, logs like"[gc][young][250230][26706] duration [1.5s], collections [1]/[2.2s]"
indicate that young generation GC is taking longer than optimal. - The message
"[gc][250299] overhead, spent [2.1s] collecting in the last [2.7s]"
is especially concerning as it indicates that GC is consuming a significant amount of the node's available processing time. - Thread Pool Timer Delay:
- The warning
timer thread slept for [22.7s]
is an indicator that there are potential scheduling delays or heavy load impacting the responsiveness of the thread pool. - Indexing and Search Performance:
- There are log entries mentioning slow bulk requests, such as
handling request [POST][/_bulk?timeout=1m]... took [6203ms]
, which suggests that indexing operations are slower than expected. - Transport messages taking
>5000ms
consistently show delays in communication between nodes, potentially due to network or resource constraints.
{"type": "server", "timestamp": "2024-11-19T11:16:29,788Z", "level": "WARN", "component": "o.e.t.ThreadPool", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "timer thread slept for [22.7s/22746581306ns] on relative clock which is above the warn threshold of [5000ms]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:05,914Z", "level": "WARN", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][young][250230][26706] duration [1.5s], collections [1]/[2.2s], total [1.5s]/[12.2m], memory [17gb]->[5.9gb]/[26gb], all_pools {[young] [11.2gb]->[176mb]/[0b]}{[old] [5.7gb]->[4.7gb]/[26gb]}{[survivor] [80mb]->[1gb]/[0b]}", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:05,915Z", "level": "WARN", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250230] overhead, spent [1.5s] collecting in the last [2.2s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:06,916Z", "level": "WARN", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250231] overhead, spent [766ms] collecting in the last [1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:09,664Z", "level": "WARN", "component": "o.e.h.AbstractHttpServerTransport", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "handling request [null][POST][/_bulk?timeout=1m][Netty4HttpChannel{localAddress=/10.124.106.215:9200, remoteAddress=/10.124.100.32:33500}] took [6203ms] which is above the warn threshold of [5000ms]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:09,669Z", "level": "WARN", "component": "o.e.t.OutboundHandler", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "sending transport message [Response{19534995}{false}{false}{false}{class org.elasticsearch.search.query.QuerySearchResult}] of size [6637] on [Netty4TcpChannel{localAddress=/10.124.106.215:9300, remoteAddress=/10.124.106.80:46454, profile=default}] took [5603ms] which is above the warn threshold of [5000ms] with success [true]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:09,670Z", "level": "WARN", "component": "o.e.t.OutboundHandler", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "sending transport message [Request{indices:data/read/search[phase/query]}{5240669}{false}{false}{false}] of size [1047] on [Netty4TcpChannel{localAddress=/10.124.106.215:33078, remoteAddress=10.124.106.49/10.124.106.49:9300, profile=default}] took [5403ms] which is above the warn threshold of [5000ms] with success [true]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:09,670Z", "level": "WARN", "component": "o.e.t.OutboundHandler", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "sending transport message [Request{indices:data/read/search[phase/query]}{5240677}{false}{false}{false}] of size [1081] on [Netty4TcpChannel{localAddress=/10.124.106.215:51278, remoteAddress=10.124.105.75/10.124.105.75:9300, profile=default}] took [5403ms] which is above the warn threshold of [5000ms] with success [true]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:09,918Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250234] overhead, spent [276ms] collecting in the last [1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:14,921Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250239] overhead, spent [385ms] collecting in the last [1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:41,086Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250265] overhead, spent [538ms] collecting in the last [1.1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:49,253Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250273] overhead, spent [478ms] collecting in the last [1.1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:04:59,258Z", "level": "WARN", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250283] overhead, spent [552ms] collecting in the last [1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:05:01,260Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250285] overhead, spent [255ms] collecting in the last [1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:05:03,304Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250287] overhead, spent [393ms] collecting in the last [1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:05:05,309Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250289] overhead, spent [348ms] collecting in the last [1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:05:09,481Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250293] overhead, spent [487ms] collecting in the last [1s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:05:13,032Z", "level": "INFO", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250296] overhead, spent [587ms] collecting in the last [1.5s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:05:17,840Z", "level": "WARN", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][young][250299][26763] duration [2.1s], collections [1]/[2.7s], total [2.1s]/[12.4m], memory [6.7gb]->[5.8gb]/[26gb], all_pools {[young] [1gb]->[0b]/[0b]}{[old] [5.5gb]->[5.6gb]/[26gb]}{[survivor] [160mb]->[131.9mb]/[0b]}", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:05:17,902Z", "level": "WARN", "component": "o.e.m.j.JvmGcMonitorService", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": "[gc][250299] overhead, spent [2.1s] collecting in the last [2.7s]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" }
{"type": "server", "timestamp": "2024-11-19T19:30:25,296Z", "level": "INFO", "component": "o.e.c.s.IndexScopedSettings", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": " [products-20241116232459] updating [index.search.slowlog.threshold.query.warn] from [-1] to [500ms]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" , "trace.id": "56681b6c2c31c2307b5b25de4cc6cbc9" }
{"type": "server", "timestamp": "2024-11-19T19:30:25,296Z", "level": "INFO", "component": "o.e.c.s.IndexScopedSettings", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": " [products-20241116232459] updating [index.search.slowlog.threshold.query.info] from [-1] to [300ms]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" , "trace.id": "56681b6c2c31c2307b5b25de4cc6cbc9" }
{"type": "server", "timestamp": "2024-11-19T19:30:25,297Z", "level": "INFO", "component": "o.e.c.s.IndexScopedSettings", "cluster.name": "elasticsearch", "node.name": "elasticsearch-data-node-1", "message": " [products-20241116232459] updating [index.search.slowlog.threshold.query.debug] from [-1] to [200ms]", "cluster.uuid": "RYQAZzlGSDex0UWgUmDNWQ", "node.id": "wF0TkmBIRIa3xczTpWQFjw" , "trace.id": "56681b6c2c31c2307b5b25de4cc6cbc9" }