ERROR - collector [index-stats] timed out when collecting data

Errors like this have started appearing regularly in the logs, after which Elasticsearch (6.8) crashes. This question has come up several times in the English-language community, but I did not find a suitable answer there (people wrote that it could be a hardware problem or that too many shards are in use). Maybe someone here can suggest where to look for the problem?

It looks like your service was stopped for some time. Does this happen on a regular schedule, for example every 24 hours?

Yes, I wrote a bash script that checks the status of elasticsearch service and restarts it when it is inactive. This happens regularly, but not every 24 hours, sometimes with a break of several days.
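For anyone who wants a similar watchdog, here is a minimal Python sketch of the idea (it is not the bash script mentioned above, which was not posted). It assumes a systemd unit named `elasticsearch`; the function names and the `run`/`log` parameters are made up for illustration. It also logs a UTC timestamp on every restart so the restarts can be matched against the Elasticsearch log later.

```python
import subprocess
from datetime import datetime, timezone


def is_active(unit, run=subprocess.run):
    # `systemctl is-active --quiet` exits with 0 only when the unit is active.
    return run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def check_and_restart(unit="elasticsearch", run=subprocess.run, log=print):
    """Restart the unit if it is not active. Returns True if a restart was issued.

    The unit name is an assumption; adjust it to your installation.
    """
    if is_active(unit, run):
        return False
    stamp = datetime.now(timezone.utc).isoformat()
    log(f"{stamp} {unit} is not active, restarting")
    # check=True raises CalledProcessError if the restart command itself fails.
    run(["systemctl", "restart", unit], check=True)
    return True
```

Calling `check_and_restart()` from cron or a systemd timer gives roughly the behaviour described above. Keep in mind that a watchdog only hides the symptom; the restart timestamps are the useful part for diagnosis.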

Could you please paste your config file (using the </> tag)? Please mask any sensitive data, of course.

cluster.name: ***
node.name: master-node
node.master: true

path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3

path.logs: /var/log/elasticsearch

bootstrap.memory_lock: true
network.bind_host: [localhost, *.*.*.*]
xpack.security.enabled: false
path.repo: ["/mnt/snapshot"]
## JVM configuration
-Xms2g
-Xmx2g

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10

## optimizations
-XX:+AlwaysPreTouch
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-XX:-OmitStackTraceInFastThrow
14-:-XX:+ShowCodeDetailsInExceptionMessages
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j2.formatMsgNoLookups=true
-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=8
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9 Elasticsearch need to set the provider to COMPAT otherwise
# time/date parsing will break in an incompatible way for some date patterns and locals
9-:-Djava.locale.providers=COMPAT

# temporary workaround for C2 bug with JDK 10 on hardware with AVX-512
10-:-XX:UseAVX=2

Everything looks good.
Do your logs say anything when it happens? Maybe the system journal?

There are no entries in the system journal during this period...

Elasticsearch logs:

[2022-08-17T00:00:01,323][INFO ][o.e.c.m.MetaDataIndexTemplateService] [master-node] adding template [kibana_index_template:.kibana] for index patterns [.kibana]
[2022-08-17T01:00:00,000][INFO ][o.e.x.m.e.l.LocalExporter] [master-node] cleaning up [2] old indices
[2022-08-17T01:00:00,006][INFO ][o.e.c.m.MetaDataDeleteIndexService] [master-node] [.monitoring-es-6-2022.08.09/ab5DMOIlTUizH59sHLNsqA] deleting index
[2022-08-17T01:00:00,006][INFO ][o.e.c.m.MetaDataDeleteIndexService] [master-node] [.monitoring-kibana-6-2022.08.09/dEZ1Uxk7T3KfBvUZBLhAYg] deleting index
[2022-08-17T02:09:29,350][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:10:06,837][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:10:36,838][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:10:56,839][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:11:06,959][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:11:17,069][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:11:36,839][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:11:46,945][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:11:57,069][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:12:16,840][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:12:26,974][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:12:46,841][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:12:56,824][WARN ][o.e.c.InternalClusterInfoService] [master-node] Failed to update shard information for ClusterInfoUpdateJob within 15s timeout
[2022-08-17T02:12:56,998][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:13:16,842][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:13:26,961][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:13:46,842][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:13:56,967][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:14:07,108][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:14:26,843][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:14:40,068][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:14:50,568][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:15:00,000][INFO ][o.e.x.m.MlDailyMaintenanceService] [master-node] triggering scheduled [ML] maintenance tasks
[2022-08-17T02:15:00,326][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [master-node] Deleting expired data
[2022-08-17T02:15:00,331][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [master-node] Completed deletion of expired ML data
[2022-08-17T02:15:00,331][INFO ][o.e.x.m.MlDailyMaintenanceService] [master-node] Successfully completed [ML] maintenance tasks
[2022-08-17T02:15:06,844][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:15:20,408][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:15:31,112][INFO ][o.e.e.NodeEnvironment    ] [master-node] using [3] data paths, mounts [[/ (/dev/vda2)]], net usable_space [82gb], net total_space [177.1gb], types [ext4]
[2022-08-17T02:15:31,116][INFO ][o.e.e.NodeEnvironment    ] [master-node] heap size [2gb], compressed ordinary object pointers [true]
[2022-08-17T02:15:31,648][INFO ][o.e.n.Node               ] [master-node] node name [master-node], node ID [WbYUANt3SeOJaH9JNP3Gsw]
[2022-08-17T02:15:31,649][INFO ][o.e.n.Node               ] [master-node] version[6.8.23], pid[2932703], build[default/deb/4f67856/2022-01-06T21:30:50.087716Z], OS[Linux/5.4.0-105-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/17.0.2/17.0.2+8-86]
[2022-08-17T02:15:31,649][INFO ][o.e.n.Node               ] [master-node] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.io.tmpdir=/tmp/elasticsearch-14765442649856788009, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -XX:UseAVX=2, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=deb]
[2022-08-17T02:15:34,448][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [aggs-matrix-stats]
[2022-08-17T02:15:34,448][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [analysis-common]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [ingest-common]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [ingest-geoip]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [ingest-user-agent]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [lang-expression]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [lang-mustache]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [lang-painless]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [mapper-extras]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [parent-join]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [percolator]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [rank-eval]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [reindex]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [repository-url]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [transport-netty4]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [tribe]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-ccr]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-core]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-deprecation]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-graph]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-ilm]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-logstash]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-ml]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-monitoring]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-rollup]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-security]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-sql]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-upgrade]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-watcher]
[2022-08-17T02:15:34,451][INFO ][o.e.p.PluginsService     ] [master-node] no plugins loaded
[2022-08-17T02:15:38,531][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [master-node] [controller/2933091] [Main.cc@114] controller (64 bit): Version 6.8.23 (Build 31256deab94add) Copyright (c) 2022 Elasticsearch BV
[2022-08-17T02:15:39,337][INFO ][o.e.d.DiscoveryModule    ] [master-node] using discovery type [zen] and host providers [settings]
[2022-08-17T02:15:39,945][INFO ][o.e.n.Node               ] [master-node] initialized
[2022-08-17T02:15:39,945][INFO ][o.e.n.Node               ] [master-node] starting ...
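To see how long the collectors were timing out before each restart, the log above can be summarized with a short script. This is only a sketch: the regular expressions are written against the line format shown in this excerpt, and treating the first `NodeEnvironment` "data paths" line as the start of a restart is my assumption based on the excerpt, not something Elasticsearch documents.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines like: [timestamp][LEVEL][logger] [node] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<logger>[^\]]+?)\s*\]"
    r"\s*\[(?P<node>[^\]]+)\]\s*(?P<msg>.*)$"
)
TIMEOUT_RE = re.compile(r"collector \[(?P<name>[^\]]+)\] timed out when collecting data")


def parse_ts(text):
    # Timestamps in the excerpt look like 2022-08-17T02:09:29,350
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S,%f")


def summarize(lines):
    """Return (timeouts per collector, first timeout time, first restart time)."""
    counts = Counter()
    first_timeout = None
    restart = None
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip stack traces and other continuation lines
        ts = parse_ts(m.group("ts"))
        timeout = TIMEOUT_RE.search(m.group("msg"))
        if timeout:
            counts[timeout.group("name")] += 1
            if first_timeout is None:
                first_timeout = ts
        elif (
            restart is None
            and m.group("logger").endswith("NodeEnvironment")
            and "data paths" in m.group("msg")
        ):
            # Assumption: this is the first line a starting node writes.
            restart = ts
    return counts, first_timeout, restart
```

Passing an open log file to `summarize` and subtracting the two timestamps shows how long the node was struggling before it came back up, which helps when lining the incident up with disk or CPU graphs.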

I've found this: link_1 and this: link_2

Have you checked your hardware specifications and compared them to what ES needs?

Thank you. Although I had already looked at these posts, they gave me the idea that I might be exceeding the limits on disk read/write operations.


The max read/write bandwidth is 90 MBps.
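One way to check whether the disk is actually running into that 90 MBps cap is to sample `/proc/diskstats` twice and compute the throughput in between. The sketch below assumes Linux, the device name `vda` (the startup log shows the data paths on `/dev/vda2`), and 1 MB = 10^6 bytes. Whether the provider applies the cap per direction or to reads and writes combined is not stated, so `near_limit` checks the combined figure as the more conservative option. All function names are made up for illustration.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats always counts in 512-byte sectors


def read_counters(diskstats_text, device):
    """Return (sectors_read, sectors_written) for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Fields: major minor name reads merged sectors_read ms writes merged sectors_written ...
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[5]), int(fields[9])
    raise ValueError(f"device {device!r} not found in diskstats")


def throughput_mbps(before, after, seconds):
    """Return (read MB/s, write MB/s) between two counter samples."""
    read = (after[0] - before[0]) * SECTOR_BYTES / 1e6 / seconds
    write = (after[1] - before[1]) * SECTOR_BYTES / 1e6 / seconds
    return read, write


def near_limit(read_mbps, write_mbps, limit_mbps=90.0, fraction=0.9):
    # Assumption: the cap applies to reads and writes combined.
    return read_mbps + write_mbps >= limit_mbps * fraction


def measure(device="vda", interval=5.0, path="/proc/diskstats"):
    """Sample the counters twice, `interval` seconds apart."""
    with open(path) as f:
        before = read_counters(f.read(), device)
    time.sleep(interval)
    with open(path) as f:
        after = read_counters(f.read(), device)
    return throughput_mbps(before, after, interval)
```

Running `measure()` in a loop during heavy indexing and logging the result next to `near_limit(...)` should show whether the timeouts line up with the disk sitting at its cap.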

It looks like you have found the issue. Please let me know whether changing the limits makes your system work properly.

Now I'm pretty sure that in my case the problem was disk overload during heavy indexing with 'upserts'. I've tweaked the index update logic a bit and everything seems to be working fine now. At the moment, Elasticsearch's uptime is around 7 days.
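The exact change to the update logic is not described above, so the following is not that fix, just one common way to reduce disk load from upserts: merge repeated updates to the same document in memory and send them as a single bulk request instead of many small ones. The sketch only builds the NDJSON body for the `_bulk` endpoint using the `update` action with `doc_as_upsert`; the index name, the `_doc` type, and the function names are assumptions for illustration.

```python
import json


def coalesce_updates(updates):
    """Merge partial updates per document id; later values win.

    `updates` is an iterable of (doc_id, fields_dict) pairs.
    """
    merged = {}
    for doc_id, fields in updates:
        merged.setdefault(doc_id, {}).update(fields)
    return merged


def bulk_upsert_body(index, merged, doc_type="_doc"):
    """Build an NDJSON body for POST /_bulk from coalesced updates.

    Elasticsearch 6.x expects a type in the action metadata; `_doc` is an
    assumed value here.
    """
    if not merged:
        return ""
    lines = []
    for doc_id, doc in merged.items():
        action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # doc_as_upsert creates the document from `doc` if it does not exist yet.
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

For example, three queued updates touching two documents become two bulk actions, so each document is rewritten once per batch rather than once per update. The body would be sent with the `Content-Type: application/x-ndjson` header.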
