ERROR - collector [index-stats] timed out when collecting data

AntonUstinov · August 18, 2022, 7:21am

Errors like this have started appearing regularly in the logs, after which Elasticsearch (6.8) crashes. This question has come up several times in the English-language community, but I could not find a suitable answer there (people wrote that it could be a hardware problem or that too many shards are in use). Maybe someone here can suggest where to look for the problem?

cheshirecat · August 18, 2022, 8:48am

It looks like your service was stopped for some time. Does it happen on a regular schedule, for example every 24 hours?

AntonUstinov · August 18, 2022, 9:07am

Yes, I wrote a bash script that checks the status of elasticsearch service and restarts it when it is inactive. This happens regularly, but not every 24 hours, sometimes with a break of several days.

cheshirecat · August 18, 2022, 9:12am

Could you please paste your config file (using the </> tag)? Please mask any sensitive data, of course.

AntonUstinov · August 18, 2022, 9:21am

cluster.name: ***
node.name: master-node
node.master: true

path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3

path.logs: /var/log/elasticsearch

bootstrap.memory_lock: true
network.bind_host: [localhost, *.*.*.*]
xpack.security.enabled: false
path.repo: ["/mnt/snapshot"]

AntonUstinov · August 18, 2022, 9:24am

## JVM configuration
-Xms2g
-Xmx2g

## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly

14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10

## optimizations
-XX:+AlwaysPreTouch
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-XX:-OmitStackTraceInFastThrow
14-:-XX:+ShowCodeDetailsInExceptionMessages
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j2.formatMsgNoLookups=true
-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=8
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9 Elasticsearch need to set the provider to COMPAT otherwise
# time/date parsing will break in an incompatible way for some date patterns and locals
9-:-Djava.locale.providers=COMPAT

# temporary workaround for C2 bug with JDK 10 on hardware with AVX-512
10-:-XX:UseAVX=2

AntonUstinov · August 18, 2022, 9:29am

cheshirecat · August 18, 2022, 9:38am

Everything looks good.
When it happens, does your Elasticsearch log say anything? Or maybe the system journal?

AntonUstinov · August 18, 2022, 10:01am

There are no entries in the system journal during this period...

Elasticsearch logs:

[2022-08-17T00:00:01,323][INFO ][o.e.c.m.MetaDataIndexTemplateService] [master-node] adding template [kibana_index_template:.kibana] for index patterns [.kibana]
[2022-08-17T01:00:00,000][INFO ][o.e.x.m.e.l.LocalExporter] [master-node] cleaning up [2] old indices
[2022-08-17T01:00:00,006][INFO ][o.e.c.m.MetaDataDeleteIndexService] [master-node] [.monitoring-es-6-2022.08.09/ab5DMOIlTUizH59sHLNsqA] deleting index
[2022-08-17T01:00:00,006][INFO ][o.e.c.m.MetaDataDeleteIndexService] [master-node] [.monitoring-kibana-6-2022.08.09/dEZ1Uxk7T3KfBvUZBLhAYg] deleting index
[2022-08-17T02:09:29,350][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:10:06,837][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:10:36,838][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:10:56,839][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:11:06,959][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:11:17,069][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:11:36,839][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:11:46,945][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:11:57,069][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:12:16,840][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:12:26,974][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:12:46,841][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:12:56,824][WARN ][o.e.c.InternalClusterInfoService] [master-node] Failed to update shard information for ClusterInfoUpdateJob within 15s timeout
[2022-08-17T02:12:56,998][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:13:16,842][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:13:26,961][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:13:46,842][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:13:56,967][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:14:07,108][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:14:26,843][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:14:40,068][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:14:50,568][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:15:00,000][INFO ][o.e.x.m.MlDailyMaintenanceService] [master-node] triggering scheduled [ML] maintenance tasks
[2022-08-17T02:15:00,326][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [master-node] Deleting expired data
[2022-08-17T02:15:00,331][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [master-node] Completed deletion of expired ML data
[2022-08-17T02:15:00,331][INFO ][o.e.x.m.MlDailyMaintenanceService] [master-node] Successfully completed [ML] maintenance tasks
[2022-08-17T02:15:06,844][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:15:20,408][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:15:31,112][INFO ][o.e.e.NodeEnvironment    ] [master-node] using [3] data paths, mounts [[/ (/dev/vda2)]], net usable_space [82gb], net total_space [177.1gb], types [ext4]
[2022-08-17T02:15:31,116][INFO ][o.e.e.NodeEnvironment    ] [master-node] heap size [2gb], compressed ordinary object pointers [true]
[2022-08-17T02:15:31,648][INFO ][o.e.n.Node               ] [master-node] node name [master-node], node ID [WbYUANt3SeOJaH9JNP3Gsw]
[2022-08-17T02:15:31,649][INFO ][o.e.n.Node               ] [master-node] version[6.8.23], pid[2932703], build[default/deb/4f67856/2022-01-06T21:30:50.087716Z], OS[Linux/5.4.0-105-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/17.0.2/17.0.2+8-86]
[2022-08-17T02:15:31,649][INFO ][o.e.n.Node               ] [master-node] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.io.tmpdir=/tmp/elasticsearch-14765442649856788009, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -XX:UseAVX=2, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=deb]
[2022-08-17T02:15:34,448][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [aggs-matrix-stats]
[2022-08-17T02:15:34,448][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [analysis-common]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [ingest-common]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [ingest-geoip]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [ingest-user-agent]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [lang-expression]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [lang-mustache]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [lang-painless]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [mapper-extras]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [parent-join]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [percolator]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [rank-eval]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [reindex]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [repository-url]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [transport-netty4]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [tribe]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-ccr]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-core]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-deprecation]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-graph]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-ilm]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-logstash]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-ml]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-monitoring]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-rollup]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-security]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-sql]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-upgrade]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService     ] [master-node] loaded module [x-pack-watcher]
[2022-08-17T02:15:34,451][INFO ][o.e.p.PluginsService     ] [master-node] no plugins loaded
[2022-08-17T02:15:38,531][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [master-node] [controller/2933091] [Main.cc@114] controller (64 bit): Version 6.8.23 (Build 31256deab94add) Copyright (c) 2022 Elasticsearch BV
[2022-08-17T02:15:39,337][INFO ][o.e.d.DiscoveryModule    ] [master-node] using discovery type [zen] and host providers [settings]
[2022-08-17T02:15:39,945][INFO ][o.e.n.Node               ] [master-node] initialized
[2022-08-17T02:15:39,945][INFO ][o.e.n.Node               ] [master-node] starting ...
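
To see how long the node struggles before each restart, a log like the one above can be summarized with a small script. This is only a sketch: the function name `summarize_timeouts` is made up for illustration, and it assumes the 6.x log line layout shown in this post and that a `NodeEnvironment` line marks the node starting up again.

```python
import re
from datetime import datetime

# Matches the log prefix seen above: [timestamp][LEVEL][logger] [node] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<logger>[^\]]+?)\s*\] "
    r"\[(?P<node>[^\]]+)\] (?P<msg>.*)$"
)
TIMEOUT_RE = re.compile(r"collector \[(?P<name>[^\]]+)\] timed out when collecting data")


def summarize_timeouts(lines):
    """Count collector timeouts and measure the time from the first
    timeout to the next node startup (None if either is missing)."""
    counts = {}
    first_timeout = None
    restart = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not follow the expected layout
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
        t = TIMEOUT_RE.search(m.group("msg"))
        if t:
            counts[t.group("name")] = counts.get(t.group("name"), 0) + 1
            if first_timeout is None:
                first_timeout = ts
        elif (first_timeout is not None and restart is None
              and m.group("logger").endswith("NodeEnvironment")):
            # Assumption: the first NodeEnvironment line after the timeouts
            # belongs to the restarted process.
            restart = ts
    gap = None
    if first_timeout is not None and restart is not None:
        gap = (restart - first_timeout).total_seconds()
    return counts, gap
```

Feeding it the lines of the log file gives the number of timeouts per collector and the length of the stall in seconds. In the excerpt above, the first timeout is at 02:09:29 and the node starts again at 02:15:31, so the stall lasted about six minutes.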

cheshirecat · August 18, 2022, 10:33am

I've found this: link_1 and this: link_2

Have you checked your hardware specifications and compared them with what Elasticsearch needs?

AntonUstinov · August 18, 2022, 11:35am

Thank you. Although I had already looked at these posts, they gave me the idea that I might be exceeding the limits on disk read/write operations.

The max read/write bandwidth is 90 MBps.
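
One way to check whether the disk really runs into such a limit is to take two samples of `/proc/diskstats` and compute the throughput between them. The sketch below assumes a Linux host; the helper names and the `LIMIT_MB_S` constant are illustrative, and the 90 MB/s figure is simply the limit quoted in this post.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units
LIMIT_MB_S = 90     # bandwidth limit quoted in this thread (assumption: MB = 10^6 bytes)


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # not a complete diskstats row
        # fields: major, minor, name, reads, reads merged, sectors read,
        # ms reading, writes, writes merged, sectors written, ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_s(before, after, seconds):
    """Read and write throughput in MB/s between two samples of one device."""
    read = (after[0] - before[0]) * SECTOR_BYTES / seconds / 1_000_000
    write = (after[1] - before[1]) * SECTOR_BYTES / seconds / 1_000_000
    return read, write


def near_limit(read, write, limit=LIMIT_MB_S, margin=0.9):
    """True if either direction is at or above `margin` of the limit."""
    return max(read, write) >= limit * margin
```

In practice you would read `/proc/diskstats` twice, a few seconds apart, look up the device backing the data paths (the startup log shows `/dev/vda2`, so probably `vda`), and pass the two tuples to `throughput_mb_s`. Values that stay close to the limit while the collector timeouts appear would support the disk-overload theory.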

cheshirecat · August 18, 2022, 12:04pm

It looks like you have found the issue. Please let me know whether changing the limits makes your system work properly.

AntonUstinov · August 28, 2022, 8:32am

Now I'm pretty sure that in my case the problem was disk overload during heavy indexing with 'upserts'. I've tweaked the index update logic a bit and everything seems to be working fine now. At the moment, Elasticsearch's uptime is around 7 days.
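
The post does not say what exactly was changed in the update logic. One common way to reduce disk load from upserts, shown here purely as an illustration and not as the poster's actual fix, is to merge repeated updates to the same document before sending them and to ship them in a single `_bulk` request. The function names are hypothetical; the `update` action with `doc_as_upsert` is part of the Elasticsearch bulk API, and `_type` is included because 6.x still expects a document type.

```python
import json


def coalesce_upserts(updates):
    """Merge updates that target the same document id.

    `updates` is an iterable of (doc_id, fields) pairs; later values win.
    Note: this is a shallow merge, nested objects are replaced as a whole.
    """
    merged = {}
    for doc_id, fields in updates:
        merged.setdefault(doc_id, {}).update(fields)
    return merged


def bulk_body(index, merged, doc_type="_doc"):
    """Build an NDJSON body for the _bulk API, one upsert per document id."""
    lines = []
    for doc_id, fields in merged.items():
        action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"doc": fields, "doc_as_upsert": True}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

With this approach, three pending updates that touch two documents produce two bulk actions instead of three separate write requests, so each document is rewritten at most once per batch.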

system · September 25, 2022, 8:33am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
