Errors like these have started appearing regularly in the logs, after which Elasticsearch (6.8) crashes. This question has come up several times on the English-language forums, but I didn't find a suitable answer there (people suggested it could be a hardware problem or too many shards in use). Maybe someone here can suggest where to look for the problem?
It looks like your service was stopped for some time. Does this happen regularly, for example every 24 hours?
Yes. I wrote a bash script that checks the status of the elasticsearch service and restarts it when it is inactive. This happens regularly, but not every 24 hours; sometimes several days pass between occurrences.
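A watchdog like this can also record why systemd considers the unit stopped before restarting it, which makes each restart easier to correlate with the Elasticsearch log. Below is a minimal Python sketch of that idea, not the script from this thread: it assumes the unit is named `elasticsearch` (as with the deb package), that it runs as root from cron or a timer, and the function names are hypothetical. It reads the `Result`, `ExecMainCode` and `ExecMainStatus` properties via `systemctl show`, and unlike the original script it also restarts a unit in the `failed` state.

```python
import subprocess
from datetime import datetime, timezone
from typing import Dict, Optional

SERVICE = "elasticsearch"  # assumption: unit name used by the deb package


def parse_show_output(text: str) -> Dict[str, str]:
    """Turn `systemctl show -p ...` output (one KEY=VALUE per line) into a dict."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key] = value
    return props


def needs_restart(active_state: str) -> bool:
    """`systemctl is-active` prints one word such as active, inactive or failed."""
    return active_state.strip() in ("inactive", "failed")


def check_and_restart(service: str = SERVICE) -> Optional[str]:
    """Restart the unit if it is down; return a one-line record of how it stopped."""
    state = subprocess.run(["systemctl", "is-active", service],
                           capture_output=True, text=True).stdout
    if not needs_restart(state):
        return None
    # Read how the previous main process ended before the restart can overwrite it.
    show = subprocess.run(
        ["systemctl", "show", service,
         "-p", "Result", "-p", "ExecMainCode", "-p", "ExecMainStatus"],
        capture_output=True, text=True).stdout
    props = parse_show_output(show)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    details = " ".join(f"{k}={v}" for k, v in sorted(props.items()))
    subprocess.run(["systemctl", "restart", service], check=False)
    return f"{stamp} state={state.strip()} {details}"
```

Appending each non-empty return value of `check_and_restart()` to a file gives a timestamped history. A record such as `Result=signal ExecMainStatus=9` would indicate the process was killed with SIGKILL rather than shutting down on its own, which narrows down where to look next.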
Could you please paste your config file (formatted with the </> tag)? Please mask any sensitive data, of course.
cluster.name: ***
node.name: master-node
node.master: true
path:
  data:
    - /mnt/elasticsearch_1
    - /mnt/elasticsearch_2
    - /mnt/elasticsearch_3
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.bind_host: [localhost, *.*.*.*]
xpack.security.enabled: false
path.repo: ["/mnt/snapshot"]
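Since this config sets `bootstrap.memory_lock: true`, it may be worth confirming that the lock actually took effect on the running node. The nodes info API reports a `process.mlockall` flag per node, and `filter_path=**.mlockall` trims the response down to just that flag. The following is a small Python sketch, assuming the default HTTP port on localhost and no authentication (security is disabled above); both function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, List


def nodes_without_mlock(nodes_response: Dict[str, Any]) -> List[str]:
    """Return the IDs of nodes that report process.mlockall as anything but true."""
    nodes = nodes_response.get("nodes", {})
    return sorted(
        node_id
        for node_id, info in nodes.items()
        if info.get("process", {}).get("mlockall") is not True
    )


def fetch_mlock_status(base_url: str = "http://localhost:9200") -> Dict[str, Any]:
    """Query the nodes info API, filtered down to the mlockall flag of each node."""
    url = f"{base_url}/_nodes?filter_path=**.mlockall"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

`print(nodes_without_mlock(fetch_mlock_status()))` printing an empty list means every node locked its heap. If a node is listed, the Elasticsearch docs for package installs running under systemd point to setting `LimitMEMLOCK=infinity` in an override created with `systemctl edit elasticsearch`.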
## JVM configuration
-Xms2g
-Xmx2g
## GC configuration
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30
-Des.networkaddress.cache.ttl=60
-Des.networkaddress.cache.negative.ttl=10
## optimizations
-XX:+AlwaysPreTouch
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-XX:-OmitStackTraceInFastThrow
14-:-XX:+ShowCodeDetailsInExceptionMessages
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j2.formatMsgNoLookups=true
-Djava.io.tmpdir=${ES_TMPDIR}
## heap dumps
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/lib/elasticsearch
-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log
## JDK 8 GC logging
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/var/log/elasticsearch/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=8
8:-XX:GCLogFileSize=64m
# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9 Elasticsearch need to set the provider to COMPAT otherwise
# time/date parsing will break in an incompatible way for some date patterns and locals
9-:-Djava.locale.providers=COMPAT
# temporary workaround for C2 bug with JDK 10 on hardware with AVX-512
10-:-XX:UseAVX=2
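The prefixed lines in this jvm.options file (`8:`, `8-13:`, `9-:`, `14-:`, `10-:`) only apply on matching JDK major versions. The startup log later in the thread shows JDK 17, and its logged JVM arguments contain the G1 settings, `-Xlog`, `-Djava.locale.providers=COMPAT` and `-XX:UseAVX=2`, but none of the CMS or JDK 8 GC-logging flags. A minimal Python sketch of that selection rule, assuming the version-prefix syntax described in the Elasticsearch "Setting JVM options" docs; `select_options` is a hypothetical name, and it does not perform the `${ES_TMPDIR}` substitution the real launcher does.

```python
import re
from typing import Iterable, List

# "<major>:opt", "<major>-:opt" or "<lo>-<hi>:opt"
_VERSIONED = re.compile(r"^(\d+)(-(\d*))?:(-.*)$")


def select_options(lines: Iterable[str], java_major: int) -> List[str]:
    """Return the JVM options that apply to the given JDK major version."""
    selected: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line.startswith("-"):
            selected.append(line)  # unprefixed options apply to every JDK
            continue
        m = _VERSIONED.match(line)
        if m is None:
            raise ValueError(f"unrecognized jvm.options line: {raw!r}")
        lower = int(m.group(1))
        if m.group(2) is None:
            upper = lower  # "8:" means exactly JDK 8
        elif m.group(3) == "":
            upper = None  # "14-:" means JDK 14 and above
        else:
            upper = int(m.group(3))  # "8-13:" is an inclusive range
        if java_major >= lower and (upper is None or java_major <= upper):
            selected.append(m.group(4))
    return selected
```

Feeding this file's lines with `java_major=17` keeps the `14-:`, `9-:` and `10-:` entries plus all unprefixed ones, which matches the JVM arguments line in the log.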
Everything looks good.
When it happens, does your log say anything? What about the system journal?
There are no entries in the system journal during this period...
Elasticsearch logs:
[2022-08-17T00:00:01,323][INFO ][o.e.c.m.MetaDataIndexTemplateService] [master-node] adding template [kibana_index_template:.kibana] for index patterns [.kibana]
[2022-08-17T01:00:00,000][INFO ][o.e.x.m.e.l.LocalExporter] [master-node] cleaning up [2] old indices
[2022-08-17T01:00:00,006][INFO ][o.e.c.m.MetaDataDeleteIndexService] [master-node] [.monitoring-es-6-2022.08.09/ab5DMOIlTUizH59sHLNsqA] deleting index
[2022-08-17T01:00:00,006][INFO ][o.e.c.m.MetaDataDeleteIndexService] [master-node] [.monitoring-kibana-6-2022.08.09/dEZ1Uxk7T3KfBvUZBLhAYg] deleting index
[2022-08-17T02:09:29,350][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:10:06,837][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:10:36,838][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:10:56,839][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:11:06,959][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:11:17,069][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:11:36,839][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:11:46,945][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:11:57,069][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:12:16,840][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:12:26,974][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:12:46,841][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:12:56,824][WARN ][o.e.c.InternalClusterInfoService] [master-node] Failed to update shard information for ClusterInfoUpdateJob within 15s timeout
[2022-08-17T02:12:56,998][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:13:16,842][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:13:26,961][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:13:46,842][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:13:56,967][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:14:07,108][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:14:26,843][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:14:40,068][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:14:50,568][ERROR][o.e.x.m.c.c.ClusterStatsCollector] [master-node] collector [cluster_stats] timed out when collecting data
[2022-08-17T02:15:00,000][INFO ][o.e.x.m.MlDailyMaintenanceService] [master-node] triggering scheduled [ML] maintenance tasks
[2022-08-17T02:15:00,326][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [master-node] Deleting expired data
[2022-08-17T02:15:00,331][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [master-node] Completed deletion of expired ML data
[2022-08-17T02:15:00,331][INFO ][o.e.x.m.MlDailyMaintenanceService] [master-node] Successfully completed [ML] maintenance tasks
[2022-08-17T02:15:06,844][ERROR][o.e.x.m.c.n.NodeStatsCollector] [master-node] collector [node_stats] timed out when collecting data
[2022-08-17T02:15:20,408][ERROR][o.e.x.m.c.i.IndexStatsCollector] [master-node] collector [index-stats] timed out when collecting data
[2022-08-17T02:15:31,112][INFO ][o.e.e.NodeEnvironment ] [master-node] using [3] data paths, mounts [[/ (/dev/vda2)]], net usable_space [82gb], net total_space [177.1gb], types [ext4]
[2022-08-17T02:15:31,116][INFO ][o.e.e.NodeEnvironment ] [master-node] heap size [2gb], compressed ordinary object pointers [true]
[2022-08-17T02:15:31,648][INFO ][o.e.n.Node ] [master-node] node name [master-node], node ID [WbYUANt3SeOJaH9JNP3Gsw]
[2022-08-17T02:15:31,649][INFO ][o.e.n.Node ] [master-node] version[6.8.23], pid[2932703], build[default/deb/4f67856/2022-01-06T21:30:50.087716Z], OS[Linux/5.4.0-105-generic/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/17.0.2/17.0.2+8-86]
[2022-08-17T02:15:31,649][INFO ][o.e.n.Node ] [master-node] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j2.formatMsgNoLookups=true, -Djava.io.tmpdir=/tmp/elasticsearch-14765442649856788009, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Djava.locale.providers=COMPAT, -XX:UseAVX=2, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=deb]
[2022-08-17T02:15:34,448][INFO ][o.e.p.PluginsService ] [master-node] loaded module [aggs-matrix-stats]
[2022-08-17T02:15:34,448][INFO ][o.e.p.PluginsService ] [master-node] loaded module [analysis-common]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [ingest-common]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [ingest-geoip]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [ingest-user-agent]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [lang-expression]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [lang-mustache]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [lang-painless]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [mapper-extras]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [parent-join]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [percolator]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [rank-eval]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [reindex]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [repository-url]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [transport-netty4]
[2022-08-17T02:15:34,449][INFO ][o.e.p.PluginsService ] [master-node] loaded module [tribe]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-ccr]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-core]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-deprecation]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-graph]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-ilm]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-logstash]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-ml]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-monitoring]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-rollup]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-security]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-sql]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-upgrade]
[2022-08-17T02:15:34,450][INFO ][o.e.p.PluginsService ] [master-node] loaded module [x-pack-watcher]
[2022-08-17T02:15:34,451][INFO ][o.e.p.PluginsService ] [master-node] no plugins loaded
[2022-08-17T02:15:38,531][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [master-node] [controller/2933091] [Main.cc@114] controller (64 bit): Version 6.8.23 (Build 31256deab94add) Copyright (c) 2022 Elasticsearch BV
[2022-08-17T02:15:39,337][INFO ][o.e.d.DiscoveryModule ] [master-node] using discovery type [zen] and host providers [settings]
[2022-08-17T02:15:39,945][INFO ][o.e.n.Node ] [master-node] initialized
[2022-08-17T02:15:39,945][INFO ][o.e.n.Node ] [master-node] starting ...
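In this excerpt the monitoring collectors time out repeatedly from 02:09:29 until 02:15:20, and at 02:15:31 a new process (pid 2932703) starts, presumably the watchdog restart. To check whether every restart is preceded by a similar burst of errors, a small Python sketch can scan the log for restarts. It assumes, as in this excerpt, that each start begins with the `o.e.e.NodeEnvironment` "using [N] data paths" line and that the pid appears on the following `o.e.n.Node` version line; `find_restarts` and `Restart` are hypothetical names.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# "[2022-08-17T02:15:31,112][INFO ][o.e.e.NodeEnvironment ] [master-node] ..."
LINE_RE = re.compile(r"^\[([^\]]+)\]\[(\w+)\s*\]\[([^\]]+?)\s*\]")
PID_RE = re.compile(r"\bpid\[(\d+)\]")


class Restart(NamedTuple):
    last_line_before: Optional[datetime]  # last line written before this start
    started_at: datetime                  # first startup line of the new process
    errors_before: int                    # ERROR lines since the previous start
    pid: Optional[int] = None             # taken from the "version[...], pid[...]" line


def find_restarts(lines: Iterable[str]) -> List[Restart]:
    restarts: List[Restart] = []
    prev_ts: Optional[datetime] = None
    errors = 0
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # stack traces and other continuation lines
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S,%f")
        level, logger = m.group(2), m.group(3)
        if logger == "o.e.e.NodeEnvironment" and "data paths" in line:
            restarts.append(Restart(prev_ts, ts, errors))
            errors = 0
        elif logger == "o.e.n.Node" and restarts and restarts[-1].pid is None:
            pid = PID_RE.search(line)
            if pid:
                restarts[-1] = restarts[-1]._replace(pid=int(pid.group(1)))
        elif level == "ERROR":
            errors += 1
        prev_ts = ts
    return restarts
```

Passing an open handle to the main log file under /var/log/elasticsearch (it is named after the cluster) returns one `Restart` per start; `started_at - last_line_before` shows how long the node was silent before each restart.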
It looks like you have found the issue. Please let me know if changing limits made your system work properly.



