Searching for a solution to CircuitBreakingException

Hello! Today our cluster had a failure, after which monitoring stopped working. It all started with the error shown below.

[2019-12-12T06:39:29,447][WARN ][o.e.a.s.TransportClearScrollAction] [es-2] Clear SC failed on node[{es-1}{e28a_yOFTpyy0EhMEQVSCw}{OIQ7BPCbQ3OhNGDqcsSaxw}{192.168.0.20}{192.168.0.20:9300}{dilm}{ml.machine_memory=68614905856, ml.max_open_jobs=20, xpack.installed=true, role=fast-node}]
org.elasticsearch.transport.RemoteTransportException: [es-1][192.168.0.20:9300][indices:data/read/search[free_context/scroll]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [30488864532/28.3gb], which is larger than the limit of [30263761305/28.1gb], real usage: [30488863912/28.3gb], new bytes reserved: [620/620b], usages [request=0/0b, fielddata=24645048/23.5mb, in_flight_requests=343248/335.2kb, accounting=120988949/115.3mb]
	at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:170) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:118) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:663) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.4.2.jar:7.4.2]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:302) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475) [netty-handler-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1224) [netty-handler-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1271) [netty-handler-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:505) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:444) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:283) [netty-codec-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1421) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:697) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:597) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:551) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:511) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) [netty-common-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.38.Final.jar:4.1.38.Final]
	at java.lang.Thread.run(Thread.java:830) [?:?]

After that, monitoring failed with the following errors and did not recover until Kibana was restarted:

[2019-12-12T06:41:12,558][WARN ][o.e.x.m.e.l.LocalExporter] [es-2] unexpected error while indexing monitoring document
org.elasticsearch.xpack.monitoring.exporter.ExportException: RemoteTransportException[[es-3][192.168.0.22:9300][indices:data/write/bulk[s]]]; nested: RemoteTransportException[[es-3][192.168.0.22:9300][indices:data/write/bulk[s][p]]]; nested: ShardNotFoundException[no such shard];
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$throwExportException$2(LocalBulk.java:125) ~[x-pack-monitoring-7.4.2.jar:7.4.2]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) ~[?:?]
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) ~[?:?]
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) ~[?:?]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.throwExportException(LocalBulk.java:126) [x-pack-monitoring-7.4.2.jar:7.4.2]
	at org.elasticsearch.xpack.monitoring.exporter.local.LocalBulk.lambda$doFlush$0(LocalBulk.java:108) [x-pack-monitoring-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:70) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.support.TransportAction$1.onResponse(TransportAction.java:64) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:43) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.ActionListener.lambda$map$2(ActionListener.java:145) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:62) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.finishHim(TransportBulkAction.java:468) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.bulk.TransportBulkAction$BulkOperation$1.onFailure(TransportBulkAction.java:463) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.support.TransportAction$1.onFailure(TransportAction.java:79) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.support.ContextPreservingActionListener.onFailure(ContextPreservingActionListener.java:50) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.finishAsFailed(TransportReplicationAction.java:816) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$1.handleException(TransportReplicationAction.java:774) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1120) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:243) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.EsExecutors$DirectExecutorService.execute(EsExecutors.java:225) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:241) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:233) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:136) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:663) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.4.2.jar:7.4.2]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.38.Final.jar:4.1.38.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:328) [netty-codec-4.1.38.Final.jar:4.1.38.Final]

The Elasticsearch version is 7.4.2.
Can you suggest which direction I should look in? Does this kind of exception indicate a heavy search query? Or a huge indexing request? Just in case, I have attached the node stats and logs.

It could be either one. Too much memory was used by the transport layer to hold in-flight requests/responses.
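The figures in the breaker message itself narrow this down a bit: the rejected request only reserved 620 bytes, so the heap was already past the parent limit before this request arrived. A small sketch that pulls the numbers out of the log line above (it assumes exactly the message format shown; this is not an official parser):

```python
import re

# The [parent] breaker message from the log above, trimmed to the figures.
msg = ("[parent] Data too large, data for [<transport_request>] would be "
       "[30488864532/28.3gb], which is larger than the limit of "
       "[30263761305/28.1gb], real usage: [30488863912/28.3gb], "
       "new bytes reserved: [620/620b]")

would_be, limit, real, reserved = (int(n) for n in re.findall(r"\[(\d+)/", msg))

# The request being rejected only asked for 620 bytes; the heap was already
# over the parent limit, so the breaker trips on real memory usage, not on
# the size of any single search or bulk request.
assert would_be == real + reserved
print(f"over limit by {real - limit} bytes; the request itself is {reserved} b")
```

In other words, the trigger is whichever tiny request happens to arrive while the heap is nearly full, which is why it can look like either search or indexing.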

I dug around a bit more. It looks like you are not alone. Which JVM version are you running, and with what options?

Java version:

openjdk version "13" 2019-09-17
OpenJDK Runtime Environment AdoptOpenJDK (build 13+33)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 13+33, mixed mode, sharing)

And here is the configuration:

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1GC is only supported on JDK version 10 or later.
# To use G1GC uncomment the lines below.
# 10-:-XX:-UseConcMarkSweepGC
# 10-:-XX:-UseCMSInitiatingOccupancyOnly
# 10-:-XX:+UseG1GC
# 10-:-XX:G1ReservePercent=25
# 10-:-XX:InitiatingHeapOccupancyPercent=30

## DNS cache policy
# cache ttl in seconds for positive DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.ttl; set to -1 to cache forever
-Des.networkaddress.cache.ttl=60
# cache ttl in seconds for negative DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.negative.ttl; set to -1 to cache
# forever
-Des.networkaddress.cache.negative.ttl=10

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# explicitly set the stack size
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
-XX:-OmitStackTraceInFastThrow

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dio.netty.allocator.numDirectArenas=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true

-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=data

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
# due to internationalization enhancements in JDK 9, Elasticsearch needs to set the provider to COMPAT, otherwise
# time/date parsing will break in an incompatible way for some date patterns and locales
9-:-Djava.locale.providers=COMPAT
-Xmx30464m
-Xms30464m
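As an aside: this jvm.options keeps the CMS collector active, while the G1GC block is left commented out. Per the file's own note it can be enabled on JDK 10+ simply by uncommenting it (the `10-:` prefix restricts the flags to newer JDKs, and the later `-XX:-UseConcMarkSweepGC` overrides the CMS flags above it). Whether switching collectors changes this breaker behaviour is only a guess on my part:

```text
## G1GC Configuration (uncommented; applied on JDK 10+ only)
10-:-XX:-UseConcMarkSweepGC
10-:-XX:-UseCMSInitiatingOccupancyOnly
10-:-XX:+UseG1GC
10-:-XX:G1ReservePercent=25
10-:-XX:InitiatingHeapOccupancyPercent=30
```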

This error reproduces rarely (1-2 times a week); for now I have reduced the bulk insert volume and am watching. However, I do not like that every day there is an error deleting the monitoring indices.

[2019-12-17T01:29:59,699][INFO ][o.e.x.m.MlDailyMaintenanceService] [es-1] triggering scheduled [ML] maintenance tasks
[2019-12-17T01:29:59,715][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [es-1] Deleting expired data
[2019-12-17T01:29:59,715][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [es-1] Completed deletion of expired ML data
[2019-12-17T01:29:59,715][INFO ][o.e.x.m.MlDailyMaintenanceService] [es-1] Successfully completed [ML] maintenance tasks
[2019-12-17T03:00:00,800][INFO ][o.e.c.m.MetaDataCreateIndexService] [es-1] [.monitoring-es-7-2019.12.17] creating index, cause [auto(bulk api)], templates [.monitoring-es, fast-node-template], shards [1]/[0], mappings [_doc]
[2019-12-17T03:00:00,800][INFO ][o.e.c.r.a.AllocationService] [es-1] updating number_of_replicas to [1] for indices [.monitoring-es-7-2019.12.17]
[2019-12-17T03:00:01,460][INFO ][o.e.c.r.a.AllocationService] [es-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.monitoring-es-7-2019.12.17][0]]]).
[2019-12-17T03:00:09,265][INFO ][o.e.c.m.MetaDataCreateIndexService] [es-1] [.monitoring-kibana-7-2019.12.17] creating index, cause [auto(bulk api)], templates [fast-node-template, .monitoring-kibana], shards [1]/[0], mappings [_doc]
[2019-12-17T03:00:09,265][INFO ][o.e.c.r.a.AllocationService] [es-1] updating number_of_replicas to [1] for indices [.monitoring-kibana-7-2019.12.17]
[2019-12-17T03:00:09,827][INFO ][o.e.c.r.a.AllocationService] [es-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.monitoring-kibana-7-2019.12.17][0]]]).
[2019-12-17T03:59:59,688][INFO ][o.e.x.m.e.l.LocalExporter] [es-1] cleaning up [2] old indices
[2019-12-17T03:59:59,688][INFO ][o.e.c.m.MetaDataDeleteIndexService] [es-1] [.monitoring-kibana-7-2019.12.10/ollJc0TaTGGBKtAuV5rygQ] deleting index
[2019-12-17T03:59:59,688][INFO ][o.e.c.m.MetaDataDeleteIndexService] [es-1] [.monitoring-es-7-2019.12.10/QX1fqVlyRLWDLl553Zso8g] deleting index
[2019-12-17T04:00:00,016][INFO ][o.e.x.m.e.l.LocalExporter] [es-1] cleaning up [2] old indices
[2019-12-17T04:00:00,957][DEBUG][o.e.a.a.i.d.TransportDeleteIndexAction] [es-1] failed to delete indices [[[.monitoring-kibana-7-2019.12.10/ollJc0TaTGGBKtAuV5rygQ], [.monitoring-es-7-2019.12.10/QX1fqVlyRLWDLl553Zso8g]]]
org.elasticsearch.index.IndexNotFoundException: no such index [.monitoring-kibana-7-2019.12.10]
	at org.elasticsearch.cluster.metadata.MetaData.getIndexSafe(MetaData.java:670) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.lambda$deleteIndices$0(MetaDataDeleteIndexService.java:94) ~[elasticsearch-7.4.2.jar:7.4.2]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
	at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1605) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.deleteIndices(MetaDataDeleteIndexService.java:94) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService$1.execute(MetaDataDeleteIndexService.java:84) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.4.2.jar:7.4.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]
[2019-12-17T04:00:00,957][ERROR][o.e.x.m.e.l.LocalExporter] [es-1] failed to delete indices
org.elasticsearch.index.IndexNotFoundException: no such index [.monitoring-kibana-7-2019.12.10]
	at org.elasticsearch.cluster.metadata.MetaData.getIndexSafe(MetaData.java:670) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.lambda$deleteIndices$0(MetaDataDeleteIndexService.java:94) ~[elasticsearch-7.4.2.jar:7.4.2]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
	at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1605) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.deleteIndices(MetaDataDeleteIndexService.java:94) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService$1.execute(MetaDataDeleteIndexService.java:84) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.4.2.jar:7.4.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]

The circuit breaker error should be fixed
by https://github.com/elastic/elasticsearch/pull/49478 and https://github.com/elastic/elasticsearch/pull/50100

Did that index exist at all? Do you have Kibana monitoring enabled?

Thanks for the information.

As for monitoring: yes, it is enabled, and yes, that index did exist. It does not complain about the same index each time; every day it complains about the index from a week ago.
It feels as though the deletion of old monitoring data runs twice: the first attempt succeeds, and the second attempt can no longer find the index.

Today there were errors for both monitoring indices:

[2019-12-18T01:29:59,702][INFO ][o.e.x.m.MlDailyMaintenanceService] [es-1] triggering scheduled [ML] maintenance tasks
[2019-12-18T01:29:59,718][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [es-1] Deleting expired data
[2019-12-18T01:29:59,718][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [es-1] Completed deletion of expired ML data
[2019-12-18T01:29:59,718][INFO ][o.e.x.m.MlDailyMaintenanceService] [es-1] Successfully completed [ML] maintenance tasks
[2019-12-18T03:00:01,269][INFO ][o.e.c.m.MetaDataCreateIndexService] [es-1] [.monitoring-kibana-7-2019.12.18] creating index, cause [auto(bulk api)], templates [fast-node-template, .monitoring-kibana], shards [1]/[0], mappings [_doc]
[2019-12-18T03:00:01,269][INFO ][o.e.c.r.a.AllocationService] [es-1] updating number_of_replicas to [1] for indices [.monitoring-kibana-7-2019.12.18]
[2019-12-18T03:00:01,691][INFO ][o.e.c.m.MetaDataCreateIndexService] [es-1] [.monitoring-es-7-2019.12.18] creating index, cause [auto(bulk api)], templates [.monitoring-es, fast-node-template], shards [1]/[0], mappings [_doc]
[2019-12-18T03:00:01,691][INFO ][o.e.c.r.a.AllocationService] [es-1] updating number_of_replicas to [1] for indices [.monitoring-es-7-2019.12.18]
[2019-12-18T03:00:02,691][INFO ][o.e.c.r.a.AllocationService] [es-1] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.monitoring-kibana-7-2019.12.18][0]]]).
[2019-12-18T03:59:59,697][INFO ][o.e.x.m.e.l.LocalExporter] [es-1] cleaning up [2] old indices
[2019-12-18T03:59:59,697][INFO ][o.e.c.m.MetaDataDeleteIndexService] [es-1] [.monitoring-es-7-2019.12.11/tplxlnwQR7KCvLjqguBP5A] deleting index
[2019-12-18T03:59:59,697][INFO ][o.e.c.m.MetaDataDeleteIndexService] [es-1] [.monitoring-kibana-7-2019.12.11/75ocyFQiTmuI_jFGJJid6w] deleting index
[2019-12-18T04:00:00,025][INFO ][o.e.x.m.e.l.LocalExporter] [es-1] cleaning up [2] old indices
[2019-12-18T04:00:00,087][DEBUG][o.e.a.a.i.d.TransportDeleteIndexAction] [es-1] failed to delete indices [[[.monitoring-es-7-2019.12.11/tplxlnwQR7KCvLjqguBP5A], [.monitoring-kibana-7-2019.12.11/75ocyFQiTmuI_jFGJJid6w]]]
org.elasticsearch.index.IndexNotFoundException: no such index [.monitoring-es-7-2019.12.11]
	at org.elasticsearch.cluster.metadata.MetaData.getIndexSafe(MetaData.java:670) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.lambda$deleteIndices$0(MetaDataDeleteIndexService.java:94) ~[elasticsearch-7.4.2.jar:7.4.2]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
	at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1605) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.deleteIndices(MetaDataDeleteIndexService.java:94) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService$1.execute(MetaDataDeleteIndexService.java:84) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.4.2.jar:7.4.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]
[2019-12-18T04:00:00,087][ERROR][o.e.x.m.e.l.LocalExporter] [es-1] failed to delete indices
org.elasticsearch.index.IndexNotFoundException: no such index [.monitoring-es-7-2019.12.11]
	at org.elasticsearch.cluster.metadata.MetaData.getIndexSafe(MetaData.java:670) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.lambda$deleteIndices$0(MetaDataDeleteIndexService.java:94) ~[elasticsearch-7.4.2.jar:7.4.2]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195) ~[?:?]
	at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1605) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
	at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578) ~[?:?]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService.deleteIndices(MetaDataDeleteIndexService.java:94) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.metadata.MetaDataDeleteIndexService$1.execute(MetaDataDeleteIndexService.java:84) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324) ~[elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:703) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-7.4.2.jar:7.4.2]
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-7.4.2.jar:7.4.2]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:830) [?:?]

In my setup, the system index templates are supplemented by a template which specifies that these indices may only be placed on hardware with SSD disks. Here it is:

{
  "fast-node-template" : {
    "order" : 0,
    "index_patterns" : [
      ".*"
    ],
    "settings" : {
      "index" : {
        "routing" : {
          "allocation" : {
            "include" : {
              "role" : "fast-node"
            }
          }
        }
      }
    },
    "mappings" : { },
    "aliases" : { }
  }
}
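For context on how this template ends up on the monitoring indices: with legacy (pre-7.8) index templates, every template whose pattern matches a new index name is applied, ordered by `order`, and higher-order templates win on conflicting keys. A simplified sketch (only `fast-node-template` is taken from this thread; the `.monitoring-es` body and order below are placeholders, not the real built-in template):

```python
# Illustrative sketch of legacy index-template merging in Elasticsearch:
# all templates whose pattern matches the index name are applied, lower
# `order` first, so later (higher-order) settings win on conflicts.
from fnmatch import fnmatchcase

templates = {
    "fast-node-template": {
        "order": 0,
        "index_patterns": [".*"],  # matches every dot-prefixed (system) index
        "settings": {"index.routing.allocation.include.role": "fast-node"},
    },
    ".monitoring-es": {
        "order": 0,  # placeholder; the built-in template's order may differ
        "index_patterns": [".monitoring-es-*"],
        "settings": {"index.codec": "best_compression"},  # placeholder setting
    },
}

def effective_settings(index_name: str) -> dict:
    matching = [t for t in templates.values()
                if any(fnmatchcase(index_name, p) for p in t["index_patterns"])]
    merged = {}
    for t in sorted(matching, key=lambda t: t["order"]):
        merged.update(t["settings"])  # higher order overrides on conflict
    return merged

# A daily monitoring index picks up BOTH templates, which matches the
# create-index log lines above: templates [.monitoring-es, fast-node-template]
print(effective_settings(".monitoring-es-7-2019.12.17"))
```

So the allocation rule does land on every `.monitoring-*` index, but by itself I would not expect it to interfere with deleting them.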

Could this be interfering with anything?

And how are your xpack.monitoring.exporters configured?

I have not changed xpack.monitoring.exporters; at the moment it is absent from elasticsearch.yml.

Is there anything monitoring-related in the cluster settings?

Only the setting that enables monitoring collection:

{
  "persistent" : {
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "80mb"
      }
    },
    "script" : {
      "max_size_in_bytes" : "524288"
    },
    "xpack" : {
      "monitoring" : {
        "collection" : {
          "enabled" : "true"
        }
      }
    }
  },
  "transient" : { }
}

I want to try restarting monitoring from scratch. To do that, I would set monitoring.collection.enabled = false and delete all the monitoring indices. Is that enough, or are there other settings I should also reset?
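Concretely, I had something like this in mind (Kibana Dev Tools syntax; the wildcard delete assumes destructive wildcard operations are not restricted on the cluster):

```text
# stop monitoring collection
PUT _cluster/settings
{ "persistent": { "xpack.monitoring.collection.enabled": false } }

# delete the monitoring indices
DELETE .monitoring-*

# turn collection back on afterwards
PUT _cluster/settings
{ "persistent": { "xpack.monitoring.collection.enabled": true } }
```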

I discussed this question with a developer who is currently working on this part of elasticsearch, and he said the best thing to do would be to restart the current master node. He is confident this should resolve the problem. It looks like something glitched on that node, and two monitoring exporters are trying to delete the same indices.

Thank you, I will try restarting over the weekend and report back with the results.

Igor, thank you: restarting the node resolved the problem.