Good afternoon! We are having a problem with Elasticsearch periodically hanging. The hangs occur when heavy searching coincides with index updates. During a hang, Elasticsearch barely responds to requests; even hot_threads takes more than a minute to respond. Documents are inserted via upsert as child documents. This document type has the "eager_global_ordinals": true flag set.
Could this be related to the disk running low on free space?
Latest log messages:
[2017-11-16T08:45:59,321][INFO ][o.e.c.r.a.DiskThresholdMonitor] [WIN-K5I8RSP2SOQ] rerouting shards: [high disk watermark exceeded on one or more nodes]
[2017-11-16T08:45:59,323][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][old][54556][31] duration [1.1m], collections [1]/[1.1m], total [1.1m]/[30m], memory [26.3gb]->[23gb]/[29.3gb], all_pools {[young] [371kb]->[3.6mb]/[998.5mb]}{[survivor] [124.7mb]->[0b]/[124.7mb]}{[old] [26.2gb]->[23gb]/[28.2gb]}
[2017-11-16T08:45:59,323][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][54556] overhead, spent [1.1m] collecting in the last [1.1m]
[2017-11-16T08:46:00,323][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][54557] overhead, spent [590ms] collecting in the last [1s]
[2017-11-16T08:46:01,363][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][54558] overhead, spent [610ms] collecting in the last [1s]
[2017-11-16T08:46:02,383][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][54559] overhead, spent [644ms] collecting in the last [1s]
[2017-11-16T08:47:04,343][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][old][54560][32] duration [1m], collections [1]/[1m], total [1m]/[31m], memory [27.2gb]->[21gb]/[29.3gb], all_pools {[young] [5.2mb]->[6.4mb]/[998.5mb]}{[survivor] [124.7mb]->[0b]/[124.7mb]}{[old] [27.1gb]->[21gb]/[28.2gb]}
[2017-11-16T08:47:04,344][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][54560] overhead, spent [1m] collecting in the last [1m]
[2017-11-16T08:47:05,358][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][54561] overhead, spent [511ms] collecting in the last [1s]
[2017-11-16T08:47:07,129][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][54562] overhead, spent [1s] collecting in the last [1.7s]
[2017-11-16T08:47:08,410][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][54563] overhead, spent [812ms] collecting in the last [1.2s]
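To put the GC lines above into numbers, here is a rough Python sketch that extracts how full the old generation remains after each old-generation collection. The function name `old_gen_occupancy_after_gc` and the regular expression are my own illustration, not part of any Elasticsearch tooling, and the sketch assumes the log format matches the lines shown above.

```python
import re

# Size units as they appear in the JvmGcMonitorService log lines.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}
_SIZE = r"\[([\d.]+)(kb|mb|gb|b)\]"
# Matches a fragment such as "{[old] [26.2gb]->[23gb]/[28.2gb]}".
_OLD_POOL = re.compile(r"\{\[old\] " + _SIZE + "->" + _SIZE + "/" + _SIZE + r"\}")


def _to_bytes(value, unit):
    return float(value) * _UNITS[unit]


def old_gen_occupancy_after_gc(line):
    """Return the old-generation fill ratio after GC (0..1), or None if the
    line carries no old-pool statistics (hypothetical helper)."""
    match = _OLD_POOL.search(line)
    if match is None:
        return None
    after = _to_bytes(match.group(3), match.group(4))
    maximum = _to_bytes(match.group(5), match.group(6))
    return after / maximum


# Two of the log lines from above, copied verbatim.
LOG_LINES = [
    "[2017-11-16T08:45:59,323][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][old][54556][31] duration [1.1m], collections [1]/[1.1m], total [1.1m]/[30m], memory [26.3gb]->[23gb]/[29.3gb], all_pools {[young] [371kb]->[3.6mb]/[998.5mb]}{[survivor] [124.7mb]->[0b]/[124.7mb]}{[old] [26.2gb]->[23gb]/[28.2gb]}",
    "[2017-11-16T08:47:04,343][WARN ][o.e.m.j.JvmGcMonitorService] [WIN-K5I8RSP2SOQ] [gc][old][54560][32] duration [1m], collections [1]/[1m], total [1m]/[31m], memory [27.2gb]->[21gb]/[29.3gb], all_pools {[young] [5.2mb]->[6.4mb]/[998.5mb]}{[survivor] [124.7mb]->[0b]/[124.7mb]}{[old] [27.1gb]->[21gb]/[28.2gb]}",
]

if __name__ == "__main__":
    for line in LOG_LINES:
        ratio = old_gen_occupancy_after_gc(line)
        print(f"old generation after GC: {ratio:.0%} full")
```

For the two lines above this gives roughly 82% and 74%, so even after a collection lasting about a minute the old generation stays mostly full. That looks more like heap pressure than something caused by disk space alone, though this is only my reading of the log.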
Here is what hot_threads shows during a hang:
I could not paste the full output.
Hot threads at 2017-11-16T12:28:49.915Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
37.5% (187.5ms out of 500ms) cpu usage by thread 'elasticsearch[WIN-K5I8RSP2SOQ][search][T#4]'
3/10 snapshots sharing following 38 elements
org.apache.lucene.store.ByteBufferIndexInput.clone(ByteBufferIndexInput.java:249)
org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.clone(ByteBufferIndexInput.java:347)
org.apache.lucene.codecs.lucene50.Lucene50PostingsReader$BlockPostingsEnum.<init>(Lucene50PostingsReader.java:536)
org.apache.lucene.codecs.lucene50.Lucene50PostingsReader.postings(Lucene50PostingsReader.java:220)
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.postings(SegmentTermsEnum.java:1002)
org.apache.lucene.search.spans.SpanTermQuery$SpanTermWeight.getSpans(SpanTermQuery.java:119)
org.apache.lucene.search.spans.SpanOrQuery$SpanOrWeight.getSpans(SpanOrQuery.java:155)
org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight.getSpans(SpanNearQuery.java:213)
org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:133)
org.apache.lucene.search.spans.SpanWeight.scorer(SpanWeight.java:38)
org.apache.lucene.search.Weight.scorerSupplier(Weight.java:126)
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:400)
org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:381)
org.apache.lucene.search.Weight.bulkScorer(Weight.java:160)
org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:375)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:665)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:472)
org.apache.lucene.search.join.JoinUtil.createJoinQuery(JoinUtil.java:533)
org.elasticsearch.join.query.HasChildQueryBuilder$LateParsingQuery.rewrite(HasChildQueryBuilder.java:437)
org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:265)
org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:265)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:683)
org.elasticsearch.search.internal.ContextIndexSearcher.rewrite(ContextIndexSearcher.java:106)
org.elasticsearch.search.DefaultSearchContext.preProcess(DefaultSearchContext.java:251)
org.elasticsearch.search.query.QueryPhase.preProcess(QueryPhase.java:95)
org.elasticsearch.search.SearchService.createContext(SearchService.java:497)
org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:461)
org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:257)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:340)
org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:337)
org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:69)
org.elasticsearch.transport.TransportService$7.doRun(TransportService.java:644)
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638)
org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.lang.Thread.run(Unknown Source)
Are there any settings we could tune, or anything we could fix?