Adding nodes to the cluster causes high load on all nodes


(Xavier Facq) #1

Hi,

Here is my strange story:
We had a 5-node cluster in which one node was under high load.
We added 2 new nodes. A few days later we removed 2 old ones because they had too much load, and finally we added 2 new, more powerful machines. There are now 7 nodes in the cluster.

Since these changes, Centreon reports Critical load on all our nodes. It's quite inexplicable!
We expected more capacity and therefore less load on all our machines, but it's the opposite :frowning:

Any idea how to trigger a refresh of some kind?

Note that:
All shards are properly replicated. There are no more shards and no more replicas than before; nothing else changed, just 2 more powerful nodes...

Thanks!
Xavier
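The statement that all shards are replicated can be checked against the cluster health API, `GET /_cluster/health`, whose JSON body includes `status`, `number_of_nodes` and `unassigned_shards`; a `green` status means every primary and replica shard is allocated. A minimal Python sketch, assuming the body has already been fetched and decoded; the helper `replication_ok` and the sample values are hypothetical:

```python
import json


def replication_ok(health: dict) -> bool:
    """Return True when every primary and replica shard is assigned.

    Assumes `health` is the decoded body of GET /_cluster/health.
    """
    return health.get("status") == "green" and health.get("unassigned_shards", 0) == 0


# Hypothetical sample response for a 7-node cluster (illustrative values only).
sample = json.loads("""
{"cluster_name": "es-prod", "status": "green", "number_of_nodes": 7,
 "relocating_shards": 0, "initializing_shards": 0, "unassigned_shards": 0}
""")

print(replication_ok(sample))
```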


(Mark Walkom) #2

What's in hot threads?
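For reference, hot threads come from the nodes hot threads API, `GET /_nodes/hot_threads`, which returns plain text rather than JSON. Its `threads`, `interval` and `type` query parameters default to 3, 500ms and cpu, which matches the headers of the dumps below. A minimal Python sketch, assuming a node reachable at `http://localhost:9200`; the helper names are hypothetical:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(host: str = "http://localhost:9200", threads: int = 3,
                    interval: str = "500ms", kind: str = "cpu") -> str:
    """Build a GET /_nodes/hot_threads URL (host is an assumption)."""
    params = urlencode({"threads": threads, "interval": interval, "type": kind})
    return f"{host}/_nodes/hot_threads?{params}"


def fetch_hot_threads(url: str, timeout: float = 30.0) -> str:
    """Fetch the plain-text hot threads report (needs a reachable node)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the URL; call fetch_hot_threads() against a live cluster.
    print(hot_threads_url())
```

Raising `threads` or switching `type` to `wait` or `block` shows more of what each node is doing beyond the three busiest CPU threads.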


(Xavier Facq) #3

Hi Mark,

Here it is:

::: {es-prod-node-prod2}{ZW_3xChXRM2yB5ZS9KTWMQ}{10.91.6.66}{10.91.6.66:9300}{master=false}
Hot threads at 2017-10-06T06:53:13.120Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

73.9% (369.3ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod2][[bypath-index-external-contacts-201709300100][16]: Lucene Merge Thread #4]'
  3/10 snapshots sharing following 24 elements
    sun.nio.ch.NativeThread.current(Native Method)
    sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
    sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:727)
    sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:716)
    org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
    org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
    ...
    org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3666)
    org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
    org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
    org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
  3/10 snapshots sharing following 16 elements
    org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$2.get(Lucene54DocValuesProducer.java:502)
    org.apache.lucene.codecs.lucene54.Lucene54DocValuesProducer$8.valueAt(Lucene54DocValuesProducer.java:869)
    org.apache.lucene.codecs.DocValuesConsumer$4$1.setNext(DocValuesConsumer.java:522)
    ...
    org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4086)
    org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3666)
    org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
    org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
    org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
  2/10 snapshots sharing following 15 elements
    org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:52)
    org.apache.lucene.util.packed.DirectWriter.flush(DirectWriter.java:86)
    org.apache.lucene.util.packed.DirectWriter.add(DirectWriter.java:78)
    ...
    org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
    org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4086)
    org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3666)
    org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
    org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
    org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
  2/10 snapshots sharing following 13 elements
    org.apache.lucene.codecs.DocValuesConsumer$4$1.hasNext(DocValuesConsumer.java:497)
    org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:243)
    ...
    org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3666)
    org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
    org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
    org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)

9.6% (48.1ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod2][search][T#8]'
  10/10 snapshots sharing following 2 elements
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

8.2% (40.9ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod2][search][T#7]'
  10/10 snapshots sharing following 2 elements
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

(Xavier Facq) #4

And another node:

::: {es-prod-node-prod3}{ImMn6t02TJyDYggVTvni0g}{10.91.156.2}{10.91.156.2:9300}{master=false}
Hot threads at 2017-10-06T06:53:13.254Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

49.2% (246.1ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod3][search][T#9]'
  4/10 snapshots sharing following 23 elements
    org.apache.lucene.index.TermContext.build(TermContext.java:99)
    org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:192)
    ...
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)
  5/10 snapshots sharing following 10 elements
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    java.util.concurrent.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:735)
    ...
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)
  unique snapshot
    sun.nio.ch.NativeThread.current(Native Method)
    sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
    sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:727)
    ...
    org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:293)
    org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

17.8% (89ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod3][bulk][T#2]'
  2/10 snapshots sharing following 22 elements
    org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:365)
    org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:321)
    org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:273)
    ...
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)
  2/10 snapshots sharing following 16 elements
    org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:454)
    org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:605)
    org.elasticsearch.index.engine.Engine$Index.execute(Engine.java:836)
    ...
    org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:293)
    org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)
  6/10 snapshots sharing following 11 elements
    org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:68)
    org.elasticsearch.action.support.replication.TransportReplicationAction$AsyncReplicaAction.doRun(TransportReplicationAction.java:401)
    ...
    org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

(Xavier Facq) #5

And some others:

::: {es-prod-node-prod5}{CnqYBPyiSvKjK04iMFTPBA}{10.91.157.202}{10.91.157.202:9300}{master=false}
Hot threads at 2017-10-06T07:04:21.034Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

66.4% (331.9ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod5][search][T#5]'
  2/10 snapshots sharing following 2 elements
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

13.7% (68.6ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod5][search][T#10]'
  2/10 snapshots sharing following 2 elements
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    java.lang.Thread.run(Thread.java:745)

33.5% (167.6ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod3][[bypath-index-external-contacts-201709300100][17]: Lucene Merge Thread #168]'
  4/10 snapshots sharing following 21 elements

::: {es-prod-node-prod8}{8Rb--vEmT0W4iGBqX53rYw}{10.91.145.3}{10.91.145.3:9300}{master=false}
Hot threads at 2017-10-06T07:04:21.033Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

81.9% (409.6ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod8][bulk][T#6]'
  3/10 snapshots sharing following 30 elements
    java.util.zip.Deflater.deflateBytes(Native Method)
    java.util.zip.Deflater.deflate(Deflater.java:432)

etc...
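Reading several of these dumps side by side is easier if only the per-thread header lines are summarised. The following is a small Python sketch, not an Elasticsearch tool: it assumes the header format seen in the dumps above (`N% (Xms out of Yms) cpu usage by thread '...'`), and the names `HEADER`, `thread_kind` and `cpu_by_kind` are hypothetical. It sums the CPU share of the reported busiest threads per node and thread type (merge, search, bulk); since only the busiest threads are listed, the totals are not overall node CPU.

```python
import re
from collections import defaultdict

# Matches header lines such as:
# 9.6% (48.1ms out of 500ms) cpu usage by thread 'elasticsearch[node][search][T#8]'
HEADER = re.compile(
    r"([\d.]+)% \(([\d.]+)ms out of ([\d.]+)ms\) cpu usage by thread "
    r"'elasticsearch\[([^\]]+)\]\[(.*)\]'"
)


def thread_kind(rest: str) -> str:
    """Classify the part of the thread name that follows the node name."""
    if "Merge Thread" in rest:
        return "merge"
    return rest.split("]", 1)[0]  # thread pool name, e.g. "search" or "bulk"


def cpu_by_kind(report: str) -> dict:
    """Sum the CPU percentage of reported threads per (node, kind)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        m = HEADER.search(line)
        if m:
            node, rest = m.group(4), m.group(5)
            totals[(node, thread_kind(rest))] += float(m.group(1))
    return {key: round(value, 1) for key, value in totals.items()}


# Header lines copied from the dumps in this thread.
SAMPLE = """\
73.9% (369.3ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod2][[bypath-index-external-contacts-201709300100][16]: Lucene Merge Thread #4]'
  9.6% (48.1ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod2][search][T#8]'
  8.2% (40.9ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod2][search][T#7]'
81.9% (409.6ms out of 500ms) cpu usage by thread 'elasticsearch[es-prod-node-prod8][bulk][T#6]'
"""

print(cpu_by_kind(SAMPLE))
```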


(Mark Walkom) #6

What version are you on?


(Xavier Facq) #7

All our nodes have 30 GB allocated to the heap out of 64 GB of RAM, with 8 CPUs and SSD storage.

OS:
"refresh_interval_in_millis": 1000,
"name": "Linux",
"arch": "amd64",
"version": "3.10.0-327.22.2.el7.x86_64",
"available_processors": 8,
"allocated_processors": 8

ES:
"version": "2.4.5",
"build": "c849dd1"

JVM:
"version": "1.7.0_141",
"vm_name": "OpenJDK 64-Bit Server VM",
"vm_version": "24.141-b02",
"vm_vendor": "Oracle Corporation"
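The fields quoted above look like excerpts from the nodes info API (`GET /_nodes`), which reports each node's Elasticsearch version and JVM details. Since nodes were added and removed, one quick check is that every node really runs the same versions. A minimal Python sketch, assuming an already decoded response; `versions_by_node`, `mixed_versions` and the trimmed sample (node IDs and names taken from the hot threads headers, versions from the reply above) are illustrative:

```python
import json


def versions_by_node(nodes_info: dict) -> dict:
    """Map node name -> (Elasticsearch version, JVM version).

    Assumes `nodes_info` is the decoded body of GET /_nodes, where each entry
    under "nodes" carries "name", "version" and "jvm" -> "version".
    """
    out = {}
    for node in nodes_info.get("nodes", {}).values():
        out[node["name"]] = (node.get("version"), node.get("jvm", {}).get("version"))
    return out


def mixed_versions(nodes_info: dict) -> bool:
    """True if the nodes do not all report the same ES and JVM versions."""
    return len(set(versions_by_node(nodes_info).values())) > 1


# Trimmed, illustrative sample: real responses contain many more fields.
SAMPLE = json.loads("""
{"nodes": {
  "ZW_3xChXRM2yB5ZS9KTWMQ": {"name": "es-prod-node-prod2", "version": "2.4.5",
                             "jvm": {"version": "1.7.0_141"}},
  "ImMn6t02TJyDYggVTvni0g": {"name": "es-prod-node-prod3", "version": "2.4.5",
                             "jvm": {"version": "1.7.0_141"}}
}}
""")

print(versions_by_node(SAMPLE))
print(mixed_versions(SAMPLE))
```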

