100% CPU utilization on one of the data nodes, ES version 2.3


(Nagarjuna D N) #1

Hello,

I am facing a problem where one of my data nodes hits 100% CPU usage while the others don't (Elasticsearch version 2.3).
Restarting the Elasticsearch process fixes the issue, but only for a while.

I have gone through several similar cases here, but none of the suggestions helped.

Below are the hot threads:
   100.3% (501.6ms out of 500ms) cpu usage by thread 'elasticsearch[ip-ip.ap-southeast-1.compute.internal][[testindexc][1]: Lucene Merge Thread #326]'
     4/10 snapshots sharing following 14 elements
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer$4$1.next(Lucene54DocValuesConsumer.java:754)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer$4$1.next(Lucene54DocValuesConsumer.java:745)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:243)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:634)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedNumericField(PerFieldDocValuesFormat.java:126)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:417)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:236)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:150)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
     2/10 snapshots sharing following 13 elements
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer$4$1.next(Lucene54DocValuesConsumer.java:745)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:243)
       org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:634)
       org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedNumericField(PerFieldDocValuesFormat.java:126)
       org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:417)
       org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:236)
       org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:150)
       org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
       org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
       org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
       org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
       org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
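To check whether the same index and shard keep showing up across repeated hot threads captures, the header and snapshot lines above can be parsed mechanically. This is a minimal Python sketch that assumes the exact line format shown above; the regexes and the helper names `parse_header` and `snapshot_share` are my own, not part of Elasticsearch.

```python
import re

# Header line printed for each busy thread, e.g.
# 100.3% (501.6ms out of 500ms) cpu usage by thread '<thread name>'
HEADER = re.compile(
    r"(?P<pct>[\d.]+)% \((?P<busy>[\d.]+)ms out of (?P<interval>[\d.]+)ms\) "
    r"cpu usage by thread '(?P<thread>.+)'"
)
# Shard-level thread names embed "[[index][shard]: activity]".
SHARD = re.compile(r"\[\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]: (?P<activity>[^\]]+)\]")
# Snapshot summary line, e.g. "4/10 snapshots sharing following 14 elements".
SNAPSHOTS = re.compile(r"(?P<hits>\d+)/(?P<total>\d+) snapshots sharing following \d+ elements")


def parse_header(line):
    """Return CPU figures and, if present, index/shard/activity for one header line."""
    m = HEADER.search(line)
    if m is None:
        return None
    info = {
        "cpu_pct": float(m.group("pct")),
        "busy_ms": float(m.group("busy")),
        "interval_ms": float(m.group("interval")),
        "thread": m.group("thread"),
    }
    s = SHARD.search(info["thread"])
    if s is not None:
        info.update(index=s.group("index"), shard=int(s.group("shard")),
                    activity=s.group("activity"))
    return info


def snapshot_share(lines):
    """Fraction of sampled snapshots covered by the stack groups listed for one thread."""
    hits, total = 0, 0
    for line in lines:
        m = SNAPSHOTS.search(line)
        if m:
            hits += int(m.group("hits"))
            total = int(m.group("total"))
    return hits / total if total else 0.0
```

Applied to the output above, both listed stack groups (6 of 10 snapshots) sit in the same doc values merge path for `[testindexc][1]`; running `parse_header` over several captures from the affected node would show whether that shard's merge thread is consistently the one at the top.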

Cluster details:
{
  "cluster_name" : "prod-elasticsearch-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 17,
  "number_of_data_nodes" : 11,
  "active_primary_shards" : 21,
  "active_shards" : 117,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Master Nodes: 3
Ingestion Nodes: 3
Data Nodes: 11
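As a quick consistency check on these figures: assuming each node has exactly one role, the three roles add up to `number_of_nodes`, and the cluster holds about 10.6 shard copies per data node on average. The sketch below is plain arithmetic on the numbers posted above; the helper name `summarize` is hypothetical.

```python
import json

# Cluster health fields from the post, abbreviated to the ones used here.
health = json.loads("""{
  "number_of_nodes" : 17,
  "number_of_data_nodes" : 11,
  "active_primary_shards" : 21,
  "active_shards" : 117
}""")

# Node roles as listed in the post (assumes one role per node).
roles = {"master": 3, "ingestion": 3, "data": 11}


def summarize(health, roles):
    """Cross-check role counts and compute average shard copies per data node."""
    return {
        "roles_match_node_count": sum(roles.values()) == health["number_of_nodes"],
        # Every active shard that is not a primary is a replica copy.
        "replica_copies": health["active_shards"] - health["active_primary_shards"],
        "avg_shards_per_data_node": health["active_shards"] / health["number_of_data_nodes"],
    }


print(summarize(health, roles))
```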

The index testindexc, which shows up in the hot threads output, has 3 shards and 9 replicas, with a total size of 25GB. Kindly suggest the best numbers of shards and replicas for it.
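For reference, the shard arithmetic behind this question: an index with `p` primaries and `r` replicas has `p * (1 + r)` shard copies, so testindexc has 3 * (1 + 9) = 30 copies. Elasticsearch never allocates two copies of the same shard to one node, so with 11 data nodes at most 10 replicas can be assigned. In 2.x, `number_of_replicas` can be changed on a live index through the update index settings API (`PUT /testindexc/_settings`), while `number_of_shards` is fixed at index creation and changing it means reindexing into a new index. The sketch below is a hypothetical calculator (all helper names are my own); `per_node_spread` assumes an idealized even spread, whereas the real allocator also balances shards from all other indices in the cluster.

```python
import json


def shard_copies(primaries, replicas):
    """Total copies of an index: each primary plus its replicas."""
    return primaries * (1 + replicas)


def per_node_spread(copies, data_nodes):
    """Idealized even spread: `extra` nodes hold base + 1 copies, the rest hold base."""
    base, extra = divmod(copies, data_nodes)
    return base, extra


def max_allocatable_replicas(data_nodes):
    """Two copies of the same shard never share a node, so replicas <= data_nodes - 1."""
    return data_nodes - 1


def replica_settings_body(replicas):
    """JSON body for the update index settings API (number_of_replicas is dynamic)."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


DATA_NODES = 11
for replicas in (1, 2, 9):
    copies = shard_copies(3, replicas)
    base, extra = per_node_spread(copies, DATA_NODES)
    print(f"{replicas} replicas: {copies} copies, "
          f"{extra} nodes x {base + 1}, {DATA_NODES - extra} nodes x {base}")
```

With 9 replicas the idealized spread puts 3 copies of testindexc on 8 nodes and 2 on the other 3, so even under perfect balancing some nodes carry more of this index than others.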

Please help. Thanks


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.