100% CPU utilization on one of the data nodes, ES version 2.3

Hello,

I am facing a problem where one of my data nodes goes to 100% CPU usage while the others do not (Elasticsearch version is 2.3).
Restarting the Elasticsearch process solves the issue for a while.

I have gone through several similar threads here, but nothing in them helped.

Below are the hot threads:
100.3% (501.6ms out of 500ms) cpu usage by thread 'elasticsearch[ip-ip.ap-southeast-1.compute.internal][[testindexc][1]: Lucene Merge Thread #326]'
4/10 snapshots sharing following 14 elements
org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer$4$1.next(Lucene54DocValuesConsumer.java:754)
org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer$4$1.next(Lucene54DocValuesConsumer.java:745)
org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:243)
org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:634)
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedNumericField(PerFieldDocValuesFormat.java:126)
org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:417)
org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:236)
org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:150)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)
2/10 snapshots sharing following 13 elements
org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer$4$1.next(Lucene54DocValuesConsumer.java:745)
org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addNumericField(Lucene54DocValuesConsumer.java:243)
org.apache.lucene.codecs.lucene54.Lucene54DocValuesConsumer.addSortedNumericField(Lucene54DocValuesConsumer.java:634)
org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addSortedNumericField(PerFieldDocValuesFormat.java:126)
org.apache.lucene.codecs.DocValuesConsumer.mergeSortedNumericField(DocValuesConsumer.java:417)
org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:236)
org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:150)
org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4075)
org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3655)
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:588)
org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:626)

Cluster details:
{
  "cluster_name" : "prod-elasticsearch-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 17,
  "number_of_data_nodes" : 11,
  "active_primary_shards" : 21,
  "active_shards" : 117,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Master Nodes: 3
Ingestion Nodes: 3
Data Nodes: 11

The index testindexc, which shows up in hot_threads, has 3 shards and 9 replicas; its total size is 25GB. Kindly suggest the best possible numbers (shards vs. replicas).
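
For reference, here is a rough count of how many shard copies testindexc puts on the data nodes. It is only a sketch: it assumes "3 shards and 9 replicas" means 3 primaries with number_of_replicas set to 9, and that copies are spread evenly over the 11 data nodes, which may not match the real allocation. The variable and function names are just for illustration.

```python
import math

def shard_copies(primaries, replicas):
    # Each primary has `replicas` additional copies, so every shard
    # exists (1 + replicas) times in the cluster.
    return primaries * (1 + replicas)

primaries = 3    # from the index settings described above
replicas = 9     # assumed to be number_of_replicas
data_nodes = 11  # from the cluster health output

total = shard_copies(primaries, replicas)

# Assuming an even spread (an assumption, not a measured allocation):
per_node_min = total // data_nodes
per_node_max = math.ceil(total / data_nodes)

print(total, per_node_min, per_node_max)  # prints: 30 2 3
```

So this one index accounts for 30 shard copies, about 2 to 3 per data node. As far as I understand, each copy indexes and merges its own segments, so the merge work seen in hot_threads would be repeated on every copy.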

Please help. Thanks
