Need help in undertsanding Elastic Hot Threads API output

I am running a single Node Elastic Cluster on a Server . Filebeat is used as collector for netflow data . Further FIlebeat is giving output to Elastic . Some Important Point

Server has 32 core CPU and 500 GB RAM . I see Elastic CPU load at around constant 60 % . Average system load for last 5 min is around 40

My Current Indexing speed in elastic is around 45000 /second . However I have input of around 80000 /second in peak time . Therefore in peak time I huge packet drops

From reading elastic documentation I see main bottle neck is elastic CPU usage . So I tried to debug this through hot threads API . Could anyone help me in understanding what further optimisation should I do
Hot Threads output is as follows


::: {single_noe}{lfOR4iS9TTCUZ5jWrkHBog}{jR0dgpikTDOBxkEa0ZGMDw}{127.0.0.1}{127.0.0.1:9300}{cdfhilmrstw}{ml.machine_memory=540121751552, xpack.installed=true, transform.node=true, ml.max_open_jobs=512, ml.max_jvm_size=274877906944}
   Hot threads at 2022-11-18T05:11:24.749Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:
   
   100.0% [cpu=99.8%, other=0.2%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[single_noe][[filebeat-7.13.3-2022.11.17-002536][0]: Lucene Merge Thread #628]'
     2/10 snapshots sharing following 12 elements
       app//org.apache.lucene.index.OrdinalMap.build(OrdinalMap.java:168)
       app//org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:801)
       app//org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:145)
       app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155)
       app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     2/10 snapshots sharing following 12 elements
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addSortedSetField(Lucene80DocValuesConsumer.java:884)
       app//org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:804)
       app//org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:145)
       app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155)
       app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     2/10 snapshots sharing following 12 elements
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addSortedSetField(Lucene80DocValuesConsumer.java:924)
       app//org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:804)
       app//org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:145)
       app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155)
       app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     2/10 snapshots sharing following 16 elements
       app//org.apache.lucene.codecs.DocValuesConsumer$SortedSetDocValuesSub.nextDoc(DocValuesConsumer.java:742)
       app//org.apache.lucene.index.DocIDMerger$Sub.nextMappedDoc(DocIDMerger.java:56)
       app//org.apache.lucene.index.DocIDMerger$SequentialDocIDMerger.next(DocIDMerger.java:116)
       app//org.apache.lucene.codecs.DocValuesConsumer$5$1.nextDoc(DocValuesConsumer.java:848)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addSortedSetField(Lucene80DocValuesConsumer.java:923)
       app//org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:804)
       app//org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:145)
       app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155)
       app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     2/10 snapshots sharing following 16 elements
       app//org.apache.lucene.codecs.DocValuesConsumer$SortedSetDocValuesSub.nextDoc(DocValuesConsumer.java:742)
       app//org.apache.lucene.index.DocIDMerger$Sub.nextMappedDoc(DocIDMerger.java:56)
       app//org.apache.lucene.index.DocIDMerger$SequentialDocIDMerger.next(DocIDMerger.java:116)
       app//org.apache.lucene.codecs.DocValuesConsumer$5$1.nextDoc(DocValuesConsumer.java:848)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addSortedSetField(Lucene80DocValuesConsumer.java:940)
       app//org.apache.lucene.codecs.DocValuesConsumer.mergeSortedSetField(DocValuesConsumer.java:804)
       app//org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:145)
       app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.merge(PerFieldDocValuesFormat.java:155)
       app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:195)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:150)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
   
   100.0% [cpu=99.7%, other=0.3%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[single_noe][[filebeat-7.13.3-2022.11.17-002536][0]: Lucene Merge Thread #631]'
     2/10 snapshots sharing following 5 elements
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
   
   100.0% [cpu=99.7%, other=0.3%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[single_noe][[filebeat-7.13.3-2022.11.17-002536][0]: Lucene Merge Thread #622]'
     4/10 snapshots sharing following 12 elements
       app//org.apache.lucene.util.PriorityQueue.downHeap(PriorityQueue.java:267)
       app//org.apache.lucene.util.PriorityQueue.updateTop(PriorityQueue.java:202)
       app//org.apache.lucene.util.bkd.BKDWriter.merge(BKDWriter.java:507)
       app//org.apache.lucene.codecs.lucene86.Lucene86PointsWriter.merge(Lucene86PointsWriter.java:237)
       app//org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:201)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:161)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)
     6/10 snapshots sharing following 9 elements
       app//org.apache.lucene.codecs.lucene86.Lucene86PointsWriter.merge(Lucene86PointsWriter.java:237)
       app//org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:201)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:161)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4757)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4361)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5920)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
       app//org.elasticsearch.index.engine.ElasticsearchConcurrentMergeScheduler.doMerge(ElasticsearchConcurrentMergeScheduler.java:94)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)

Following is the Cluster Info

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 122,
  "active_shards" : 122,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 99.1869918699187

Please help me out if possible. Also, If there is something else needed, Please let me know.

What sort of storage do you have on the node?

Elasticsearch performance is often limited by disk I/O and not necessarily CPU. Have a look at iostat -x on the host and see what throughput you are getting and what await looks like. If storage is the limit that is what you likely need to upgrade. The hotthreads output indicates that a lot of time is spent merging data on disk, which is a sign your storage is struggling.

I have Physical HDD Storage Drive on a bare metal server , Following are more details

NAME ROTA DISC-MAX MODEL
sda     1       0B PERC H730P Adp 

Following is the output for iostat -x

Linux 4.18.0-305.7.1.el8_4.x86_64 (Project) 	Friday 18 November 2022 	_x86_64_	(32 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          58.99   11.48    7.68    0.12    0.00   21.74

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda             34.52  819.47   7954.71 109505.04     0.01     5.81   0.02   0.70    9.52    0.45   0.23   230.43   133.63   0.29  24.54
dm-0             1.09    5.19     52.18    292.61     0.00     0.00   0.00   0.00   14.76    0.34   0.02    47.74    56.39   2.26   1.42
dm-1             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    5.93    0.00   0.00    22.21     0.00   4.12   0.00
scd0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00   79.56    0.00   0.00     0.00     0.00  80.00   0.00
dm-2             6.48  484.19   7902.52 109212.40     0.00     0.00   0.00   0.00   24.57    0.82   0.09  1219.66   225.56   0.48  23.51

For more clarity pasting screen shot of the same

My Storage is still 80 % available

Spinning disks are in my opinion not ideal for Elasticsearch use cases with high load and quickly becomes the bottleneck. Your disk utilisation is quite high and I believe this is what is limiting throughput. As pointed out in the docs around optimising for indexing throughput SSDs are recommended. I would also recommend you watch this video for some practical benchmarks and examples.

I am checking on the video
However the disk is only 10 % utilised . Therefore this doesn't seems to be a issue

I am checking on the video
However the disk is only 10 % utilised . Therefore this doesn't seems to be a issue

The hot threads indicates that a lot of time is spend merging, which generally is limited by disk IO performance. Note that you have very high wait times for reads. Merging involves reading segments from disk and merging these into larger segments that are then written to disk. The slow read performance will therefore affect merging, which is what I believe you are seeing in the hot threads.

I would therefore recommend switching to SSDs if you want to be able to support higher throughput.

It would be useful if you could provide the full output of the cluster nodes stats API as this contains stats around indexing and merging.

Thanks for your suggestion . We are already in stage of installing SSD but it may take some time . Before that I am making sure to get max indexing rate with current hardware .
Following is the output of iostat -d -x 5 (at 5 second interval ) And I don't see much await here , am I missing something

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0.00  587.20      0.00 115528.00     0.00     0.00   0.00   0.00    0.00    4.68   2.75     0.00   196.74   0.22  12.64
dm-0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-1             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
scd0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-2             0.00  199.60      0.00 115526.40     0.00     0.00   0.00   0.00    0.00    2.41   0.48     0.00   578.79   0.62  12.30

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0.00  862.00      0.00 185100.00     0.00     0.20   0.00   0.02    0.00    1.14   0.98     0.00   214.73   0.15  12.68
dm-0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-1             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
scd0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-2             0.00  235.60      0.00 185100.00     0.00     0.00   0.00   0.00    0.00    0.89   0.21     0.00   785.65   0.54  12.72

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0.00  711.80      0.00 142396.80     0.00     2.40   0.00   0.34    0.00    0.88   0.62     0.00   200.05   0.16  11.22
dm-0             0.00    0.80      0.00      5.60     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     7.00   0.25   0.02
dm-1             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
scd0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-2             0.00  230.40      0.00 142391.20     0.00     0.00   0.00   0.00    0.00    0.53   0.12     0.00   618.02   0.49  11.24

Device            r/s     w/s     rkB/s     wkB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
sda              0.00  785.80      0.00 159792.00     0.00    17.80   0.00   2.22    0.00    1.03   0.81     0.00   203.35   0.14  10.62
dm-0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-1             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
scd0             0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
dm-2             0.00  264.40      0.00 159792.00     0.00     0.00   0.00   0.00    0.00    0.67   0.18     0.00   604.36   0.40  10.62

In these samples there does not seem to be any reading going on. Does this coincide with the same type of hot threads output as shown above or is this taken diring a period where no merging is taking place?

Can you also please provide the stats I asked for?

Following is the cluster node stats API output

There is continuous indexing going on in elastic . So yes it was taken during same period where merging is taking place .

I have provided this through jumpshare as large json couldn't be pasted here . Below is the link

HI Christain @Christian_Dahlqvist any inference from Node stats API . TIA

"merges" : {
          "current" : 8,
          "current_docs" : 29438858,
          "current_size_in_bytes" : 18674056584,
          "total" : 39090,
          "total_time_in_millis" : 215926038,
          "total_docs" : 5762103382,
          "total_size_in_bytes" : 3782156596114,
          "total_stopped_time_in_millis" : 23637018,
          "total_throttled_time_in_millis" : 937811,
          "total_auto_throttle_in_bytes" : 218090462487
        },
        "refresh" : {
          "total" : 37434,
          "total_time_in_millis" : 4843805,
          "external_total" : 30365,
          "external_total_time_in_millis" : 4205694,
          "listeners" : 0
        },
        "flush" : {
          "total" : 7185,
          "periodic" : 6928,
          "total_time_in_millis" : 8036129
        },

You can see from the statistics that merging has been stopped and throttled, so that would probably explain your hot threads output.

Thanks For your response
Is there any good document to understand throttled time and stop time in merges