Hello,
We are using ES for logging; logs arrive in ES in real time.
To speed up log searches and increase total log capacity, we are using the ES hot-warm architecture:
6 hot nodes (4 vCPUs, 26 GB memory, 500 GB SSD)
6 warm nodes (4 vCPUs, 26 GB memory, 3000 GB HDD) | Indices on warm nodes are force-merged to 1 segment.
5 primary shards and 1 replica (the default ES parameters for new indices). The daily index is ~250 GB.
Java - openjdk version "1.8.0_161"
ES version 5.6.8
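To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that "~250GB" is the primary data only (the post does not say whether replicas are included) and that shards are spread evenly; the function names are made up for illustration.

```python
# Rough sizing from the numbers in the post.
# ASSUMPTION: ~250 GB/day is primary data only, replicas not included.
DAILY_PRIMARY_GB = 250
PRIMARY_SHARDS = 5
REPLICAS = 1
HOT_NODES = 6
HOT_DISK_GB = 500


def shard_size_gb(daily_primary_gb, primary_shards):
    """Average size of one primary shard of a daily index."""
    return daily_primary_gb / primary_shards


def daily_total_gb(daily_primary_gb, replicas):
    """Disk used per day once replica copies are counted."""
    return daily_primary_gb * (1 + replicas)


def max_hot_days(hot_nodes, hot_disk_gb, total_per_day_gb):
    """Upper bound on days of data the hot tier can hold (ignores free-space headroom)."""
    return (hot_nodes * hot_disk_gb) / total_per_day_gb


if __name__ == "__main__":
    per_shard = shard_size_gb(DAILY_PRIMARY_GB, PRIMARY_SHARDS)
    per_day = daily_total_gb(DAILY_PRIMARY_GB, REPLICAS)
    print("GB per primary shard:", per_shard)
    print("GB per day incl. replicas:", per_day)
    print("max days on hot tier:", max_hot_days(HOT_NODES, HOT_DISK_GB, per_day))
```

Under these assumptions each primary shard ends up at roughly 50 GB, and the 10 shard copies of the current day's index are shared by 6 hot nodes, so some nodes carry two actively written shards.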
The main problem now is consistently high CPU usage on the hot nodes:
Heap usage on one of the nodes (heap size is 1/2 of RAM):
Hot threads:
85.6% (428.2ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][refresh][T#1]'
89.1% (445.3ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1502]'
89.4% (447.1ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1506]'
90.4% (452ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1522]'
100.0% (500.1ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][[logstash-2018.04.24][1]: Lucene Merge Thread #1512]'
99.3% (496.6ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1520]'
100.1% (500.7ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1506]'
87.5% (437.5ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1522]'
100.2% (500.9ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1502]'
92.7% (463.7ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot003][[logstash-2018.04.24][0]: Lucene Merge Thread #1515]'
85.8% (428.8ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][refresh][T#2]'
88.9% (444.3ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][[logstash-2018.04.24][1]: Lucene Merge Thread #1512]'
93.1% (465.4ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot006][[logstash-2018.04.23][1]: Lucene Merge Thread #2289]'
82.6% (413ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1524]'
90.1% (450.4ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot004][[logstash-2018.04.24][3]: Lucene Merge Thread #1490]'
100.0% (499.9ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][3]: Lucene Merge Thread #1516]'
Almost all the time it's merge threads, and this is on SSD disks!
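To back up the "almost always merge threads" impression with numbers, the hot-threads lines can be tallied by thread type and by node. The snippet below is a minimal sketch that assumes the line format shown above (as returned by `GET /_nodes/hot_threads`); the regular expression and the helper names `classify` and `summarize` are my own, not part of ES.

```python
import re
from collections import Counter

# Matches lines such as:
# 85.6% (428.2ms out of 500ms) cpu usage by thread 'elasticsearch[node][refresh][T#1]'
LINE_RE = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)% \([\d.]+ms out of [\d.]+ms\) cpu usage by thread "
    r"'elasticsearch\[(?P<node>[^\]]+)\]\[(?P<rest>.*)\]'"
)


def classify(rest):
    """Return 'merge' for Lucene merge threads, else the thread pool name."""
    if "Lucene Merge Thread" in rest:
        return "merge"
    # Pool threads look like "refresh][T#1"; keep the part before the first "]".
    return rest.split("]")[0]


def summarize(text):
    """Count hot-thread entries per thread type and per node."""
    by_kind = Counter()
    by_node = Counter()
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue  # skip stack traces and anything else that is not a summary line
        by_kind[classify(match.group("rest"))] += 1
        by_node[match.group("node")] += 1
    return by_kind, by_node


# Three lines copied from the output above.
SAMPLE = """\
85.6% (428.2ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][refresh][T#1]'
89.1% (445.3ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1502]'
100.0% (500.1ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][[logstash-2018.04.24][1]: Lucene Merge Thread #1512]'
"""

if __name__ == "__main__":
    kinds, nodes = summarize(SAMPLE)
    print(dict(kinds))
    print(dict(nodes))
```

Feeding it the full output collected over several samples would show how the load splits between merge and refresh threads on each hot node.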