Hot-Warm architecture - ES high CPU usage

Hello,

We are using ES for logging; logs arrive in ES in real time.
To speed up log searches and increase total log capacity, we are using the ES hot-warm architecture:
6 hot nodes (4 vCPUs, 26 GB memory, 500 GB SSD)
6 warm nodes (4 vCPUs, 26 GB memory, 3000 GB HDD). Indices on warm nodes are force-merged to 1 segment.
5 primary shards and 1 replica - default ES parameters for new indices. Daily index ~250GB.
Java - openjdk version "1.8.0_161"
ES version 5.6.8

The main problem now is consistently high CPU usage on the hot nodes:

Heap usage on one of the nodes (heap size is 1/2 of RAM):

Hot threads:

   85.6% (428.2ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][refresh][T#1]'
   89.1% (445.3ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1502]'
   89.4% (447.1ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1506]'
   90.4% (452ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1522]'
   100.0% (500.1ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][[logstash-2018.04.24][1]: Lucene Merge Thread #1512]'
   99.3% (496.6ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1520]'
   100.1% (500.7ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1506]'
   87.5% (437.5ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1522]'
   100.2% (500.9ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1502]'
   92.7% (463.7ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot003][[logstash-2018.04.24][0]: Lucene Merge Thread #1515]'
   85.8% (428.8ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][refresh][T#2]'
   88.9% (444.3ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][[logstash-2018.04.24][1]: Lucene Merge Thread #1512]'
   93.1% (465.4ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot006][[logstash-2018.04.23][1]: Lucene Merge Thread #2289]'
   82.6% (413ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1524]'
   90.1% (450.4ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot004][[logstash-2018.04.24][3]: Lucene Merge Thread #1490]'
   100.0% (499.9ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][3]: Lucene Merge Thread #1516]'

Almost all of the time it is merge threads, and this is on SSD disks!
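To back the "almost all merge threads" observation with a count, the hot threads summary lines can be tallied per node and thread type. This is a minimal sketch that assumes the line format shown in the output above; the names `LINE_RE`, `classify` and `summarize` are made up for illustration, and the sample is just three lines copied from that output.

```python
import re
from collections import Counter

# Matches one summary line of the hot threads output quoted above.
LINE_RE = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)% \(.*?\) cpu usage by thread "
    r"'elasticsearch\[(?P<node>[^\]]+)\](?P<rest>.*)'"
)


def classify(rest):
    """Return 'merge' for Lucene merge threads, else the thread pool name."""
    if "Lucene Merge Thread" in rest:
        return "merge"
    m = re.match(r"\[(\w+)\]", rest)
    return m.group(1) if m else "other"


def summarize(text):
    """Count hot-thread lines per (node, thread type)."""
    counts = Counter()
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            counts[(m.group("node"), classify(m.group("rest")))] += 1
    return counts


sample = """\
   85.6% (428.2ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][refresh][T#1]'
   89.1% (445.3ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1502]'
   100.0% (500.1ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][[logstash-2018.04.24][1]: Lucene Merge Thread #1512]'
"""

for (node, kind), n in sorted(summarize(sample).items()):
    print(node, kind, n)
```

Feeding the full hot threads dump into `summarize` would show how the busy threads split between merge and refresh on each hot node.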

How many indices and shards are you actively indexing into?

Are you indexing immutable documents or do you perform updates? Are you using nested documents and/or parent-child relationships?

Have you gone through these tuning recommendations?

What do disk I/O and iowait look like on the nodes?

Are you using locally attached SSD storage?

I work with Roman. On the 6 hot nodes we keep 7 indices (1 index per day), each with 5 primary shards and 1 replica, which makes 70 shards across the 6 hot nodes.
We are using immutable documents.
We are planning to go through the recommendations.
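The shard figures in this reply can be checked with a few lines of arithmetic. This is only a sketch using the numbers stated in the thread; the "actively written" count assumes that only the current day's index receives new documents, which the thread does not confirm.

```python
# Figures taken from the posts above.
indices_on_hot = 7        # one index per day kept on hot nodes
primaries_per_index = 5
replicas_per_primary = 1
hot_nodes = 6


def total_shards(indices, primaries, replicas):
    """Primary plus replica shard copies for a set of identical indices."""
    return indices * primaries * (1 + replicas)


shards = total_shards(indices_on_hot, primaries_per_index, replicas_per_primary)
per_node = shards / hot_nodes

# Assumption: only today's index is being written to.
active = total_shards(1, primaries_per_index, replicas_per_primary)

print(shards, round(per_node, 1), active)  # prints: 70 11.7 10
```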

  • refresh interval for now is 5s
  • swap is disabled
  • we are using large SSD persistent disks in GCE (not locally attached, but not NFS either)
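For reference, the 5s refresh interval mentioned above corresponds to the `index.refresh_interval` index setting, which is changed through the index settings update endpoint. The sketch below only builds the request body and prints it without contacting a cluster; the `logstash-*` pattern is inferred from the index names in the hot threads output, and the helper name is hypothetical.

```python
import json


def refresh_settings_body(interval):
    """Return the JSON body for an index settings update that sets refresh_interval."""
    return json.dumps({"index": {"refresh_interval": interval}})


# Illustrative target only: adjust the index pattern to the real index names.
index_pattern = "logstash-*"
print("PUT /%s/_settings" % index_pattern)
print(refresh_settings_body("5s"))
```

The value passed in is the one reported in the thread; any other value would be a tuning choice to validate against the cluster's own indexing and search requirements.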
