Hot-Warm architecture - ES high CPU usage


(Roman) #1

Hello,

We use ES for logging, and logs arrive in ES in real time.
To speed up log searches and increase total log capacity, we use an ES hot-warm architecture:
6 hot nodes (4 vCPUs, 26 GB memory, 500 GB SSD)
6 warm nodes (4 vCPUs, 26 GB memory, 3000 GB HDD). Indices on the warm nodes are force-merged to 1 segment.
5 primary shards and 1 replica (the ES defaults for new indices). Each daily index is ~250 GB.
Java - openjdk version "1.8.0_161"
ES version 5.6.8
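The post does not say how indices are moved from the hot to the warm nodes. A common approach in ES 5.x is shard allocation filtering on a custom node attribute, followed by a force merge. The sketch below only builds the two REST requests and sends nothing. It assumes a hypothetical node attribute named `box_type` (set on the warm nodes as `node.attr.box_type: warm`), and `move_to_warm` is a hypothetical helper, not part of any client library.

```python
import json
from typing import List, Optional, Tuple

# (HTTP method, path, JSON body or None)
Request = Tuple[str, str, Optional[str]]


def move_to_warm(index: str, attr: str = "box_type", value: str = "warm") -> List[Request]:
    """Build the requests that relocate one daily index to the warm tier.

    `attr` and `value` stand for a custom node attribute; the names are
    illustrative assumptions, not something taken from the original setup.
    """
    settings = {f"index.routing.allocation.require.{attr}": value}
    return [
        # 1. Require this index's shards to live on nodes tagged as warm.
        ("PUT", f"/{index}/_settings", json.dumps(settings)),
        # 2. After relocation completes, merge each shard down to one segment.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1", None),
    ]
```

Running the force merge only after the shards have reached the warm nodes keeps that expensive merge off the hot nodes' CPUs.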

The main problem now is consistently high CPU usage on the hot nodes:

Heap usage on one of the nodes (heap size is 1/2 of RAM):

Hot threads:

   85.6% (428.2ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][refresh][T#1]'
   89.1% (445.3ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1502]'
   89.4% (447.1ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1506]'
   90.4% (452ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1522]'
   100.0% (500.1ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][[logstash-2018.04.24][1]: Lucene Merge Thread #1512]'
   99.3% (496.6ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1520]'
   100.1% (500.7ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1506]'
   87.5% (437.5ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1522]'
   100.2% (500.9ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1502]'
   92.7% (463.7ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot003][[logstash-2018.04.24][0]: Lucene Merge Thread #1515]'
   85.8% (428.8ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][refresh][T#2]'
   88.9% (444.3ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot001][[logstash-2018.04.24][1]: Lucene Merge Thread #1512]'
   93.1% (465.4ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot006][[logstash-2018.04.23][1]: Lucene Merge Thread #2289]'
   82.6% (413ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][2]: Lucene Merge Thread #1524]'
   90.1% (450.4ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot004][[logstash-2018.04.24][3]: Lucene Merge Thread #1490]'
   100.0% (499.9ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot005][[logstash-2018.04.24][3]: Lucene Merge Thread #1516]'

Almost all of the hot threads are merge threads, and this is on SSD disks!
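To check how much of the hot-threads output is merging versus refreshing, the lines can be tallied by thread type and by node. This is a minimal sketch, assuming the output of `GET _nodes/hot_threads` has been saved as plain text in the format shown above; `summarize` and `classify` are hypothetical helper names, and `SAMPLE` is just three lines copied from the output above.

```python
import re
from collections import Counter
from typing import Tuple

# Matches lines like: 85.6% (428.2ms out of 500ms) cpu usage by thread '...'
LINE_RE = re.compile(
    r"^\s*(?P<pct>[\d.]+)% \((?P<ms>[\d.]+)ms out of (?P<interval>[\d.]+)ms\) "
    r"cpu usage by thread '(?P<thread>[^']+)'"
)
# The node name is the first bracketed token of the thread name.
NODE_RE = re.compile(r"^elasticsearch\[(?P<node>[^\]]+)\]")

SAMPLE = """\
   85.6% (428.2ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][refresh][T#1]'
   89.1% (445.3ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot002][[logstash-2018.04.24][0]: Lucene Merge Thread #1502]'
   93.1% (465.4ms out of 500ms) cpu usage by thread 'elasticsearch[elasticsearch-hot006][[logstash-2018.04.23][1]: Lucene Merge Thread #2289]'
"""


def classify(thread: str) -> str:
    """Bucket a thread name into 'merge', 'refresh', or 'other'."""
    if "Lucene Merge Thread" in thread:
        return "merge"
    if "][refresh][" in thread:
        return "refresh"
    return "other"


def summarize(text: str) -> Tuple[Counter, Counter]:
    """Count hot threads per category and per node; non-matching lines are skipped."""
    by_kind: Counter = Counter()
    by_node: Counter = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        thread = match.group("thread")
        by_kind[classify(thread)] += 1
        node = NODE_RE.match(thread)
        if node:
            by_node[node.group("node")] += 1
    return by_kind, by_node
```

Feeding the full output from every hot node through `summarize` shows at a glance whether merges dominate on all nodes or only on the ones holding the current day's shards.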

(Christian Dahlqvist) #2

How many indices and shards are you actively indexing into?

Are you indexing immutable documents or do you perform updates? Are you using nested documents and/or parent-child relationships?

Have you gone through these tuning recommendations?

What do disk I/O and iowait look like on the nodes?
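One way to answer the iowait question on a Linux node is to sample the aggregate `cpu` line of `/proc/stat` twice and compare the counters, since iowait is the share of elapsed CPU time spent waiting on I/O. A minimal sketch, assuming the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal, then guest fields); the helper names are hypothetical, and tools such as `iostat -x` report the same figure.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Return user..steal counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user nice system idle iowait irq softirq steal; guest time is already
    # included in user/nice, so the trailing guest fields are ignored.
    return [int(x) for x in fields[1:9]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()
```

On a hot node, call `read_cpu_line()` twice a few seconds apart and pass both lines to `iowait_percent`. A high value would point to the network-attached disks rather than the merge threads themselves.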

Are you using locally attached SSD storage?


(OlegBB) #3

I work with Roman. On the 6 hot nodes we keep 7 indices (1 index per day), each with 5 shards + 1 replica, so there are 70 shards across the 6 hot nodes.
We index immutable documents.
We are planning to go through the recommendations.
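The shard numbers above can be checked with a little arithmetic: 7 indices × 5 primaries × (1 + 1 replica) = 70 shards, or about 11.7 per hot node. A small sketch of that calculation, assuming (not stated in the thread) that the ~250 GB daily figure refers to primary data only and that only the current day's index receives writes; `HotTier` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class HotTier:
    indices: int      # daily indices kept on the hot nodes
    primaries: int    # primary shards per index
    replicas: int     # replicas per primary shard
    nodes: int        # number of hot nodes
    daily_gb: float   # assumed primary data per daily index, in GB

    def total_shards(self) -> int:
        return self.indices * self.primaries * (1 + self.replicas)

    def shards_per_node(self) -> float:
        return self.total_shards() / self.nodes

    def active_shards(self) -> int:
        # Assumption: only today's index is being written to.
        return self.primaries * (1 + self.replicas)

    def primary_shard_gb(self) -> float:
        return self.daily_gb / self.primaries


tier = HotTier(indices=7, primaries=5, replicas=1, nodes=6, daily_gb=250.0)
```

With these inputs, 10 actively written shards (5 primaries plus 5 replicas) are spread over 6 nodes, so some hot nodes carry two indexing shards and others one. That may help explain why certain nodes appear more often in the hot threads.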

  • the refresh interval is currently 5s
  • swap is disabled
  • we are using large SSD persistent disks in GCE (not locally attached, but not NFS either)
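Raising `index.refresh_interval` is one of the usual indexing-speed tunings, because each refresh writes the buffered documents into a new small segment that later has to be merged. A sketch that only builds the `curl` command for a dynamic settings update, assuming a hypothetical target of 30s and a node reachable at `http://localhost:9200`; `refresh_interval_curl` is a hypothetical helper, and the regex accepts just a common subset of ES time units.

```python
import json
import re

# Accepts values such as "500ms", "5s", "30s", "1m", or "-1" (disables periodic refresh).
INTERVAL_RE = re.compile(r"^(-1|\d+(ms|s|m|h))$")


def refresh_interval_curl(index: str, interval: str,
                          host: str = "http://localhost:9200") -> str:
    """Build a curl command that sets index.refresh_interval on `index`."""
    if not INTERVAL_RE.match(interval):
        raise ValueError(f"unexpected time value: {interval!r}")
    body = json.dumps({"index": {"refresh_interval": interval}})
    return (
        f"curl -XPUT '{host}/{index}/_settings' "
        f"-H 'Content-Type: application/json' -d '{body}'"
    )


if __name__ == "__main__":
    print(refresh_interval_curl("logstash-2018.04.24", "30s"))
```

This only changes existing indices. New daily indices take their settings from the index template, so a lasting change would need to go there as well.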


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.