Cluster CPU usage

I had an earlier post about query time; I wanted to start a separate thread about the load on my Elasticsearch cluster when a large volume of logs comes in (for a couple of hours in the morning). The examples in the query thread were taken during non-peak load.



I am indexing about 520 GB of log files into Elasticsearch per day. At this stage I am only keeping 24 hours of data (the eventual goal is 7 days).
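For context, a quick back-of-the-envelope calculation of what 520 GB/day means as a sustained rate (using the 4 nodes and 1 replica from the config below); the real peak-hour rate is several times the average, since most of the data arrives in a few morning hours:

```python
# Back-of-the-envelope ingest rate for 520 GB/day across a 4-node
# cluster with 1 replica (every document is written twice).
GB = 1024 ** 3

daily_bytes = 520 * GB
avg_rate_mb_s = daily_bytes / 86400 / 1024 ** 2          # cluster-wide, primaries only
per_node_mb_s = daily_bytes * 2 / 4 / 86400 / 1024 ** 2  # x2 for the replica, /4 nodes

print(f"average ingest: {avg_rate_mb_s:.1f} MB/s")   # ~6.2 MB/s
print(f"per node:       {per_node_mb_s:.1f} MB/s")   # ~3.1 MB/s
```

The averages look modest, which is why the pain shows up only during the concentrated peak hours.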



During the peak hour of incoming data, the cluster is hammered.



Is there anything I can do to optimize the cluster?



Config:
16 GB heap size, 4 shards, 1 replica
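As a side note on how the heap is set: on Elasticsearch versions of this era the usual way is the ES_HEAP_SIZE environment variable before starting the node (a sketch; the launch path will differ per install):

```shell
# Set a 16 GB heap before launching Elasticsearch; ES_HEAP_SIZE
# sets both -Xms and -Xmx so the JVM does not resize the heap.
export ES_HEAP_SIZE=16g

# then e.g.:  bin/elasticsearch   (path depends on the install)
echo "heap: $ES_HEAP_SIZE"
```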



Hardware:
4-node cluster; each node has 24 CPU cores and 24 GB of memory



Here is the template I use:


{
    "template" : "logstash*",
    "settings" : {
        "number_of_shards" : 4,
        "number_of_replicas" : 1,
        "index.cache.field.type" : "soft",
        "index.refresh_interval" : "5s",
        "index.store.compress.stored" : "true",
        "index.routing.allocation.total_shards_per_node" : 3
    }
}

vmstat shows a very high run queue:

procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
 r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
 0  0      29296     857568     115180    4170680    0    0     0 18536 2064 19307   5  0  94  0  0
16  0      29296     981432     115460    4045384    0    0     2 91284 26435 195012  32  6  61  1  0
25  0      29296     940864     115604    4086892    0    0     2  6410 19264 148894  24  4  72  0  0
26  0      29296     937804     115740    4089044    0    0     0  7610 20409 152666  24  4  72  0  0
13  0      29296     921072     115864    4108016    0    0     0 29050 19789 151698  23  4  72  0  0
10  0      29296     899636     116060    4128760    0    0     0  8922 22611 178752  29  5  66  0  0
27  0      29296     803672     116272    4223260    0    0  1300 21616 9254 59491  14  2  84  0  0
12  0      29296     703440     116476    4324696    0    0  1260  8730 21620 164412  34  5  61  0  0
 2  0      29296     723592     116756    4303752    0    0   394 46396 20529 149679  27  5  68  0  0
 1  0      29296     812524     117040    4215268    0    0     6 89006 30665 224822  35  7  57  1  0
23  0      29296     811320     117248    4215140    0    0     0 16118 16144 129557  20  3  76  0  0
 5  3      29296     793556     117440    4230480    0    0    92 13534 17697 130477  21  3  75  0  0
18  0      29296     791652     117664    4234996    0    0     0 25726 15064 105674  16  3  81  0  0
 0  0      29296     767412     117864    4257892    0    0     2  7026 23563 185956  29  5  66  0  0
32  0      29296     698344     118092    4325644    0    0     0 24436 18761 135696  26  4  70  0  0
25  0      29296     688636     118484    4333708    0    0     0 19960 21589 169049  28  5  67  0  0
16  0      29296     641116     118756    4381596    0    0     2 28256 19404 151200  27  4  68  0  0
16  0      29296     598248     118960    4425428    0    0     0 24886 20111 154420  26  4  70  0  0
 0  0      29296     684804     119228    4336856    0    0     2 51210 19501 145059  23  4  72  0  0
 3  0      29296     657820     119436    4351960    0    0    24 29936 21593 160447  27  5  68  0  0
10  0      29296     649772     119680    4368648    0    0     2  8284 20268 149946  23  5  72  0  0
24  0      29296     575948     119888    4443508    0    0     2  8762 19982 156834  31  5  64  0  0
12  0      29296     528372     120108    4490592    0    0     0 29946 15819 104973  19  3  77  1  0
23  0      29296     525860     120308    4495436    0    0     0 15698 21515 163041  27  5  69  0  0

top shows the corresponding high load and CPU usage by the Elasticsearch java process:

top - 07:15:34 up 124 days, 13:04,  1 user,  load average: 14.57, 12.50, 9.80
Tasks: 929 total,   1 running, 928 sleeping,   0 stopped,   0 zombie
Cpu0  : 36.3%us,  4.0%sy,  0.0%ni, 59.1%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  : 32.5%us,  7.3%sy,  0.0%ni, 54.3%id,  6.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 33.0%us,  3.6%sy,  0.0%ni, 63.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 34.8%us,  5.9%sy,  0.0%ni, 55.1%id,  3.9%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu4  : 36.6%us,  4.3%sy,  0.0%ni, 59.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  : 32.9%us,  5.6%sy,  0.0%ni, 59.9%id,  1.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu6  : 32.7%us,  5.3%sy,  0.0%ni, 60.1%id,  1.7%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  : 24.0%us, 22.7%sy,  0.0%ni, 51.6%id,  1.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu8  : 33.8%us,  5.6%sy,  0.0%ni, 59.9%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  : 36.2%us, 10.5%sy,  0.0%ni, 45.7%id,  6.2%wa,  0.0%hi,  1.3%si,  0.0%st
Cpu10 : 47.2%us,  5.0%sy,  0.0%ni, 47.5%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 : 33.4%us, 15.6%sy,  0.0%ni, 48.3%id,  2.0%wa,  0.0%hi,  0.7%si,  0.0%st
Cpu12 : 37.4%us,  5.3%sy,  0.0%ni, 57.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 : 34.5%us,  7.9%sy,  0.0%ni, 54.3%id,  3.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu14 : 64.7%us,  4.6%sy,  0.0%ni, 30.4%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 : 30.1%us, 15.9%sy,  0.0%ni, 50.7%id,  3.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 : 38.7%us,  5.0%sy,  0.0%ni, 56.0%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 : 54.5%us,  4.6%sy,  0.0%ni, 36.3%id,  4.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 : 39.8%us,  4.9%sy,  0.0%ni, 54.6%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu19 : 34.1%us,  7.9%sy,  0.0%ni, 55.0%id,  2.6%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu20 : 35.7%us, 11.8%sy,  0.0%ni, 49.5%id,  2.6%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu21 : 45.4%us, 10.2%sy,  0.0%ni, 34.5%id,  5.3%wa,  1.0%hi,  3.6%si,  0.0%st
Cpu22 : 38.9%us,  5.6%sy,  0.0%ni, 52.8%id,  2.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 : 28.9%us, 10.5%sy,  0.0%ni, 33.9%id,  0.0%wa,  3.3%hi, 23.4%si,  0.0%st
Mem:  24675936k total, 24601936k used,    74000k free,    19960k buffers
Swap:  4192880k total,    29296k used,  4163584k free,  5052000k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
17248 user     17   0 18.9g  17g  10m S 1089.8 74.3   2850:30 java