I had an earlier post regarding query time; I wanted to start a separate thread about the load on my Elasticsearch cluster while a large volume of logs is coming in (for a couple of hours each morning). The examples in the query thread were taken during non-peak load.
I am indexing about 520 GB of log files into Elasticsearch per day. At this stage I am only keeping 24 hours of data (eventually the goal is 7 days).
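For reference, the retention side is nothing fancy; it just amounts to deleting the oldest daily index. A minimal sketch of that step (assuming Logstash's default daily logstash-YYYY.MM.DD index naming and GNU date):

```
#!/bin/sh
# Delete the index from two days ago, leaving today's and yesterday's
# (i.e. roughly the last 24-48 hours of data on disk at any time).
OLD_INDEX="logstash-$(date -d '2 days ago' +%Y.%m.%d)"
curl -s -XDELETE "http://localhost:9200/${OLD_INDEX}"
```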
During the peak hour of incoming data, the cluster is hammered.
Is there anything I can do to optimize the cluster?
Config:
16 GB heap size per node, 4 shards, 1 replica
Hardware:
4-node cluster; each node has 24 CPU cores and 24 GB of memory
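The heap is set through the environment the service script sources (a sketch; this assumes the stock ES_HEAP_SIZE mechanism, which sets -Xms and -Xmx to the same value):

```
# Environment for the Elasticsearch init script on each node:
export ES_HEAP_SIZE=16g
```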
Here is the template I use:
"template" : "logstash*", "settings" : { "number_of_shards" : 4, "number_of_replicas" : 1, "index.cache.field.type" : "soft", "index.refresh_interval" : "5s", "index.store.compress.stored" : "true", "index.routing.allocation.total_shards_per_node" : 3 } Very high run queue: procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu------- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 0 29296 857568 115180 4170680 0 0 0 18536 2064 19307 5 0 94 0 0 16 0 29296 981432 115460 4045384 0 0 2 91284 26435 195012 32 6 61 1 0 25 0 29296 940864 115604 4086892 0 0 2 6410 19264 148894 24 4 72 0 0 26 0 29296 937804 115740 4089044 0 0 0 7610 20409 152666 24 4 72 0 0 13 0 29296 921072 115864 4108016 0 0 0 29050 19789 151698 23 4 72 0 0 10 0 29296 899636 116060 4128760 0 0 0 8922 22611 178752 29 5 66 0 0 27 0 29296 803672 116272 4223260 0 0 1300 21616 9254 59491 14 2 84 0 0 12 0 29296 703440 116476 4324696 0 0 1260 8730 21620 164412 34 5 61 0 0 2 0 29296 723592 116756 4303752 0 0 394 46396 20529 149679 27 5 68 0 0 1 0 29296 812524 117040 4215268 0 0 6 89006 30665 224822 35 7 57 1 0 23 0 29296 811320 117248 4215140 0 0 0 16118 16144 129557 20 3 76 0 0 5 3 29296 793556 117440 4230480 0 0 92 13534 17697 130477 21 3 75 0 0 18 0 29296 791652 117664 4234996 0 0 0 25726 15064 105674 16 3 81 0 0 0 0 29296 767412 117864 4257892 0 0 2 7026 23563 185956 29 5 66 0 0 32 0 29296 698344 118092 4325644 0 0 0 24436 18761 135696 26 4 70 0 0 25 0 29296 688636 118484 4333708 0 0 0 19960 21589 169049 28 5 67 0 0 16 0 29296 641116 118756 4381596 0 0 2 28256 19404 151200 27 4 68 0 0 16 0 29296 598248 118960 4425428 0 0 0 24886 20111 154420 26 4 70 0 0 0 0 29296 684804 119228 4336856 0 0 2 51210 19501 145059 23 4 72 0 0 3 0 29296 657820 119436 4351960 0 0 24 29936 21593 160447 27 5 68 0 0 10 0 29296 649772 119680 4368648 0 0 2 8284 20268 149946 23 5 72 0 0 24 0 29296 575948 119888 4443508 0 0 2 8762 19982 156834 31 5 64 0 0 12 0 29296 528372 120108 4490592 0 0 0 29946 15819 104973 19 3 77 1 0 23 0 29296 525860 120308 4495436 0 0 0 15698 21515 163041 27 5 69 0 0 i.e. 
- High load and high CPU usage by the Elasticsearch java process (top):

```
top - 07:15:34 up 124 days, 13:04,  1 user,  load average: 14.57, 12.50, 9.80
Tasks: 929 total,   1 running, 928 sleeping,   0 stopped,   0 zombie
Cpu0  : 36.3%us,  4.0%sy,  0.0%ni, 59.1%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  : 32.5%us,  7.3%sy,  0.0%ni, 54.3%id,  6.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 33.0%us,  3.6%sy,  0.0%ni, 63.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 34.8%us,  5.9%sy,  0.0%ni, 55.1%id,  3.9%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu4  : 36.6%us,  4.3%sy,  0.0%ni, 59.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  : 32.9%us,  5.6%sy,  0.0%ni, 59.9%id,  1.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu6  : 32.7%us,  5.3%sy,  0.0%ni, 60.1%id,  1.7%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu7  : 24.0%us, 22.7%sy,  0.0%ni, 51.6%id,  1.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu8  : 33.8%us,  5.6%sy,  0.0%ni, 59.9%id,  0.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  : 36.2%us, 10.5%sy,  0.0%ni, 45.7%id,  6.2%wa,  0.0%hi,  1.3%si,  0.0%st
Cpu10 : 47.2%us,  5.0%sy,  0.0%ni, 47.5%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 : 33.4%us, 15.6%sy,  0.0%ni, 48.3%id,  2.0%wa,  0.0%hi,  0.7%si,  0.0%st
Cpu12 : 37.4%us,  5.3%sy,  0.0%ni, 57.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu13 : 34.5%us,  7.9%sy,  0.0%ni, 54.3%id,  3.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu14 : 64.7%us,  4.6%sy,  0.0%ni, 30.4%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 : 30.1%us, 15.9%sy,  0.0%ni, 50.7%id,  3.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu16 : 38.7%us,  5.0%sy,  0.0%ni, 56.0%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu17 : 54.5%us,  4.6%sy,  0.0%ni, 36.3%id,  4.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu18 : 39.8%us,  4.9%sy,  0.0%ni, 54.6%id,  0.3%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu19 : 34.1%us,  7.9%sy,  0.0%ni, 55.0%id,  2.6%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu20 : 35.7%us, 11.8%sy,  0.0%ni, 49.5%id,  2.6%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu21 : 45.4%us, 10.2%sy,  0.0%ni, 34.5%id,  5.3%wa,  1.0%hi,  3.6%si,  0.0%st
Cpu22 : 38.9%us,  5.6%sy,  0.0%ni, 52.8%id,  2.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu23 : 28.9%us, 10.5%sy,  0.0%ni, 33.9%id,  0.0%wa,  3.3%hi, 23.4%si,  0.0%st
Mem:  24675936k total, 24601936k used,    74000k free,    19960k buffers
Swap:  4192880k total,    29296k used,  4163584k free,  5052000k cached

  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND
17248 user      17   0 18.9g  17g  10m S 1089.8 74.3  2850:30  java
```
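If it would help to see where the CPU is going inside Elasticsearch during the spike, I can also grab hot-threads output (assuming the running version is new enough to have the hot threads API):

```
# Sample the busiest threads on every node while the cluster is hammered:
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5'
```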