Our Elasticsearch cluster's search queues spike constantly and CPU/load keeps increasing. I'm mostly looking for advice on optimization rather than a single answer.
The main indices are constantly being updated (via bulk requests) with a 300s refresh interval.
CPU usage
07:37:48 PM CPU %user %nice %system %iowait %steal %idle
07:37:53 PM all 74.38 0.00 1.27 0.00 0.06 24.29
Load
19:38:26 up 18 days, 8:08, 3 users, load average: 20.09, 16.88, 16.08
Memory
$ free -g
total used free shared buff/cache available
Mem: 119 30 1 0 88 88
Swap: 0 0 0
Biggest indices
First index: 512 shards, avg shard size 3-4gb, total size 4.1tb
Second index: 64 shards, avg shard size 3gb, total size 408gb
Kibana/X-pack
Thread pool
id host name active queue rejected completed
GSGumFBBRZS0bQ0aoVMsF 10.201.20.89 search 25 89 0 827820578
0sjcw6fRQDmKjMABkBoFM 10.201.20.195 search 5 0 0 1301852541
ORc35jDwSAa23gYKwqxGA 10.201.20.116 search 12 0 0 830988743
mjBEn8OETA-vQGrhK0W8X 10.201.20.61 search 16 2 61 1258009280
0TPNIdweSPunxXpehP0WK 10.201.20.169 search 23 3 0 876063395
kHG_ZXegTBOo5UaWFCEH0 10.201.20.98 search 12 1 294 930992459
BCCfQTZVR96dsfgYguRdD 10.201.20.235 search 25 22 113 878374510
T_YjqiNhTyOZ16DX3YqwC 10.201.20.254 search 25 20 157 902678205
WsDzh87ET6mFZFIUM-KOw 10.201.20.166 search 15 2 0 869113390
pnQCehRdTy6Brq7NAHEXG 10.201.20.148 search 3 0 0 409167893
uj8dWi4JSnyPXRzjA0xJm 10.201.20.60 search 21 1 0 49490497
B5Z4mY2nTFWv8xNqgl1O_ 10.201.20.19 search 11 0 0 358465691
DFWMHq_hRcaheA6EbYwAy 10.201.20.213 search 25 129 0 49533248
0J3CoHe0RK-g42BHVOAso 10.201.20.18 search 16 0 125 1306944101
seW_iSbwRUahXXRGmHBhL 10.201.20.105 search 5 0 0 58982461
Mxuo0aZaTVKCu4DA7pguY 10.201.20.57 search 20 3 11 871736991
nV0jd9_9SBazFJBMcW6fN 10.201.20.107 search 25 17 257 1301609159
lPRTFUWLSoqfy5WyNVIhq 10.201.20.228 search 25 24 0 58576149
dViqleV_TRCMCjY_10drG 10.201.20.219 search 11 0 0 911538587
Qu52J_ZuTgO5VK_jb7hyH 10.201.20.252 search 10 0 757 1337745981
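To make the thread pool output above easier to read, here is a minimal Python sketch that parses rows in that seven-column layout and reports saturated nodes, total queued searches, and nodes with rejections. The `SAMPLE` rows are copied from the output above; `parse_rows`, `summarize`, and `pool_size=25` (the highest `active` value seen above) are illustrative assumptions, not part of any Elasticsearch API.

```python
from typing import Dict, List

# Four rows copied from the thread pool output above, as a sample.
SAMPLE = """\
id host name active queue rejected completed
GSGumFBBRZS0bQ0aoVMsF 10.201.20.89 search 25 89 0 827820578
0sjcw6fRQDmKjMABkBoFM 10.201.20.195 search 5 0 0 1301852541
mjBEn8OETA-vQGrhK0W8X 10.201.20.61 search 16 2 61 1258009280
Qu52J_ZuTgO5VK_jb7hyH 10.201.20.252 search 10 0 757 1337745981
"""


def parse_rows(text: str) -> List[Dict]:
    """Parse whitespace-separated rows: id host name active queue rejected completed."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header line, and anything with an unexpected shape.
        if len(parts) != 7 or parts[0] == "id":
            continue
        node_id, host, name, active, queue, rejected, completed = parts
        rows.append({
            "id": node_id,
            "host": host,
            "name": name,
            "active": int(active),
            "queue": int(queue),
            "rejected": int(rejected),
            "completed": int(completed),
        })
    return rows


def summarize(rows: List[Dict], pool_size: int) -> Dict:
    """Summarize pool pressure; pool_size is an assumed per-node thread count."""
    return {
        # Nodes where every thread in the pool is busy.
        "saturated": [r["host"] for r in rows if r["active"] >= pool_size],
        # Searches currently waiting in queues across the parsed nodes.
        "queued": sum(r["queue"] for r in rows),
        # Cumulative rejections per node (only nodes with at least one).
        "rejected": {r["host"]: r["rejected"] for r in rows if r["rejected"] > 0},
    }


if __name__ == "__main__":
    print(summarize(parse_rows(SAMPLE), pool_size=25))
```

Pasting all 20 rows into `SAMPLE` gives the same summary for the whole cluster. Note that `rejected` is a cumulative counter, so it reflects past rejections as well as current ones.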
Specs
AWS i3.4xlarge
20 nodes (all data nodes)
120gb RAM
16 CPUs
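As a sanity check on these specs, the Elasticsearch 6.x documentation gives the default size of the fixed `search` thread pool as int((available processors * 3) / 2) + 1. The sketch below applies that formula to the node and CPU counts listed here. The function names are hypothetical, and it assumes the thread pool defaults have not been overridden.

```python
def default_search_pool_size(processors: int) -> int:
    """Default `search` thread pool size per the Elasticsearch 6.x docs:
    int((available_processors * 3) / 2) + 1."""
    return (processors * 3) // 2 + 1


def cluster_search_threads(nodes: int, processors_per_node: int) -> int:
    """Total search threads across the cluster, assuming identical nodes."""
    return nodes * default_search_pool_size(processors_per_node)


if __name__ == "__main__":
    print(default_search_pool_size(16))    # threads per node
    print(cluster_search_threads(20, 16))  # threads across all 20 nodes
```

With 16 CPUs this gives 25 search threads per node, which matches the highest `active` value in the thread pool output, and 500 search threads across the 20 nodes.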
Elasticsearch config file (elasticsearch.yml)
bootstrap.memory_lock: true
cluster.name: elasticsearch-prod
discovery.ec2.host_type: private_ip
discovery.ec2.tag.ESCluster: elasticsearch-prod
discovery.zen.hosts_provider: ec2
http.cors.allow-origin : "*"
http.cors.enabled : true
indices.fielddata.cache.size: "50%"
network.host: "_ec2_"
node.data: true
node.ingest: true
node.master: true
node.name: elasticsearch-prod-node-i-00f084a45088465db
path.data: /elasticsearch/data
path.logs: /var/log/elasticsearch
xpack.security.enabled: false
http.port: 9200
JVM config file (jvm.options)
-Dfile.encoding=UTF-8
-Dio.netty.noKeySetOptimization=true
-Dio.netty.noUnsafe=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Djava.awt.headless=true
-Djdk.io.permissionsUseCanonicalPath=true
-Djna.nosys=true
-Dlog4j.shutdownHookEnabled=false
-Dlog4j.skipJansi=true
-Dlog4j2.disable.jmx=true
-XX:+AlwaysPreTouch
-XX:+DisableExplicitGC
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-Xms26g
-Xmx26g
-Xss1m
-server
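For context on the heap settings, this rough sketch works out how the 26g heap relates to the 120gb of RAM and to `indices.fielddata.cache.size: "50%"`, which is expressed as a percentage of the node heap. The function and key names are made up for illustration.

```python
from typing import Dict


def memory_budget(ram_gb: float, heap_gb: float, fielddata_fraction: float) -> Dict[str, float]:
    """Split a node's RAM into JVM heap, fielddata cap, and memory left outside the heap."""
    return {
        # Fraction of total RAM given to the JVM heap (-Xms/-Xmx).
        "heap_share": heap_gb / ram_gb,
        # Upper bound on the fielddata cache, a fraction of the heap.
        "fielddata_cap_gb": heap_gb * fielddata_fraction,
        # RAM left for the OS and the filesystem cache.
        "off_heap_gb": ram_gb - heap_gb,
    }


if __name__ == "__main__":
    budget = memory_budget(ram_gb=120, heap_gb=26, fielddata_fraction=0.5)
    for key, value in budget.items():
        print(f"{key}: {value:.2f}")
```

With the values from this post, the heap is a bit under 22% of RAM, fielddata can grow to 13 GB, and 94 GB remains outside the heap, which is roughly in line with the 88 GB of buff/cache in the `free -g` output.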
Health
{
"cluster_name" : "elasticsearch-prod",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 20,
"number_of_data_nodes" : 20,
"active_primary_shards" : 629,
"active_shards" : 1258,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
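The health output can also be turned into a couple of derived numbers. This is a small sketch that assumes only the three fields shown in `HEALTH` (copied from the output above); `shard_stats` is a hypothetical helper name.

```python
import json
from typing import Dict

# Subset of the cluster health output above.
HEALTH = """
{
  "number_of_data_nodes": 20,
  "active_primary_shards": 629,
  "active_shards": 1258
}
"""


def shard_stats(health: Dict) -> Dict[str, float]:
    """Derive copies per primary and average shards per data node."""
    primaries = health["active_primary_shards"]
    active = health["active_shards"]
    nodes = health["number_of_data_nodes"]
    return {
        # Primary plus replica copies for each primary shard.
        "copies_per_primary": active / primaries,
        # Average number of shard copies hosted by each data node.
        "shards_per_node": active / nodes,
    }


if __name__ == "__main__":
    print(shard_stats(json.loads(HEALTH)))
```

For this cluster that works out to 2 copies per primary (one replica each) and about 63 shards per node.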
OS
CentOS Linux release 7.5.1804 (Core)
3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
RPM:
elasticsearch-6.4.2.rpm
Details:
elasticsearch-6.4.2-1.noarch
JAVA:
openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
Any insight greatly appreciated!