Hello,
I have a 20-node, Platinum-licensed ELK cluster (version 7.6.1).
The setup is as follows:
16 data nodes --> 64 GB memory each, 32 GB allocated to the JVM heap (Xms = Xmx), 8 TB disk each
3 master nodes --> 32 GB memory each
1 ML node --> 64 GB memory
My problem: I have gone through most of the index tuning advice and the write thread pool rejection cases, but my rejections have still been increasing for a while. Two nodes now have a rejection rate of nearly 5%.
My indexing rates are as stated below. The spikes are huge, but I receive data from databases, syslog, and log files, as well as data forwarded directly to Elasticsearch by Fluent Bit or Logstash from other services, and every source behaves differently.
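To make the rejection figure concrete, here is a minimal sketch of how a per-node write rejection rate could be derived from the `write` section of `GET _nodes/stats/thread_pool`. It assumes the rate is defined as `rejected / (rejected + completed)`, which may not match how the 5% above was measured; the function name and the sample values are made up for illustration.

```python
def write_rejection_rates(nodes_stats: dict) -> dict:
    """Map node name -> percentage of write tasks that were rejected.

    Assumes the input has the shape of a `_nodes/stats/thread_pool`
    response and that rate = rejected / (rejected + completed).
    """
    rates = {}
    for node in nodes_stats.get("nodes", {}).values():
        pool = node.get("thread_pool", {}).get("write", {})
        rejected = pool.get("rejected", 0)
        completed = pool.get("completed", 0)
        total = rejected + completed
        name = node.get("name", "unknown")
        # Avoid division by zero on nodes that have handled no write tasks.
        rates[name] = 100.0 * rejected / total if total else 0.0
    return rates


# Made-up sample shaped like the API response (not data from my cluster).
sample = {
    "nodes": {
        "id-a": {"name": "data-01",
                 "thread_pool": {"write": {"rejected": 50, "completed": 950}}},
        "id-b": {"name": "data-02",
                 "thread_pool": {"write": {"rejected": 0, "completed": 1000}}},
    }
}

print(write_rejection_rates(sample))
```

Note that these counters are cumulative since each node started, so comparing two snapshots taken some time apart gives a better picture of the current rate than a single reading.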
My most-used index template is below, so most of my indices have 4 primary shards and 1 replica.
I guess a higher refresh interval might be better.
> {
> "main_temp" : {
> "order" : 0,
> "index_patterns" : [
> "csm_siem_record_*",
> "csm_siem_restored_*"
> ],
> "settings" : {
> "index" : {
> "codec" : "best_compression",
> "refresh_interval" : "5s",
> "analysis" : {
> "filter" : {
> "ntk_asciifolding" : {
> "type" : "asciifolding",
> "preserve_original" : "false"
> },
> "ntk_turkce_lowercase" : {
> "type" : "lowercase",
> "language" : "turkish"
> }
> },
> "analyzer" : {
> "raw_log" : {
> "filter" : [
> "ntk_turkce_lowercase",
> "ntk_asciifolding"
> ],
> "type" : "custom",
> "tokenizer" : "classic"
> }
> }
> },
> "number_of_shards" : "4",
> "auto_expand_replicas" : "0-1",
> "query" : {
> "default_field" : "raw_record"
> }
> }
> },
> "mappings" : {
> "dynamic_templates" : [
> {
> "notanalyzed" : {
> "mapping" : {
> "type" : "keyword"
> },
> "match_mapping_type" : "string",
> "match" : "*"
> }
> }
> ],
> "date_detection" : false,
> "properties" : {
> "received_bytes" : {
> "type" : "long"
> },
> "destination_port" : {
> "type" : "integer"
> },
> "jvbrtt" : {
> "type" : "short"
> },
> "sent_packets" : {
> "type" : "long"
> },
> "responseData" : {
> "ignore_above" : 8191,
> "type" : "keyword"
> },
> "translated_source_ip" : {
> "type" : "ip"
> },
> "raw_record" : {
> "analyzer" : "raw_log",
> "type" : "text"
> },
> "packets" : {
> "type" : "long"
> },
> "session_time" : {
> "type" : "long"
> },
> "source_ip" : {
> "type" : "ip"
> },
> "sent_bytes" : {
> "type" : "long"
> },
> "download" : {
> "type" : "short"
> },
> "duration" : {
> "type" : "long"
> },
> "destination_ip" : {
> "type" : "ip"
> },
> "translated_destination_ip" : {
> "type" : "ip"
> },
> "translated_source_port" : {
> "type" : "integer"
> },
> "date_time" : {
> "format" : "strict_date_optional_time||epoch_millis",
> "type" : "date"
> },
> "translated_destination_port" : {
> "type" : "integer"
> },
> "source_port" : {
> "type" : "integer"
> },
> "connectionquality" : {
> "type" : "float"
> },
> "responsetime" : {
> "type" : "keyword"
> },
> "issuccessful" : {
> "type" : "keyword"
> },
> "requestData" : {
> "ignore_above" : 8191,
> "type" : "keyword"
> },
> "received_packets" : {
> "type" : "long"
> },
> "severity" : {
> "type" : "integer"
> },
> "responsecode" : {
> "type" : "keyword"
> },
> "coordinate" : {
> "type" : "geo_point"
> },
> "destination_ip_coordinate" : {
> "type" : "geo_point"
> },
> "count" : {
> "type" : "long"
> },
> "maxframeheight" : {
> "type" : "integer"
> },
> "translated_ip" : {
> "type" : "ip"
> },
> "upload" : {
> "type" : "short"
> },
> "resp_time" : {
> "type" : "integer"
> },
> "message" : {
> "analyzer" : "raw_log",
> "type" : "text"
> },
> "isservererror" : {
> "type" : "keyword"
> },
> "sign_time" : {
> "format" : "strict_date_optional_time||epoch_millis",
> "type" : "date"
> },
> "device_ip" : {
> "type" : "ip"
> },
> "source_ip_coordinate" : {
> "type" : "geo_point"
> },
> "upload__1" : {
> "type" : "short"
> },
> "download__1" : {
> "type" : "short"
> },
> "bytes" : {
> "type" : "long"
> },
> "starttime" : {
> "type" : "keyword"
> },
> "lastn" : {
> "type" : "short"
> },
> "email_subject" : {
> "analyzer" : "raw_log",
> "type" : "text"
> }
> }
> },
> "aliases" : { }
> }
> }
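On the refresh interval point: a minimal sketch of the settings body that could be sent with `PUT <index>/_settings` to raise it on existing indices. The `30s` value is only a placeholder I have not tested, and the helper name is hypothetical.

```python
import json


def refresh_interval_update(interval: str) -> dict:
    """Build the body for an index settings update that changes refresh_interval."""
    return {"index": {"refresh_interval": interval}}


# "30s" is an illustrative value, not a tuned recommendation.
print(json.dumps(refresh_interval_update("30s"), indent=2))
```

New indices would only pick this up if the same value is also changed in the template above.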
I would be really glad to hear any further recommendations.