Hi,
I am trying to come up with tuning settings for my Logstash indexer.
It sits between Kafka and Elasticsearch (pulling from one topic called logstash with 25 partitions).
I have 12 nodes (2 client, 3 master, and the remaining 7 are data nodes), and I am going to add 3 more data nodes over the next week.
I am indexing 500M documents a day, and that will probably double by the end of the year.
I had Marvel installed, and it reported an indexing latency of 0.7 ms (really fast).
However, I cannot seem to get rid of the backlog, despite playing around with the different Kafka input and Elasticsearch output settings and with the number of Logstash instances (3-6 of them).
So, I am looking for some calculation for what each of these settings should be, or a framework for working out what gives me the best bang for the buck.
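For reference, here is the rough arithmetic I am working from, as a minimal Python sketch. The inputs are the numbers from this post (500M documents a day, possibly doubling, 3-6 Logstash instances, 25 partitions, consumer_threads => 1). The function names are made up for illustration, and it assumes the load is spread evenly over the day, which it probably is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(docs_per_day, instances):
    """Average docs/sec each Logstash instance must sustain (even load assumed)."""
    return docs_per_day / SECONDS_PER_DAY / instances


def partitions_per_thread(partitions, instances, threads_per_instance):
    """Average number of Kafka partitions each consumer thread has to read."""
    return partitions / (instances * threads_per_instance)


# Today's volume and the expected doubling, for 3 and 6 instances.
for docs_per_day in (500_000_000, 1_000_000_000):
    for instances in (3, 6):
        rate = required_rate(docs_per_day, instances)
        print(f"{docs_per_day:>13,} docs/day, {instances} instances: "
              f"{rate:,.0f} docs/sec each")

# 25 partitions, one consumer thread per instance (as in my config).
for instances in (3, 6):
    ratio = partitions_per_thread(25, instances, 1)
    print(f"{instances} instances x 1 thread: {ratio:.1f} partitions per thread")
```

Averaged over the day, that is about 5,787 docs/sec for the whole pipeline today, so roughly 1,929 docs/sec per instance with 3 instances and about 965 with 6. Since a partition is read by at most one consumer in a group, 25 partitions would allow up to 25 consumer threads in total, and I am only running 3-6.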
Here is a metric graph of Logstash's indexing rates per server over a couple of days.
The big drops and spikes are from me restarting with tuning changes.
Please help me, Obi-Wan, you're my only hope.
input {
  kafka {
    topic_id => "logstash"
    zk_connect => "{{ Kafka_Zookeeper}}:{{ Zookeeper_Port}}"
    rebalance_max_retries => 20
    consumer_threads => 1
    queue_size => 2000
    decorate_events => true
    consumer_id => "${HOSTNAME}"
  }
}
#### I have no filters
#### Output is deployed via an Ansible template, and values such as hosts get replaced at deploy time
output {
  #stdout { codec => rubydebug }
  # template_filename is defined in the $ANSIBLE_HOME/roles/ls-indexers/vars/main.yml
  if [type] =~ "heartbeat" {
    elasticsearch {
      hosts => [ {% for host in groups['es-data'] %}"{{ host }}:9210"{% if not loop.last %},{% endif %}{% endfor %} ]
      document_id => "%{host}-%{type}"
      index => "heartbeat"
      workers => 1
      sniffing => true
    }
  } else if [dst_index] =~ /.+/ {
    ###################################################################################################
    ### If the DST_INDEX IS SET
    ###################################################################################################
    #### If the dst_index is dsg then the data must go to both Bay and Lat indexes
    #### if it is not dsg handle normally
    elasticsearch {
      hosts => [ {% for host in groups['es-data'] %}"{{ host }}:9210"{% if not loop.last %},{% endif %}{% endfor %} ]
      index => "%{dst_index}-%{+YYYY.MM.dd}"
      template => "{{ template_filename }}"
      template_overwrite => "true"
      workers => 10
      flush_size => 100
      idle_flush_time => 5
      sniffing => true
    }
  } else {
    ###################################################################################################
    ### All else fails send it to the unknown index
    ###################################################################################################
    elasticsearch {
      hosts => [ {% for host in groups['es-data'] %}"{{ host }}:9210"{% if not loop.last %},{% endif %}{% endfor %} ]
      index => "unknown-%{+YYYY.MM.dd}"
      template => "{{ template_filename }}"
      template_overwrite => "true"
      workers => 4
      flush_size => 1000
      idle_flush_time => 5
      sniffing => true
    }
  }
}
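
To put numbers on the output settings above, here is a rough Python sketch of what the workers and flush_size values imply. It assumes each output worker keeps its own buffer of up to flush_size events and that every bulk request goes out full; I am not sure either holds for every Logstash version, so treat it as an estimate. The function names are made up for illustration.

```python
def bulk_requests_per_sec(docs_per_sec, flush_size):
    """Bulk requests/sec needed if every request carries a full flush_size batch."""
    return docs_per_sec / flush_size


def max_buffered_events(workers, flush_size):
    """Upper bound on buffered events, assuming one buffer per output worker."""
    return workers * flush_size


# (workers, flush_size) pairs copied from the two main outputs in my config.
outputs = {"dst_index": (10, 100), "unknown": (4, 1000)}

# Per-instance average with 3 Logstash instances at 500M docs/day.
docs_per_sec = 500_000_000 / 86_400 / 3

for name, (workers, flush_size) in outputs.items():
    requests = bulk_requests_per_sec(docs_per_sec, flush_size)
    buffered = max_buffered_events(workers, flush_size)
    print(f"{name}: {requests:.1f} bulk requests/sec, "
          f"up to {buffered} events buffered")
```

If all traffic took the dst_index path, flush_size => 100 would mean around 19 small bulk requests per second per instance, against about 2 per second with flush_size => 1000.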