I am doing log analysis using Filebeat (1.2) -> Logstash (2.3) -> Elasticsearch (2.3).
I have 4 Filebeat instances, 4 Logstash instances (6 cores each), and a 2-node Elasticsearch cluster (8 cores, 64 GB RAM).
The relevant settings in filebeat.yml are:
filebeat:
  prospectors:
    -
      paths:
        - /var/log/filebeat/*/*.json
      encoding: utf-8
      input_type: log
      ignore_older: 10m
      scan_frequency: 1s
      exclude_lines: ["^$"]
  spool_size: 3072
  registry_file: .filebeat
output:
  logstash:
    enabled: true
    hosts: ["logstash1:5044", "logstash2:5044"]
    worker: 8
    loadbalance: true
    index: elkstats_record
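For completeness, the Beats input on the Logstash side is just the stock plugin listening on port 5044 (sketch from memory; non-default options, if any, omitted):

input {
  beats {
    port => 5044
  }
}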
Logstash's Elasticsearch output is configured like this:
elasticsearch {
  hosts => ["node1", "node2"]
  index => "records_%{+YYYY.MM.dd}"              # generates one index per day
  template_name => "template"
  document_id => "%{[@metadata][computed_id]}"   # set the document id explicitly
  workers => 2
  flush_size => 3500
}
On the Logstash hosts, the heap size environment variable is set to LS_HEAP_SIZE=2048m.
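For reference, it is set in the environment that launches Logstash, along these lines (the file path is an assumption based on the Debian package layout):

# assumed location: /etc/default/logstash, or exported before starting Logstash
LS_HEAP_SIZE=2048m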
Traffic is normally 3000-4000 msg/sec; when it exceeds 6000-7000 msg/sec, the Logstash side repeatedly logs these messages:
CircuitBreaker::rescuing exceptions {:name=>"Beats input", :exception=>LogStash::Inputs::Beats::InsertingToQueueTakeTooLong, :level=>:warn}
Beats input: The circuit breaker has detected a slowdown or stall in the pipeline, the input is closing the current connection and rejecting new connection until the pipeline recover. {:exception=>LogStash::Inputs::BeatsSupport::CircuitBreaker::HalfOpenBreaker, :level=>:warn}
The system monitoring tool shows that a Logstash instance uses 50-80% of the CPU during high-traffic hours, but only about 800 MB of RAM.
The Elasticsearch side shows no obvious resource shortage: CPU usage is 15%-26% and memory usage is 26%.
The Elasticsearch error logs show org.apache.lucene.store.AlreadyClosedException; I have more details in this link.
It looks like Logstash is not processing messages fast enough, so Filebeat takes too long to insert events into Logstash's queue, and we end up losing the messages that never make it from Filebeat into the queue.
Is there any way to tune Logstash or Filebeat (flush_size, workers, etc.) so that Logstash processes messages faster?
Also, how can I make Logstash use the full 2 GB of LS_HEAP_SIZE to buffer unprocessed messages, instead of only ~800 MB?
Is there any other way to prevent Logstash from losing messages?
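For example, is something along these lines the right direction? The values below are hypothetical guesses I have not verified:

# hypothetical: more pipeline workers and a larger batch per worker (Logstash 2.x flags)
bin/logstash -w 6 -b 250 -f /etc/logstash/conf.d/   # config path shown only as an example

and/or giving the Beats input more headroom before the circuit breaker opens:

# hypothetical: raise the timeout before the Beats circuit breaker trips
input {
  beats {
    port => 5044
    congestion_threshold => 30   # seconds; I believe the default is 5
  }
}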