Hello,
I have an application which generates ~50 files/minute, each containing 10,000 single-line events (so roughly 500,000 events/minute, or ~8,300 events/s sustained). Previously I read these files with Logstash directly, processed them and sent them to Elasticsearch. Unfortunately I was CPU-bound, so I bought brand-new servers dedicated to Logstash.
Now, I have:
- 1 app server which generates events and sends them to Logstash with Filebeat
- another app server (same hardware) which generates events and sends them to Elasticsearch via a local Logstash 2.4 instance with a file input
- 1 Logstash server (5.2 or 2.4, I tested both) which sends events to Elasticsearch.
The Elasticsearch cluster is 5.1.
Once created, my log files are never updated.
The issue is that performance is far worse with the Filebeat/Logstash setup than with the file input/Logstash setup (~9,000 events/s with the file input vs ~5,000 events/s with Filebeat).
I don't have any RAM/CPU/IO/network issue on the filebeat/logstash stack.
With the following configuration, I send ~100,000 events in 30 s (~3,300 events/s). My goal is to reach between 20,000 and 30,000 events/s on the Filebeat/Logstash side.
filebeat.prospectors:
- document_type: netflow_in
  ignore_older: 1h
  input_type: log
  paths: [/tmp/FLOW_IN/*/*/*/*/*.flows]
  scan_frequency: 1s
  harvester_buffer_size: 131072
  close_eof: true
  clean_removed: true
  close_inactive: 10m
- document_type: netflow_out
  input_type: log
  ignore_older: 1h
  paths: [/tmp/FLOW_OUT/*/*/*/*/*.flows]
  scan_frequency: 1s
  harvester_buffer_size: 131072
  close_eof: true
  clean_removed: true
  close_inactive: 10m
logging.files: {keepfiles: 3, name: filebeat, path: /var/log/filebeat, rotateeverybytes: 10485760}
logging.level: info
logging.metrics.enabled: true
logging.metrics.period: 30s
logging.to_files: true
output.logstash:
  hosts: ['elkprod-netflow-logstash-tc2.priv.sewan.fr:10010']
filebeat.spool_size: 16384
queue_size: 2000
queue_size, spool_size and harvester_buffer_size don't seem to have any effect at all.
Here is what I get:
2017-03-01T18:57:05+01:00 INFO Non-zero metrics in the last 30s: registrar.states.cleanup=370 registrar.states.update=92070 publish.events=81920 filebeat.harvester.open_files=1053 filebeat.harvester.running=1054 libbeat.logstash.call_count.PublishEvents=44 libbeat.logstash.publish.read_bytes=258 registar.states.current=7444 libbeat.logstash.published_and_acked_events=81902 libbeat.publisher.published_events=82813 filebeat.harvester.started=1054 libbeat.logstash.publish.write_bytes=5670331 registrar.writes=6
When I load-balance across 4 ports on my 2.4 Logstash, with 10 Filebeat workers (I am really not CPU-bound, so why not), it's slightly better (~147,000 acked events in 30 s, i.e. ~4,900 events/s), but far from what I'd expect:
2017-03-01T18:58:56+01:00 INFO Non-zero metrics in the last 30s: registar.states.current=31 libbeat.logstash.published_and_acked_events=147435 libbeat.logstash.publish.read_errors=7 libbeat.logstash.published_but_not_acked_events=14333 libbeat.logstash.publish.write_bytes=10246957 libbeat.logstash.call_count.PublishEvents=87 publish.events=147456 registrar.writes=10 filebeat.harvester.running=24 filebeat.harvester.started=24 filebeat.harvester.open_files=24 registrar.states.update=160691 libbeat.logstash.publish.read_bytes=846 libbeat.publisher.published_events=163815
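For completeness, in that load-balanced test the Filebeat output section looks roughly like this (same Logstash host, the four beats ports from the Logstash config below; loadbalance and worker are the standard output.logstash options, exact syntax reproduced from memory):

output.logstash:
  hosts:
    - 'elkprod-netflow-logstash-tc2.priv.sewan.fr:10010'
    - 'elkprod-netflow-logstash-tc2.priv.sewan.fr:10011'
    - 'elkprod-netflow-logstash-tc2.priv.sewan.fr:10012'
    - 'elkprod-netflow-logstash-tc2.priv.sewan.fr:10013'
  loadbalance: true
  worker: 10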
My logstash conf:
input {
  beats {
    port => 10010
  }
  beats {
    port => 10011
  }
  beats {
    port => 10012
  }
  beats {
    port => 10013
  }
}
filter {
  drop {}
}
output {
  elasticsearch {
    index => "logstash_netflow_client_v12-%{+YYYY.MM.dd}"
    hosts => ["http://host1", "http://host2", "http://host3", "http://host4", "http://host5", "http://host6", "http://host7", "http://host8"]
    flush_size => 250
    workers => 40
  }
}
Logstash is launched with -b 250. As you can see, to rule out any issue on the Elasticsearch side, I simply drop every event in the filter.
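For reference, the launch command boils down to something like this (the binary and config paths are only illustrative, -b 250 is the relevant flag):

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/netflow.conf -b 250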
On both servers, I have 70% idle CPU and no iowait. I disabled swap too, and still have at least half my RAM free.
I really fail to see where the issue is.
Do you have any idea how I could troubleshoot this further? Is there an issue with the metrics shown by Filebeat?
Thank you,
Regards,
Grégoire