We have been trying to import the logs from a Cisco ASA into Elasticsearch using Filebeat.
The Cisco ASA is sending an average of 8,000 events/second. At first we were importing them as syslog, listening on UDP 514, but the rate was too slow, so as a temporary solution we are logging the events to disk with rsyslogd, and Filebeat reads the events off that file.
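Roughly, the rsyslog side looks like the sketch below: listen on UDP 514 and write everything to a file that Filebeat then tails (the file path and ruleset name are illustrative, not our exact config).

```
# /etc/rsyslog.d/10-cisco-asa.conf -- receive the ASA syslog on UDP 514 and spool it to disk
module(load="imudp")
input(type="imudp" port="514" ruleset="cisco-asa")

ruleset(name="cisco-asa") {
  action(type="omfile" file="/var/log/cisco-asa.log")
}
```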
When the Cisco module is disabled, monitoring in Kibana shows an average event rate of 8,000/s, but when we enable the Cisco module the rate drops to about 500/s.
I have tried tweaking settings in filebeat.yml to increase this rate, such as bulk_max_size and worker. Changing worker makes a small difference, but bulk_max_size has no visible impact.
This is running on a VM with 64GB of RAM, and 24 cores.
Here's the filebeat configuration file:
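Only the parts relevant to this discussion are sketched below: the cisco module's asa fileset reading the file written by rsyslogd, and the output settings mentioned above. Paths and numeric values are placeholders, not necessarily the exact values we use.

```yaml
# modules.d/cisco.yml -- cisco module reading the file produced by rsyslogd
- module: cisco
  asa:
    enabled: true
    var.input: file
    var.paths: ["/var/log/cisco-asa.log"]   # placeholder path

# filebeat.yml -- output tuning we experimented with
output.elasticsearch:
  hosts: ["localhost:9200"]   # placeholder host
  worker: 4                   # raising this made a small difference
  bulk_max_size: 2048         # changing this had no visible impact
```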
Is this really too much for a modern machine? The volume of log data coming from the ASA firewall is only about 1 MB/s. It is stored on a non-SSD hard drive, but that happens after the Cisco parsing step, which appears to be the bottleneck, since everything keeps up when the Cisco module is disabled.
I thought of using Logstash, but I haven't tried it yet.
When the Cisco module is disabled I assume each event contains very few fields, which makes it easier to index than the more complex enriched events. The fact that one case keeps up and the other does not therefore does not necessarily rule out Elasticsearch as the bottleneck, especially as you seem to be using spinning disks. What do disk I/O and iowait look like when you are indexing the enriched events?
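For example, something like the following (iostat from the sysstat package) shows per-device throughput and utilisation plus the CPU iowait percentage while the enriched events are being indexed; the device name is a placeholder.

```sh
# Extended device stats in MB, sampled every 5 seconds; replace sda with the data disk
iostat -xm 5 sda
```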
Disk I/O is around 20 MB/s with the module enabled, and iowait is very low (around 0.04%). We also ran a test copying a very big file, and the resulting I/O was around 200 MB/s.
I can still try the Logstash approach @grumo35 suggested, but that doesn't really explain why bulk_max_size stops having any impact on the indexing rate once the Cisco module is enabled.
Copying a very large file is a poor way to judge how well a given type of storage will work with Elasticsearch, as it involves mostly sequential reads and writes, while Elasticsearch generally relies on random reads and writes.
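If you want a test that is closer to Elasticsearch's access pattern, a small mixed random read/write benchmark with fio is more representative than a sequential copy; the directory, sizes and job counts below are only illustrative.

```sh
# 4k mixed random read/write test on the data disk (directory and sizes are illustrative)
fio --name=es-like-randrw --directory=/var/lib/elasticsearch-fio-test \
    --ioengine=libaio --rw=randrw --bs=4k --size=1g \
    --numjobs=4 --iodepth=16 --runtime=60 --time_based --group_reporting
```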