We've set up an ELK stack in our company, but recently we've been experiencing issues with our Filebeat instances during the morning ramp-up. The problem is as follows:
Filebeat can't keep up with the logs, at least that's what I can see in the monitoring. Initially it sends above 500 events per second, but when this value should increase because the log volume increases, it instead declines to somewhere between 200 and 0 events per second. I've looked at some configuration options, but the only ones I could find apply when there are multiple files; in this case there's only one file (haproxy.log), so increasing scan_frequency or similar options won't help.
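For context, this is roughly what the relevant parts of the filebeat.yml look like (the host name and tuning values below are illustrative, not an exact copy of our config):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/haproxy.log
    scan_frequency: 10s          # only matters for discovering new files, so little help with a single log

output.logstash:
  hosts: ["logstash.example.com:5044"]   # placeholder host
  # knobs that influence throughput towards Logstash
  worker: 2
  bulk_max_size: 2048
  pipelining: 2

queue.mem:
  events: 4096
  flush.min_events: 2048
  flush.timeout: 1s
```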
Filebeat can only send data as fast as the downstream systems can accept it. What does your ingest architecture look like? Are you sending data to Elasticsearch? If so, what is the specification of your Elasticsearch cluster?
No, everything is sent to a Logstash server, which then parses the logs.
It's a single Logstash server with 10 CPU cores and 8 GB of memory. I've assigned 48 pipeline workers to it (too many, but still no issues) and a JVM heap of 4 GB.
Initially I thought this was a Logstash issue, but JVM heap usage never goes above 75% and CPU usage never exceeds 50%.
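For completeness, the Logstash settings I mentioned look roughly like this (illustrative, not a verbatim copy of the files):

```yaml
# logstash.yml
pipeline.workers: 48        # deliberately oversized for the 10 cores
pipeline.batch.size: 125
pipeline.batch.delay: 50

# jvm.options (separate file)
# -Xms4g
# -Xmx4g
```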
Both Elasticsearch servers have 4 CPU cores and 16 GB of memory.
They have a combined JVM heap of 20 GB. The indexing rate is around 1,000 events per second at night and goes up to 3,000-4,000 events per second, with a latency of 0.1 to 0.15 ms.
The maximum JVM heap usage I've seen on the nodes is about 15 GB, so 75%.
I do see a drop in the indexing rate during the issues, but I assumed that was because Filebeat isn't sending data.
Is there anything in the Elasticsearch logs indicating any problems? Are both nodes master-eligible? If so, have you set minimum_master_nodes to 2 to prevent split brain scenarios?
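(For reference, on a two-node cluster that would look something like this in elasticsearch.yml on both master-eligible nodes; the host names here are just placeholders:)

```yaml
# elasticsearch.yml, 6.x-style zen discovery
discovery.zen.ping.unicast.hosts: ["es-node-1", "es-node-2"]
discovery.zen.minimum_master_nodes: 2    # (master-eligible nodes / 2) + 1
```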