After a cluster crash, the ingest rate of my Filebeat → Kafka → Logstash pipeline has been inconsistent. Restarting Filebeat restores a normal ingest rate for about 40 minutes; after that the rate oscillates between a low of roughly 30k events per 30 minutes and a high of roughly 120k events per 30 minutes. With each cycle the low and the high both shrink until ingest is effectively zero, then they slowly climb back up to the original high/low pattern.
Before the crash, the rate averaged about 400k events per 30 minutes.
I have no indication that Logstash or Elasticsearch is the bottleneck: Logstash's persistent queue stays relatively small and empties at a consistent rate, and Kafka shows that Logstash easily keeps up with the rate of data being written into Kafka.
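For reference, the persistent queue on the Logstash nodes is configured roughly like this in logstash.yml (the max_bytes value below is illustrative rather than our exact setting):

# logstash.yml (sketch, value illustrative)
queue.type: persisted
queue.max_bytes: 1gb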
Our setup currently has 6 Logstash VMs pulling from 3 non-clustered Kafka instances, with 3 Filebeat instances each publishing to the Kafka instance on the same server over the loopback interface.
We're using Filebeat 6.2.0.
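For completeness, the Kafka output in each filebeat.yml looks roughly like this (the topic name is a placeholder; each Filebeat publishes to its local broker over loopback):

# filebeat.yml output section (sketch, topic name is a placeholder)
output.kafka:
  hosts: ["127.0.0.1:9092"]
  topic: "filebeat"
  required_acks: 1
  compression: gzip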
Since the cluster restart, Filebeat's CPU usage has also been consistently above 100%, sometimes spiking to 400%, especially after it has been running for a while. Changing scan_frequency only marginally affects this.
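The scan_frequency experiments were just variations on the prospector config shown further down, along these lines (30s is only an example of a value I tried; the default is 10s):

# prospector sketch, scan_frequency value illustrative
- type: log
  paths: ["/data/bro/logs/current/*.log"]
  scan_frequency: 30s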
Our registry file is about 3.5 MB and still growing.
The Zeek (Bro) logs we're ingesting rotate hourly; the Suricata eve.json log does not rotate.
The registry cleaning options I'm using now weren't necessary before the cluster crash, and they don't seem to make a difference to the registry file, which I believe is the culprit.
Any advice on how to re-stabilize the ingest rates on these servers would be appreciated.
My prospector config looks like this:
- type: log
  enabled: true
  paths: ["/data/bro/logs/current/*.log"]
  ignore_older: 90m
  clean_inactive: 120m
  clean_removed: true
  close_removed: true

- type: log
  enabled: true
  paths: ["/data/suricata/logs/eve.json"]
  ignore_older: 90m
  clean_inactive: 120m
  clean_removed: true
  close_removed: true