Filebeat Inconsistent Log Shipping

After a cluster crash, ingest rates in my Filebeat → Kafka → Logstash pipeline have been inconsistent. Restarting Filebeat returns it to a normal ingest rate for about 40 minutes; then it cycles between a low of about 30k events per 30 minutes and a high of about 120k events per 30 minutes. With each cycle the low and the high shrink until they are nonexistent, then slowly climb back up to the original high/low combination.
The previous rate averaged about 400k events per 30 minutes.

I have no indication that Logstash or Elasticsearch is the bottleneck in this setup: Logstash's persistent queue is relatively small and empties at a consistent rate, and Kafka shows that Logstash easily keeps up with the rate of data being written into Kafka.
Our setup currently has 6 Logstash VMs pulling from 3 non-clustered Kafka instances, with 3 instances of Filebeat each publishing to the Kafka instance located on the same server via loopback.
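For context, the Filebeat → Kafka hop is just the standard output.kafka output; a minimal sketch of what that side looks like (the port, topic name, and tuning values below are placeholders for illustration, not our actual settings):

  output.kafka:
    hosts: ["127.0.0.1:9092"]   # local broker reached over loopback
    topic: "network-logs"       # hypothetical topic name
    required_acks: 1            # wait for the partition leader's ack only
    compression: gzip           # compress batches before publishing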

We're using Filebeat 6.2.0.

Since the cluster restart, Filebeat's CPU usage has also been consistently above 100%, sometimes spiking to 400%, especially after it has been running for a longer period of time. Changing the scan frequency only marginally affects this.
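For reference, the scan frequency I've been adjusting is the per-prospector scan_frequency option; a minimal sketch (the 30s value is just an example, the default is 10s):

  filebeat.prospectors:
    - type: log
      paths: ["/data/bro/logs/current/*.log"]
      scan_frequency: 30s   # how often Filebeat checks the path glob for new files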

Our registry file is about 3.5 MB and is still growing.

The Zeek (Bro) logs we're ingesting rotate hourly; the Suricata log does not rotate.

The registry cleaning options I'm using now were not necessary before the cluster crash, but they don't seem to make a difference to the registry file now, which I believe to be the culprit.

Looking for any kind of advice to re-stabilize the ingest rates on my servers, please.
My prospector config looks like this:

 - type: log
   enabled: true
   paths: ["/data/bro/logs/current/*.log"]
   ignore_older: 90m
   clean_inactive: 120m
   clean_removed: true
   close_removed: true
 - type: log
   enabled: true
   paths: ["/data/suricata/logs/eve.json"]
   ignore_older: 90m
   clean_inactive: 120m
   clean_removed: true
   close_removed: true

Hey @Dominic_Evert,

how do you measure the ingest rate? Do you watch the event counters that Filebeat reports periodically in its "Non-zero metrics" log lines? Such a line looks like this:

2020-05-06T11:00:31.949+0300	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":40,"time":{"ms":40}},"total":{"ticks":76,"time":{"ms":76},"value":76},"user":{"ticks":36,"time":{"ms":36}}},"info":{"ephemeral_id":"cf240ff2-a953-43dc-bda1-d2847ae81bac","uptime":{"ms":33076}},"memstats":{"gc_next":6769024,"memory_alloc":5003448,"memory_total":9567744,"rss":29220864},"runtime":{"goroutines":20}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":0},"reloads":1,"scans":1},"output":{"type":"elasticsearch"},"pipeline":{"clients":0,"events":{"active":0}}},"registrar":{"states":{"current":0}},"system":{"cpu":{"cores":16},"load":{"1":3.8438,"15":1.5049,"5":2.1064,"norm":{"1":0.2402,"15":0.0941,"5":0.1317}}}}}}
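If it helps for comparing rates, the interval of those metric snapshots is controlled by the logging.metrics settings; a minimal sketch (the values shown are just the defaults):

  logging.metrics.enabled: true
  logging.metrics.period: 30s   # how often the "Non-zero metrics" line is written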
