Filebeat can't keep up with the logs

I'm using Filebeat 7.4.2 and it doesn't seem to pick up all the events in my log file. The file grows very quickly because huge messages are being logged, and Filebeat can't seem to keep up and ship all the messages to Logstash before the log file rolls over. How should I handle this kind of situation in Filebeat?

Hello @grazia0912 :slight_smile:

First of all, I would recommend upgrading to a newer version of both Filebeat and the Elastic Stack; there are so many improvements, bug fixes, and performance changes that the upgrade alone could resolve some of your issues.

Something that is a bit harder to determine is whether it's in fact Filebeat that cannot keep up, or Logstash, or in the end it could also be Elasticsearch hitting its limit.

There are a few ways you can determine which. A few quick steps first:

  • In the Filebeat log file there is a specific entry, logged every 30 seconds, that starts with "Non-zero metrics". It shows Filebeat's local stats, for example how many events are queued; if the queue stays high, it is most likely Logstash or ES that has the issue.

  • Check the Filebeat logs for any errors, and do the same with the Logstash logs.

  • You can enable monitoring of Beats, Logstash, and Elasticsearch so that you can view their stats and metrics in Kibana. This can be quite useful for quickly determining where the issue is:
    Monitor Filebeat | Filebeat Reference [7.12] | Elastic
    Monitoring Logstash | Logstash Reference [7.12] | Elastic
    Monitor a cluster | Elasticsearch Guide [7.12] | Elastic

  • In the Logstash output section of filebeat.yml, try raising the worker count and bulk_max_size, but only in small increments!
    The default worker count is the number of CPUs available, so if you have 2 CPUs, try setting workers to 3.
    For bulk_max_size, the default is 2048 events; you could try setting it a bit higher, like 3000.
    If that helps but you still can't catch up, try increasing just bulk_max_size a little further. A very high worker count is not necessary, but having it just a bit above the CPU count can help in some cases.
    Documentation ref: Configure the Logstash output | Filebeat Reference [7.4] | Elastic
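
    A sketch of what that part of filebeat.yml could look like (the host and port are placeholders, and the exact values should be tuned in small increments as described above):

    ```yaml
    output.logstash:
      hosts: ["logstash.example.com:5044"]  # placeholder host/port
      worker: 3             # a bit above the CPU count, per the advice above
      bulk_max_size: 3000   # default is 2048; increase gradually
    ```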

Some more debugging steps if everything else yields no results:

  1. Comment out the Logstash output and set the output to a local file; if Filebeat still cannot keep up, we can confirm it's Filebeat. This can sometimes be hard to determine if the server is already overloaded on I/O; you could run something like iostat -x to confirm or rule that out.

  2. Change Logstash to output to a file instead of Elasticsearch and see if Filebeat still has issues. If it doesn't, and Logstash doesn't show any issues either, it's ES that cannot keep up.
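
The two debugging steps above could look roughly like this (the paths and host are placeholders, not your actual config). For step 1, in filebeat.yml:

```yaml
# Comment out the Logstash output...
#output.logstash:
#  hosts: ["logstash.example.com:5044"]

# ...and write events to a local file instead
output.file:
  path: "/tmp/filebeat-debug"
  filename: filebeat.out
```

For step 2, in the Logstash pipeline config:

```
output {
  # Temporarily replace the elasticsearch output with a file output
  file {
    path => "/tmp/logstash-debug.log"
  }
}
```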

Try some of these things out first, and let's take it from there :slight_smile:


Thank you for your reply, I will check and try out your suggestions. Currently I only have access to Filebeat; another team controls Logstash, and they are using v7.9.1. These are the current logs from my Filebeat.

2021-04-13T15:09:05.244+0800	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":3460265,"time":{"ms":78}},"total":{"ticks":18269796,"time":{"ms":672},"value":18269796},"user":{"ticks":14809531,"time":{"ms":594}}},"handles":{"open":332},"info":{"ephemeral_id":"cd1bbde1-965a-44a1-9ed1-33164732b6f6","uptime":{"ms":330690163}},"memstats":{"gc_next":33253088,"memory_alloc":19271456,"memory_total":1325033298504},"runtime":{"goroutines":81}},"filebeat":{"events":{"added":2048,"done":2048},"harvester":{"open_files":11,"running":11}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":2048,"batches":2,"total":2048},"read":{"bytes":42},"write":{"bytes":347028}},"pipeline":{"clients":1,"events":{"active":4117,"published":2048,"total":2048},"queue":{"acked":2048}}},"registrar":{"states":{"current":20,"update":2048},"writes":{"success":2,"total":2}}}}}
2021-04-13T15:09:13.483+0800	DEBUG	[input]	input/input.go:152	Run input
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:191	Start next scan
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:421	Check file for harvesting: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:511	Update existing file for harvesting: D:\7703_MEI\tibrvlisten.log, offset: 765979
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:563	Harvester for file is still running: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:212	input states cleaned up. Before: 11, After: 11, Pending: 0
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:243	State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800	DEBUG	[input]	log/input.go:233	Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:14.977+0800	DEBUG	[publisher]	memqueue/ackloop.go:160	ackloop: receive ack [40530: 0, 2048]
2021-04-13T15:09:14.977+0800	DEBUG	[logstash]	logstash/async.go:159	2009 events out of 2009 events sent to logstash host log-app-fab.mkz-gp.icp.infineon.com:9513. Continue sending
2021-04-13T15:09:14.990+0800	DEBUG	[publisher]	memqueue/eventloop.go:535	broker ACK events: count=2048, start-seq=81633339, end-seq=81635386

2021-04-13T15:09:14.990+0800	DEBUG	[publisher]	memqueue/ackloop.go:128	ackloop: return ack to broker loop:2048
2021-04-13T15:09:14.990+0800	DEBUG	[publisher]	memqueue/ackloop.go:131	ackloop:  done send ack
2021-04-13T15:09:14.990+0800	DEBUG	[processors]	processing/processors.go:183	Publish event: {