I'm using Filebeat 7.4.2 and it doesn't seem to pick up all of the events in my log file. The file is updated at a high rate because of the huge messages being logged, and Filebeat can't seem to keep up and ship all of the messages to Logstash before the log file is rolled. How should I handle this kind of situation in Filebeat?
Hello @grazia0912
First of all, I would recommend upgrading to a newer version of both Filebeat and the rest of the Elastic Stack; there are so many improvements, bug fixes and performance changes that this alone could resolve some of your issues.
Something that is a bit harder to determine is whether it is actually Filebeat that cannot keep up, or Logstash, or in the end it could also be Elasticsearch hitting its limit.
There are a few ways you can determine which it is.
A few quick steps first:
- In the Filebeat log file there is an entry logged every 30 seconds, starting with "Non-zero metrics". It shows Filebeat's local stats, for example how many events are queued; if the queue stays high, the bottleneck is most likely Logstash or Elasticsearch.
- Check the Filebeat logs for any errors, and do the same for the Logstash logs.
- You can enable monitoring of Beats, Logstash and Elasticsearch, so that you can view their stats and metrics in Kibana; this makes it much easier to see where the issue is (there is a small monitoring sketch after this list):
Monitor Filebeat | Filebeat Reference [7.12] | Elastic
Monitoring Logstash | Logstash Reference [7.12] | Elastic
Monitor a cluster | Elasticsearch Guide [7.12] | Elastic
- In the Logstash output section of filebeat.yml, try setting a higher worker count and bulk_max_size, but only in small increments (see the sketch after this list)!
The default worker count is the number of CPUs available, so if you have 2 CPUs, try setting the workers to 3.
For bulk_max_size the default is 2048 events, so you could try setting it a bit higher, for example 3000.
If that helps but you still don't manage to catch up, try increasing just the bulk_max_size a little more; a very high worker count is not necessary, but having it a bit above the CPU count can help in some cases.
Documentation ref: Configure the Logstash output | Filebeat Reference [7.4] | Elastic
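For reference, here is a minimal sketch of what that output section could look like, assuming a placeholder Logstash host and the example values from above (adjust them to your own hardware, in small steps):

```yaml
# filebeat.yml -- Logstash output with a slightly raised worker count and batch size
output.logstash:
  hosts: ["logstash.example.com:5044"]  # placeholder host:port, use your own
  worker: 3                             # a bit above the CPU count, per the advice above
  bulk_max_size: 3000                   # default is 2048; raise it in small increments
```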
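And for the monitoring point, a minimal sketch of enabling stack monitoring on the Filebeat side, assuming the monitoring data can be shipped to an Elasticsearch cluster you can reach (the host and credentials below are placeholders):

```yaml
# filebeat.yml -- send Filebeat's own metrics to Elasticsearch for the Stack Monitoring UI
monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["https://es-monitoring.example.com:9200"]  # placeholder monitoring cluster
  username: "beats_system"                           # example built-in monitoring user
  password: "changeme"                               # placeholder, use your own credentials
```

Logstash and Elasticsearch have their own monitoring settings; the links above describe those.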
Some more debugging steps if everything else yields no results:
- Comment out the Logstash output and set the output to a local file (see the sketch after this list); if Filebeat still cannot keep up, that confirms Filebeat itself is the bottleneck. This can be hard to judge if the server is already overloaded on I/O, so you could run something like iostat -x to confirm or rule that out.
- Change Logstash to output to a file instead of Elasticsearch and see if Filebeat still has issues; if it doesn't, and Logstash shows no issues either, then it is Elasticsearch that cannot keep up.
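As a rough sketch of that first debugging step, with the Logstash output commented out (a Beat can only have one output enabled at a time) and a placeholder local path:

```yaml
# filebeat.yml -- temporary debugging output: write events to a local file instead of Logstash
#output.logstash:
#  hosts: ["logstash.example.com:5044"]

output.file:
  path: "D:/filebeat-debug"   # placeholder directory, must exist and be writable
  filename: "filebeat-out"    # events are written here as JSON lines
```

For the Logstash side, the team running it can do the equivalent with Logstash's file output plugin instead of the Elasticsearch output.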
Try some of these things out first, and let's take it from there.
Thank you for your reply, I will check and work through your suggestions. Currently the only component I have access to is Filebeat; another team controls Logstash, and they are on v7.9.1. These are the current logs from my Filebeat:
2021-04-13T15:09:05.244+0800 INFO [monitoring] log/log.go:145 Non-zero metrics in the last 30s {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":3460265,"time":{"ms":78}},"total":{"ticks":18269796,"time":{"ms":672},"value":18269796},"user":{"ticks":14809531,"time":{"ms":594}}},"handles":{"open":332},"info":{"ephemeral_id":"cd1bbde1-965a-44a1-9ed1-33164732b6f6","uptime":{"ms":330690163}},"memstats":{"gc_next":33253088,"memory_alloc":19271456,"memory_total":1325033298504},"runtime":{"goroutines":81}},"filebeat":{"events":{"added":2048,"done":2048},"harvester":{"open_files":11,"running":11}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":2048,"batches":2,"total":2048},"read":{"bytes":42},"write":{"bytes":347028}},"pipeline":{"clients":1,"events":{"active":4117,"published":2048,"total":2048},"queue":{"acked":2048}}},"registrar":{"states":{"current":20,"update":2048},"writes":{"success":2,"total":2}}}}}
2021-04-13T15:09:13.483+0800 DEBUG [input] input/input.go:152 Run input
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:191 Start next scan
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:421 Check file for harvesting: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:511 Update existing file for harvesting: D:\7703_MEI\tibrvlisten.log, offset: 765979
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:563 Harvester for file is still running: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:212 input states cleaned up. Before: 11, After: 11, Pending: 0
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:243 State for file not removed because harvester not finished: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:13.483+0800 DEBUG [input] log/input.go:233 Remove state for file as file removed or renamed: D:\7703_MEI\tibrvlisten.log
2021-04-13T15:09:14.977+0800 DEBUG [publisher] memqueue/ackloop.go:160 ackloop: receive ack [40530: 0, 2048]
2021-04-13T15:09:14.977+0800 DEBUG [logstash] logstash/async.go:159 2009 events out of 2009 events sent to logstash host log-app-fab.mkz-gp.icp.infineon.com:9513. Continue sending
2021-04-13T15:09:14.990+0800 DEBUG [publisher] memqueue/eventloop.go:535 broker ACK events: count=2048, start-seq=81633339, end-seq=81635386
2021-04-13T15:09:14.990+0800 DEBUG [publisher] memqueue/ackloop.go:128 ackloop: return ack to broker loop:2048
2021-04-13T15:09:14.990+0800 DEBUG [publisher] memqueue/ackloop.go:131 ackloop: done send ack
2021-04-13T15:09:14.990+0800 DEBUG [processors] processing/processors.go:183 Publish event: {