We've run into the following problem in our Elastic stack: Filebeat sends processed logs, about 100-150 GB per day on average, but instead of the steady line we have always had, we now see peaks like these (see screenshot). We've checked both the Elasticsearch and Filebeat logs in debug mode but found nothing. We've also checked the network, and traffic is nice and smooth. Could you advise, please?
We use Elasticsearch/Filebeat 7.8.1. Thanks in advance.
Hi, @shaunak! Thank you for the reply!
I hope this chart looks better than the first one. At the pointer, the normal line breaks into peaks of Filebeat re-sending logs. We didn't make any changes to the Elastic cluster or our servers; it happened completely unexpectedly.
We've checked the servers that send the logs, and we've checked both the Elasticsearch and Filebeat logs in debug mode and found nothing suspicious. We've tried restarting the Elastic cluster. We've checked network traffic and it looks fine. We have no more ideas about what else to try, so here we are.
This is interesting. How are you able to tell that Filebeat is trying to re-send logs? This would happen if Filebeat was having trouble talking to Elasticsearch, but then you'd see errors about it in the Filebeat log.
Right, that was my first theory as well (given the extremes: 0 for some time and then a peak, 0 and then a peak, etc.). But this theory can be easily tested by looking at the Filebeat logs around the times of 0 activity: we should see errors about being unable to send data to Elasticsearch, retrying, etc. You mentioned in your earlier comment that you found nothing suspicious in the logs, though. So I'm not sure this is due to Filebeat retrying.
We didn't pay much attention to whether there was data in Elasticsearch or not. Let me explain: we turned on debug mode on several of the servers that send the logs and left it on for a couple of hours. Then we filtered those logs for keywords such as error, warning, disconnect and so on, and found nothing. We also saw in the Filebeat debug log that it was sending the processed logs. But in Elasticsearch we still see either nothing or these peaks.
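For anyone repeating the keyword check described above, here is a minimal Python sketch of that kind of filter. The function names and the keyword list are illustrative assumptions, not something taken from the poster's setup. It searches for `warn` rather than `warning` because, as far as I know, Filebeat writes that level as `WARN`, so a search for the full word could miss those lines.

```python
import re

# Keywords taken from the discussion; "warn" also matches "warning".
# This list is only a starting point and is an assumption, not a complete set.
KEYWORDS = ("error", "warn", "disconnect", "retry")
PATTERN = re.compile("|".join(KEYWORDS), re.IGNORECASE)


def suspicious_lines(lines):
    """Return the log lines that contain any keyword, case-insensitively."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


def scan_file(path):
    """Apply the same filter to a log file on disk (the path is up to you)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return suspicious_lines(fh)
```

Calling `scan_file` on each debug log and getting an empty list for every one would back up the "found nothing" result, at least for these keywords.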
Any chance you could shut down all but one of your Filebeat instances, then observe this chart for a bit? If it continues this zero-then-spike pattern, could you post the Filebeat logs (debug level would be nice but not required) here (appropriately redacted) from when the activity is 0?
Understood. In that case, is there any way you could filter that chart to only show traffic coming from one Filebeat instance? I'm thinking of different ways we could narrow down the scope of this problem so we might get some clues and/or make it easier to troubleshoot.
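If filtering the chart itself is awkward, the same per-instance view can be had from a search request. Below is a rough sketch in Python that only builds the request body. It assumes the events carry an `agent.hostname` keyword field and live in indices matching `filebeat-*`; both are assumptions about your setup, so substitute `host.name` or your own index pattern if they differ. The function name and the default time range are made up for illustration.

```python
import json


def single_instance_histogram(hostname, interval="1m", start="now-6h", end="now"):
    """Build a search body counting events per time bucket for one sender.

    `hostname` is matched against `agent.hostname` (assumed field name).
    """
    return {
        "size": 0,  # only the aggregation is needed, not the documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"agent.hostname": hostname}},
                    {"range": {"@timestamp": {"gte": start, "lte": end}}},
                ]
            }
        },
        "aggs": {
            "events_over_time": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": interval,
                    "min_doc_count": 0,  # keep empty buckets so gaps show up
                }
            }
        },
    }


# Hypothetical host name; POST the printed JSON to filebeat-*/_search.
print(json.dumps(single_instance_histogram("app-server-01"), indent=2))
```

Buckets with a `doc_count` of 0 followed by a very large bucket would show the zero-then-spike pattern for that one instance.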
Oh, I didn't realize that was for traffic from a single Filebeat instance. In that case, you don't need to post a new one. If you could post some logs here (maybe a minute's worth) from that Filebeat instance (with sensitive data redacted) during the timestamps when it's apparently not sending any logs over the network, that would be great. Thanks!
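To cut out a minute's worth of log lines around a quiet period, something like the Python sketch below might help. It assumes each Filebeat log line starts with a timestamp such as `2020-08-10T10:01:05.123+0300` followed by a tab, which is what I'd expect from the default file output but is worth checking against your own logs. The function names are hypothetical, and redaction still has to be done by hand.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each log line.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    first_field = line.split("\t", 1)[0]
    try:
        return datetime.strptime(first_field, TS_FORMAT)
    except ValueError:
        return None


def lines_in_window(lines, start, duration=timedelta(minutes=1)):
    """Return lines whose timestamp falls in [start, start + duration).

    Lines without a timestamp (for example wrapped output) are kept or
    dropped together with the timestamped line that precedes them.
    `start` must be timezone-aware, like the parsed timestamps.
    """
    end = start + duration
    selected = []
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            keep = start <= ts < end
        if keep:
            selected.append(line.rstrip("\n"))
    return selected
```

Pick `start` from a point on the chart where the count is 0, convert it to the same time zone as the log, and paste the returned lines here after redacting them.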