I briefly mentioned this problem to Carlos Pérez at the OSSummit this week and he suggested I post it here.
For a couple of weeks now we have had a problem on one particular host (a VM):
- Only parts of the logs are seen in Kibana
- The log entries which do make it through are all sent at the same time and all carry the same timestamp (they look like accumulated logs)
The log source is an Nginx log from a well-visited site. There are multiple log entries per second, yet the graph shows gaps.
This happens on the latest 5.6 release (5.6.12, if I remember correctly) as well as on the newest 6.x version. I also went back to 5.6.9 to see if there was a regression. The result is always the same.
Logstash (where the logs are sent) and Elasticsearch are working fine: as you can see, the other log entries are correctly indexed and graphed (see dashboard screenshot above).
I enabled debug logging in Filebeat and could only see that the logs were sent in accumulated batches at (seemingly) random times. That would explain the gaps, but not the missing log entries.
On another host, which also sends Nginx logs using Filebeat, the data is coming in almost in real time, as expected.
The following graph shows how much data is missing now:
The graph shows the number of log entries (aggregated per day) over the past 60 days. As you can see, there is a clear drop in the number since September 11. And no, the reason is not fewer visitors. We still have the same level of website traffic.
Can anyone help explain what the reason could be? Are there different settings I could try to debug this issue?
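To put a number on the missing entries, the counts in Kibana could be compared against counts taken directly from the access log on the host. Below is a minimal sketch of that idea, not something I have run against the affected host. It assumes the log uses Nginx's default "combined" format with a `[day/month/year:time zone]` timestamp; the function names `count_per_minute` and `find_gaps` are made up for illustration.

```python
import re
import sys
from collections import Counter
from datetime import datetime, timedelta

# Assumption: timestamps look like [10/Oct/2000:13:55:36 +0200], as in
# Nginx's default "combined" log format. %b expects English month names.
TS_RE = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def count_per_minute(lines):
    """Count log entries per minute; lines without a timestamp are skipped."""
    counts = Counter()
    for line in lines:
        match = TS_RE.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[ts.replace(second=0)] += 1
    return counts


def find_gaps(counts):
    """Return every minute between the first and last entry with no entries."""
    if not counts:
        return []
    minutes = sorted(counts)
    gaps = []
    current = minutes[0]
    while current < minutes[-1]:
        if current not in counts:
            gaps.append(current)
        current += timedelta(minutes=1)
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_log.py /path/to/access.log
    with open(sys.argv[1], errors="replace") as f:
        per_minute = count_per_minute(f)
    print("total entries:", sum(per_minute.values()))
    print("minutes without entries:", len(find_gaps(per_minute)))
```

If the per-minute counts from the file are steady while the same minutes are empty or much lower in Kibana, the entries are being lost somewhere between Filebeat and Elasticsearch rather than never being written by Nginx.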