Hello,
I am investigating a missing-events scenario. The setup is as follows:
A Linux box running Filebeat, Logstash, and Elasticsearch, plus a program which generates the logs. The log file sits on a local drive, so we can probably rule out network-drive issues.
The missing events are present in the log file itself, so we can probably also rule out log rotation: all of the events are in the same file (around the middle of it).
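For reference, the prospector setup is essentially along these lines (a simplified sketch, not the exact file: the path is reduced to a glob, and close_inactive is not overridden, which matches the "close_inactive of 5m0s reached" line in the log below):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /scratch/apps/*.log   # simplified; deployed_versions.log (seen in the log below) is one of the matched files
    # close_inactive left at its 5m default, per the "close_inactive of 5m0s reached" message

output.logstash:
  hosts: ["127.0.0.1:5044"]   # matches the connection-reset error at the end of the log
```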
Filebeat has the following logs around the same time:
2018-03-16T06:46:30-07:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=4 filebeat.harvester.running=4 filebeat.harvester.started=4 filebeat.prospector.log.files.truncated=1 libbeat.logstash.call_count.PublishEvents=2 libbeat.logstash.publish.read_bytes=12 libbeat.logstash.publish.write_bytes=198152 libbeat.logstash.published_and_acked_events=2113 libbeat.publisher.published_events=2113 publish.events=2117 registrar.states.update=2117 registrar.writes=2
2018-03-16T06:47:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:47:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:48:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:48:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:49:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:49:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:50:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:50:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:51:00-07:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.closed=1 filebeat.harvester.open_files=-1 filebeat.harvester.running=-1 publish.events=1 registrar.states.update=1 registrar.writes=1
2018-03-16T06:51:10-07:00 INFO File is inactive: /scratch/apps/deployed_versions.log. Closing because close_inactive of 5m0s reached.
2018-03-16T06:51:30-07:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.closed=4 filebeat.harvester.open_files=-4 filebeat.harvester.running=-4 publish.events=4 registrar.states.update=4 registrar.writes=2
2018-03-16T06:52:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:52:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:53:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:53:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:54:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:54:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:55:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:55:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:56:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:56:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:57:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:57:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:58:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:58:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:59:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T06:59:30-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T07:00:00-07:00 INFO No non-zero metrics in the last 30s
2018-03-16T07:00:10-07:00 ERR Failed to publish events caused by: write tcp 127.0.0.1:42938->127.0.0.1:5044: write: connection reset by peer
2018-03-16T07:00:10-07:00 INFO Error publishing events (retrying): write tcp 127.0.0.1:42938->127.0.0.1:5044: write: connection reset by peer
So when the harvester started it missed a few lines: it picked up the first 8 lines, but the next 9 lines were skipped. I am not sure what could have gone wrong, but a similar pattern shows up with other files that are not written to very frequently.
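To dig a bit further, I compared the offsets Filebeat recorded in its registry against the current file sizes, to see whether it ever acknowledged reading past the missing lines. This is just a rough check assuming the Filebeat 5.x registry layout (a flat JSON array with "source" and "offset" fields); the registry path below is a placeholder and depends on the install:

```python
#!/usr/bin/env python
"""Compare Filebeat's recorded offsets against current file sizes.

Assumes the Filebeat 5.x registry format (a JSON array of states with
"source" and "offset"); adjust REGISTRY_PATH for your installation.
"""
import json
import os

REGISTRY_PATH = "/var/lib/filebeat/registry"  # placeholder; may also be data/registry

with open(REGISTRY_PATH) as fh:
    states = json.load(fh)

for state in states:
    source = state.get("source")
    offset = state.get("offset", 0)
    try:
        size = os.path.getsize(source)
    except OSError:
        size = None  # file may have been removed since the state was written
    lag = None if size is None else size - offset
    print("%s offset=%s size=%s lag=%s" % (source, offset, size, lag))
```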
PS: I'm not too sure how I can add the logs in a better way to this post.