Filebeat skipping events after restarting harvesting

Hello,
I am investigating a missing-events scenario. The setup is as follows:
A Linux box running Filebeat, Logstash, Elasticsearch, and a program which generates the logs. The log file is on a local drive (so we can probably rule out network-drive issues).
The events are present in the log file, so we can probably rule out log rotation, since all the events are in the same file (around the middle of the file).
Filebeat has the following logs around the same time:

    2018-03-16T06:46:30-07:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.open_files=4 filebeat.harvester.running=4 filebeat.harvester.started=4 filebeat.prospector.log.files.truncated=1 libbeat.logstash.call_count.PublishEvents=2 libbeat.logstash.publish.read_bytes=12 libbeat.logstash.publish.write_bytes=198152 libbeat.logstash.published_and_acked_events=2113 libbeat.publisher.published_events=2113 publish.events=2117 registrar.states.update=2117 registrar.writes=2
    2018-03-16T06:47:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:47:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:48:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:48:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:49:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:49:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:50:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:50:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:51:00-07:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.closed=1 filebeat.harvester.open_files=-1 filebeat.harvester.running=-1 publish.events=1 registrar.states.update=1 registrar.writes=1
    2018-03-16T06:51:10-07:00 INFO File is inactive: /scratch/apps/deployed_versions.log. Closing because close_inactive of 5m0s reached.
    2018-03-16T06:51:30-07:00 INFO Non-zero metrics in the last 30s: filebeat.harvester.closed=4 filebeat.harvester.open_files=-4 filebeat.harvester.running=-4 publish.events=4 registrar.states.update=4 registrar.writes=2
    2018-03-16T06:52:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:52:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:53:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:53:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:54:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:54:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:55:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:55:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:56:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:56:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:57:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:57:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:58:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:58:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:59:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T06:59:30-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T07:00:00-07:00 INFO No non-zero metrics in the last 30s
    2018-03-16T07:00:10-07:00 ERR Failed to publish events caused by: write tcp 127.0.0.1:42938->127.0.0.1:5044: write: connection reset by peer
    2018-03-16T07:00:10-07:00 INFO Error publishing events (retrying): write tcp 127.0.0.1:42938->127.0.0.1:5044: write: connection reset by peer

So when it started to harvest, it missed a few lines (it picked up the first 8 lines, but the next 9 lines were skipped). I am not sure what could have gone wrong, but a similar pattern is seen with other files which are not written very frequently.

PS: I'm not sure how to format the logs in a better way in this post.

Can you share your config file and the version of Filebeat you are using?

How did you check that you lost the log lines? In the above there is a sending error, so it could be that the events are still queued on the Filebeat side.
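
One way to see what Filebeat believes it has read is its registry file, which records a byte offset per harvested file. A minimal sketch of checking it, assuming a package install (the registry path is an assumption; it depends on how Filebeat was installed and started):

    # Pretty-print the registry and note the "offset" recorded for the file.
    python -m json.tool /var/lib/filebeat/registry

    # Compare against the file's current size in bytes.
    wc -c /scratch/apps/deployed_versions.log

If the registry offset has already advanced past the missing lines, Filebeat considered them read.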

For the formatting, I recommend putting 3 backticks before and after the log/config files you copy here.

Thank you @ruflin for taking the time to read and reply. I formatted the logs with backticks as you suggested, and it is much more readable now.
The Filebeat config file is as below:

    filebeat.prospectors:
    - input_type: log
      paths:
      - /scratch/apps/*.log

    output.logstash:
      # The Logstash hosts
      hosts: ["localhost:5044"]

The version is filebeat-5.6.3-linux-x86_64.
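
Side note: the config above relies on the default `close_inactive` of 5m, which matches the "close_inactive of 5m0s reached" message in the Filebeat log. For files that are written infrequently, the window can be raised per prospector. A minimal sketch, where 15m is an arbitrary illustrative value:

    filebeat.prospectors:
    - input_type: log
      paths:
      - /scratch/apps/*.log
      # Keep the harvester open longer for files that are written infrequently.
      # 15m is illustrative; the default is 5m.
      close_inactive: 15m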

With regard to:

> In the above there is a sending error, so it could be that the events are still queued on the Filebeat side.

I posted the question around 2 days after those logs (I am not sure whether Filebeat would keep retrying over two days; if it does, that's good). I had manually checked for the missing entries because I expected there to be 17 (we have a group of 17 entries); when I saw only 8, I looked at the actual log file and verified that the entries were indeed present in the logs but absent when searched via Kibana. I verified again just now and the events are still missing.

Filebeat will keep retrying until it has successfully sent the events.

As you send through Logstash, I'm wondering what processing you do on the LS side and whether the events could be lost there.

To verify that Filebeat ships all log lines, you could enable the file output and check whether all events show up there.
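
A minimal sketch of such a test configuration, assuming the file output temporarily replaces the `output.logstash` section and that `/tmp/filebeat` is a writable directory (both assumptions, adjust as needed):

    output.file:
      # Directory and file name for the dumped events (illustrative values).
      path: "/tmp/filebeat"
      filename: filebeat

If every expected line shows up in that file, the loss is downstream of Filebeat.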

One thing that could also happen is that you don't see the events in Kibana because their timestamp is not in the range you are looking at. Perhaps use a count query in ES to verify how many events you have.
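
For example, a sketch of such a count query against a `logstash-*` index (the index name is the Logstash default and an assumption here; the time range is illustrative):

    curl -s -H 'Content-Type: application/json' \
      'localhost:9200/logstash-*/_count?pretty' -d '
    {
      "query": {
        "range": {
          "@timestamp": {
            "gte": "2018-03-16T06:00:00-07:00",
            "lte": "2018-03-16T08:00:00-07:00"
          }
        }
      }
    }'

This counts documents regardless of Kibana's time picker, so it separates "events never arrived" from "events are outside the range being viewed".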

Thanks Ruflin. I will check whether the events are lost at Logstash by enabling the file output too (though I doubt it, since they succeed if I re-parse the log file from Filebeat).
BTW, the Logstash part is grok filters and JSON filters.
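
For reference, a minimal sketch of what such a pipeline typically looks like; the grok pattern and the JSON source field below are hypothetical stand-ins for the real, application-specific ones:

    input {
      beats {
        port => 5044
      }
    }

    filter {
      grok {
        # Hypothetical pattern; the real one is application-specific.
        match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{GREEDYDATA:payload}" }
      }
      json {
        # Hypothetical source field; point this at wherever the JSON lives.
        source => "payload"
      }
    }

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
      }
    }

One thing worth checking: events that fail the grok filter are tagged `_grokparsefailure`, and failed JSON parsing adds `_jsonparsefailure`. Such events still reach Elasticsearch unless the pipeline drops them explicitly, so searching Kibana for those tags can reveal whether the "missing" events actually arrived but were mis-parsed.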
