Filebeat uses at-least-once delivery semantics. Have you checked the files' state in the registry?
A file might not be picked up if it has been deleted by log rotation and Filebeat is restarted or started afterwards, or if Filebeat has closed the file and cannot pick it up anymore due to ignore_older.
What do you mean by at-least-once semantics?
That is, on failure (e.g. a missing ACK from Logstash), Filebeat will retry, and harvesters will be blocked because Filebeat's internal buffers fill up.
It seems that Filebeat did not see these files: there is no trace of them in the log file or in the registry.
Could it be due to broken communication with Logstash?
Maybe you want to share the full configuration, logs, and registry file with us? Given the information I have so far, I'd assume it's due to ignore_older combined with clean_inactive. clean_inactive removes entries from the registry, and due to ignore_older these old files are then not picked up again...
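As a sketch, the risky combination would look something like this in filebeat.yml (paths and values are hypothetical, for illustration only, not your actual settings):

```yaml
# Hypothetical prospector config illustrating the ignore_older /
# clean_inactive interplay described above.
filebeat.prospectors:
  - paths:
      - /var/log/app/*.log   # placeholder path
    ignore_older: 10m        # files not modified for 10m are not harvested
    clean_inactive: 15m      # registry state removed after 15m of inactivity
    # If the output is blocked long enough, a file can pass ignore_older,
    # its registry entry can be cleaned, and it is never picked up again.
```

The documentation recommends keeping clean_inactive greater than ignore_older plus scan_frequency for exactly this reason.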
The problem is that I cannot easily test this scenario again.
It occurred on a production environment while our logstash server was down for patching activity.
It is hard to reproduce that environment in our validation stage.
Why do you think that close_removed would change anything? The missing files have not been removed.
I think @steffens' explanation is correct. Harvesters were blocked for too long, and as ignore_older is quite short (10 minutes), new files were ignored.
Sorry for the really late reply; this somehow slipped through the cracks. You are right: if the files are not removed, close_removed would not have an effect. As you haven't set a harvester_limit, I would still expect the harvesters to pick up the new files, which would prevent ignore_older from applying. I now wonder if close_inactive could apply, close the file, and then clean_inactive could happen.
It would be really interesting to see the log file when this happens, as it would show the steps and logic that Filebeat applied.
No problem! Thank you for answering.
As I said, it is not easy to simulate this behavior again. There is no patching activity scheduled on our Logstash server (and I cannot stop it!), so we just have to wait...
I have changed the ignore_older setting to 2h, as I think this parameter comes into play in this issue. But maybe I'm wrong...
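For reference, the adjusted prospector config looks roughly like this (the path is a placeholder, not our real one):

```yaml
# Sketch of the change: lengthen ignore_older so files modified while
# Logstash is down for patching are still picked up afterwards.
filebeat.prospectors:
  - paths:
      - /var/log/app/*.log   # placeholder path
    ignore_older: 2h         # was 10m; now longer than a typical outage window
```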
If the issue happens again, I will send you the logs!