We have Filebeat running as a DaemonSet in our Kubernetes cluster, using, for the most part, the YAML manifest provided in the Filebeat documentation.
I am trying to decide whether we should set the `tail_files` configuration option to true or false. Our Filebeat input consists of log files that roll every 24 hours, and according to the documentation it looks like we should use `tail_files: false` if we don't want to risk losing the first few lines of a new log file when rolling happens (we have `scan_frequency: 1s`, so I don't know whether that is even a possibility).
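A minimal sketch of the input settings in question. Only `tail_files` and `scan_frequency` reflect what is described above; the input type and the `paths` glob are placeholders for illustration.

```yaml
filebeat.inputs:
- type: log                 # assumption: could equally be the container input
  paths:
    - /var/log/app/*.log    # hypothetical glob, not our real path
  tail_files: false         # read new files from the beginning
  scan_frequency: 1s        # how often Filebeat checks for new files
```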
However, in that scenario, any Filebeat pod restart or upgrade will destroy the registry file, since it resides in the pod itself. Filebeat will consequently re-send all the events from every file matching the configured pattern, producing duplicates in Elasticsearch. I believe such a scenario is only possible in Kubernetes, as there is no real "restart" of a pod.
Since `tail_files: true` is not an option either, I was thinking about persisting the Filebeat registry file in a volume on the host VM, so that after any Filebeat pod restart, upgrade, or crash (whether triggered manually or by Kubernetes itself) Filebeat picks up from the correct offset. Kind of like a hybrid between the two.
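To make the idea concrete, this is roughly the volume setup I have in mind for the DaemonSet. It is only a sketch: the mount path assumes the registry lives under Filebeat's default data directory in the official image (`/usr/share/filebeat/data`), and the host path and volume name are placeholders.

```yaml
# Fragment of the DaemonSet spec, not a complete manifest
spec:
  template:
    spec:
      containers:
      - name: filebeat
        volumeMounts:
        - name: data
          # assumption: default data directory, which holds the registry
          mountPath: /usr/share/filebeat/data
      volumes:
      - name: data
        hostPath:
          # placeholder host directory; survives pod deletion and recreation
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
```

With a `hostPath` volume, each node would keep its own registry, which matches the one-pod-per-node model of a DaemonSet.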
Would this be a good approach to our problem?
In theory, with a DaemonSet that does not use a `rollingUpdate` strategy, there should never be more than one Filebeat instance using the registry at a time.
Still, is there a scenario in which the registry could get corrupted, or in which more than one entity updates it?
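For reference, by "without a `rollingUpdate` strategy" I mean something along these lines. This is a sketch, assuming the `OnDelete` update strategy is the alternative; the DaemonSet name is a placeholder.

```yaml
# Fragment of the DaemonSet manifest, not a complete one
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat            # placeholder name
spec:
  updateStrategy:
    type: OnDelete          # a pod is replaced only after it has been deleted
```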