If a container running Filebeat is lost and a new container is launched, the registry file of the old container is lost with it. The new container then has no record of where the harvesters left off, so it re-reads the whole file from the beginning, which causes duplicate/inconsistent data in Elasticsearch. So, how should I go about keeping a consistent registry file?
Should I also mount /var/lib/filebeat/ to some persistent volume?
Right now, I am thinking in a Kubernetes (AWS EKS) context. I have a persistent volume (AWS EFS) and have created a PVC (PersistentVolumeClaim) for each application, say app1-pvc for app1. This PVC is mounted at /var/log/app1/ in both the application container and the Filebeat container. Filebeat reads its input from /var/log/app1/api_info.log. So if my Filebeat container gets restarted, the registry file is lost and the whole api_info.log is read again.
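For context, the relevant part of my setup looks roughly like this (a sketch, not my exact manifest; image names and the Filebeat version are made up):

```yaml
# Current setup: app and Filebeat share the log PVC, but Filebeat's
# registry lives on the container filesystem and is lost on restart.
apiVersion: v1
kind: Pod
metadata:
  name: app1
spec:
  containers:
    - name: app1
      image: app1:latest              # writes /var/log/app1/api_info.log
      volumeMounts:
        - name: app1-logs
          mountPath: /var/log/app1
    - name: filebeat
      image: docker.elastic.co/beats/filebeat:7.10.0
      volumeMounts:
        - name: app1-logs
          mountPath: /var/log/app1    # Filebeat tails api_info.log from here
        # no persistent mount for the registry/data directory
  volumes:
    - name: app1-logs
      persistentVolumeClaim:
        claimName: app1-pvc           # backed by AWS EFS
```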
Am I thinking about this the right way? If yes, what are the ways to avoid the scenario I'm describing? If no, how is that scenario prevented, or where am I going wrong?
You can mount Filebeat's data directory to a persistent location so that Filebeat can resume reading the files where it left off. I have tested this on Docker Swarm, but you can extend it to Kubernetes as well.
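Something along these lines (a minimal sketch for Kubernetes; it assumes the official Filebeat image, where path.data defaults to /usr/share/filebeat/data, and the volume/claim names are illustrative):

```yaml
# Mount a persistent volume at Filebeat's data path so the registry
# (and therefore the read offsets) survives container/pod restarts.
apiVersion: v1
kind: Pod
metadata:
  name: filebeat
spec:
  containers:
    - name: filebeat
      image: docker.elastic.co/beats/filebeat:7.10.0
      volumeMounts:
        - name: app1-logs
          mountPath: /var/log/app1             # log files Filebeat tails
        - name: filebeat-data
          mountPath: /usr/share/filebeat/data  # default path.data in the official image
  volumes:
    - name: app1-logs
      persistentVolumeClaim:
        claimName: app1-pvc
    - name: filebeat-data
      persistentVolumeClaim:
        claimName: filebeat-data-pvc           # illustrative claim name
```

Note that each Filebeat instance should get its own data directory; sharing one registry path between multiple running Filebeat instances is likely to cause conflicts.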