I am using Elastic Cloud and the "Nginx Ingress Controller Logs" integration to ship my nginx access logs to Kibana.
As mentioned in the tutorial, I created an agent policy, and since I'm using a Kubernetes cluster I deployed the Elastic Agent as a DaemonSet, as described in the documentation. I have 4 nodes, so 4 Elastic Agents enrolled.
Everything worked fine. I also created an ingest pipeline to add some custom fields depending on the URL.
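For reference, here is a simplified sketch of that pipeline; the pipeline name, the target field, and the URL prefix are placeholders, not my real values:

```json
PUT _ingest/pipeline/nginx-ingress-custom-fields
{
  "description": "Add custom fields based on the request URL (simplified example)",
  "processors": [
    {
      "set": {
        "field": "labels.application",
        "value": "shop",
        "if": "ctx.url?.path != null && ctx.url.path.startsWith('/shop')",
        "ignore_failure": true
      }
    }
  ]
}
```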
After a few days, I realized that I had duplicate documents, and after some investigation I figured out that some of my access-log lines are duplicated 8 times.
I think I understand what's happening, but I don't know how to solve the problem.
Out of my 4 nodes, only one sends my logs to Elastic, which makes sense because my ingress controller runs on that node. The problem is that the Elastic Agent on this node dies quite often for an unknown reason. Each time a new Elastic Agent starts on this node, it re-reads the whole access log and ships it to Elastic again, creating the duplicated lines.
Finally, my questions:
- Is it possible to tell the "Nginx Ingress Controller Logs" integration (which uses the filestream input) to only harvest the files and wait for new lines, without ingesting the data already present in the file when harvesting starts? In other words, set the read pointer to the end of the file at startup. I couldn't find any option for this in the documentation.
- I tried to set the _id field to http.request.id, which is unique per request, hoping that indexing would fail when the _id already exists, but Elastic just generates a random _id instead (a simplified sketch of what I tried is below). Is it possible to make it fail in that case?
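This is roughly how I tried to set the _id (simplified sketch; the pipeline name is a placeholder, and in reality the processor sits inside my custom pipeline):

```json
PUT _ingest/pipeline/nginx-ingress-set-id
{
  "description": "Use the request id as the document _id (what I tried, simplified)",
  "processors": [
    {
      "set": {
        "field": "_id",
        "copy_from": "http.request.id",
        "ignore_empty_value": true
      }
    }
  ]
}
```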
I'm out of ideas for solving this issue; there is probably an easy way to deal with it.