Hi,
I am running Filebeat in a Docker container on a Kubernetes cluster to process our application logs and send them to Logstash. The log data is stored on a PersistentVolume (PV), so I run only one Filebeat pod, which reads the logs from that PV, processes them, and ships them to Logstash. I also store the Filebeat registry file on the PV so that after any restart, Filebeat resumes from wherever it left off.
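For reference, the relevant parts of the setup look roughly like this (paths, volume names, and the PVC name below are illustrative, not our exact manifest):

# filebeat.yml (simplified)
filebeat.registry.path: /data/registry    # registry kept on the PV
filebeat.inputs:
  - type: log
    paths:
      - /data/logs/*.log                  # application logs on the PV
output.logstash:
  hosts: ["logstash:5044"]

# Relevant part of the Filebeat pod spec
volumeMounts:
  - name: app-data
    mountPath: /data
volumes:
  - name: app-data
    persistentVolumeClaim:
      claimName: app-logs-pvc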
The problem arises with pod restarts. Since the new pod may be scheduled on any node in the cluster, the device field in the Filebeat registry file changes. Filebeat can then no longer find the stored offset for a log file under the new (device + inode) combination, so it creates a new entry for that log file in the registry with the new (device + inode) combination.
Registry (data.json) file before the pod restart:
{
  "source": "my_service_.2020-06-16.27.log",
  "offset": 158130,
  "timestamp": "2020-07-24T18:43:28.716910538Z",
  "ttl": -1,
  "type": "log",
  "meta": null,
  "FileStateOS": {
    "inode": 4187121761,
    "device": 2097258
  }
}
Registry (data.json) file after the pod restart:
{
  "source": "my_service_.2020-06-16.27.log",
  "offset": 158130,
  "timestamp": "2020-07-24T18:43:28.716910538Z",
  "ttl": -1,
  "type": "log",
  "meta": null,
  "FileStateOS": {
    "inode": 4187121761,
    "device": 2097258
  }
},
{
  "source": "my_service_.2020-06-16.27.log",
  "offset": 210589,
  "timestamp": "2020-07-24T18:53:59.49175077Z",
  "ttl": -1,
  "type": "log",
  "meta": null,
  "FileStateOS": {
    "inode": 4187121761,
    "device": 8388788
  }
}
This duplicate entry in the registry file causes Filebeat to reprocess that log file, leading to duplicate logs in Elasticsearch.
Because our log data is very large, we cannot afford to reprocess all the log files on every pod restart.
- Is there any solution to this problem?
- Is there any way the device field can be kept the same across pod restarts?
Please suggest a solution as soon as possible, as this is a crucial part of our deployment.