I am running a data pipeline through Filebeat on Kubernetes. One container in the pod receives traffic and dumps it to disk; the other container is Filebeat. All traffic comes into the pod through an ELB, so that we continue processing data if a kube-node disappears. The pod has an overflow drive attached as a Kubernetes volume mount. The same volume is re-used (so the inodes are consistent), but every time the pod moves to a different kube-node the volume gets a new DeviceID. As a result, Filebeat treats the existing files as new ones, and they are reprocessed during normal pod updates.
What I'm proposing is that we change the IsSame function (https://github.com/elastic/beats/blob/edde8912793f3055f08937fb88fe432e47a8baaa/filebeat/input/file/file_other.go#L29-L32) to optionally ignore the DeviceID. Something like:
    // IsSame checks if the files are identical.
    func (fs StateOS) IsSame(state StateOS, ignoreDevice bool) bool {
        return fs.Inode == state.Inode && (ignoreDevice || fs.Device == state.Device)
    }
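To show the intended behavior, here is a minimal self-contained sketch. The StateOS struct below is a stand-in I wrote for illustration (its field types are assumptions, not copied from the Filebeat source), and the inode and device numbers are made up.

```go
package main

import "fmt"

// StateOS is a simplified stand-in for Filebeat's per-file OS state.
// The field names follow the proposal; the types are an assumption.
type StateOS struct {
	Inode  uint64
	Device uint64
}

// IsSame reports whether two states refer to the same file.
// When ignoreDevice is true, only the inode is compared.
func (fs StateOS) IsSame(state StateOS, ignoreDevice bool) bool {
	return fs.Inode == state.Inode && (ignoreDevice || fs.Device == state.Device)
}

func main() {
	// Same file on the re-used volume, before and after the pod moved
	// to another kube-node (hypothetical values).
	before := StateOS{Inode: 1234, Device: 51713}
	after := StateOS{Inode: 1234, Device: 51729}

	fmt.Println(before.IsSame(after, false)) // current behavior: false, file is reprocessed
	fmt.Println(before.IsSame(after, true))  // proposed option: true, file is recognized
}
```

One trade-off to keep in mind: with the device ignored, two unrelated files on different devices that happen to share an inode number would also compare as the same, so this probably needs to be opt-in and off by default.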
Thoughts?