Can we get Filebeat to optionally ignore DeviceID in FileStateOS?

I am running a data pipeline through Filebeat on Kubernetes. One container in the pod receives traffic and dumps it to disk; the other container is Filebeat. All traffic comes into the pod through an ELB, so that we continue processing data if a kube-node disappears. This pod has an overflow drive attached through a Kubernetes volume mount. One ELB is re-used (so the inodes are consistent), but every time it moves to another kube-node it gets a new DeviceID. This causes files to be reprocessed during normal pod updates.

What I'm proposing is that we change the IsSame function (https://github.com/elastic/beats/blob/edde8912793f3055f08937fb88fe432e47a8baaa/filebeat/input/file/file_other.go#L29-L32) to optionally ignore the DeviceID. Something like:

// IsSame file checks if the files are identical
func (fs StateOS) IsSame(state StateOS, ignoreDevice bool) bool {
	return fs.Inode == state.Inode && (ignoreDevice || (fs.Device == state.Device))
}
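To make the effect of the proposed flag concrete, here is a minimal standalone sketch. It assumes `StateOS` holds just an inode and a device number as `uint64` values; the `ignoreDevice` parameter is the hypothetical option from this proposal, not something Filebeat provides, and the device numbers are made up.

```go
package main

import "fmt"

// StateOS is a stand-in for Filebeat's non-Windows file state.
// Field names follow the snippet above; the uint64 types are an assumption.
type StateOS struct {
	Inode  uint64
	Device uint64
}

// IsSame reports whether two states refer to the same file. When ignoreDevice
// is true, only the inode is compared (the hypothetical option proposed here).
func (fs StateOS) IsSame(state StateOS, ignoreDevice bool) bool {
	return fs.Inode == state.Inode && (ignoreDevice || fs.Device == state.Device)
}

func main() {
	// Same file on the same volume, seen before and after the pod moved nodes.
	before := StateOS{Inode: 1234, Device: 51712}
	after := StateOS{Inode: 1234, Device: 51728}

	fmt.Println(before.IsSame(after, false)) // false: device IDs differ
	fmt.Println(before.IsSame(after, true))  // true: only inodes are compared
}
```

With `ignoreDevice` set, any two files that happen to share an inode number compare as identical, even when they live on unrelated devices, so the option would only be safe when a single volume is being watched.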

Thoughts?

Filebeat must be able to detect file renames and moves, which requires the device ID check. The problem with ignoring the device ID is that, if file rotation is used (which you might still use in this case, so you can delete old logs and have Filebeat process 'older' logs on rotation), you will get duplicates as well. We're considering some 'fingerprinting' of files or blocks of files to detect renames without having to rely on the device ID.

Always interesting to see how new environments create new challenges :slight_smile: Is there something similar to the Device ID that stays the same?
