I have filebeat, built with a custom regexprocessor, that processes events from input log files that have no date, only time, in the log messages.
The regexprocessor constructs the @timestamp using the time from the log file plus the date found from either the filename, when tracking previously rotated logs, or the agent host time, when tracking the live/not rotated log that has no date in the filename.
The date comes out wrong around the midnight log rotation: for the last lines of the file that has just been rotated, the agent host date is already one day ahead.
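A minimal sketch of the date selection described above, to make the failure concrete. This is not the actual regexprocessor code: buildTimestamp and dateSuffix are hypothetical names, and the host date is passed in as a string (an assumption made only to keep the example deterministic).

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// dateSuffix matches a rotated-file suffix such as "server.log.2020-06-27".
// (Hypothetical helper; the real processor's pattern may differ.)
var dateSuffix = regexp.MustCompile(`\.(\d{4}-\d{2}-\d{2})$`)

// buildTimestamp combines the hh:mm:ss found in a log line with a date.
// The date comes from the filename when it carries one, otherwise from
// hostDate (the agent host's current date, "YYYY-MM-DD").
func buildTimestamp(filename, clock, hostDate string) (time.Time, error) {
	date := hostDate
	if m := dateSuffix.FindStringSubmatch(filename); m != nil {
		date = m[1]
	}
	return time.Parse("2006-01-02 15:04:05", date+" "+clock)
}

func main() {
	// Rotated file: the date is taken from the filename.
	ts, _ := buildTimestamp("server.log.2020-06-27", "23:59:59", "2020-06-29")
	fmt.Println(ts.Format(time.RFC3339)) // prints 2020-06-27T23:59:59Z

	// Live file read just after midnight: the host date is one day ahead.
	ts, _ = buildTimestamp("server.log", "23:59:59", "2020-06-29")
	fmt.Println(ts.Format(time.RFC3339)) // prints 2020-06-29T23:59:59Z
}
```

The second call shows the problem: the line was written on the previous day, but nothing in the filename says so.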
I need the regexprocessor, and all processors in general, to have access to the inode/fileid of the file in the event, but this is not exposed.
Here's an example to make it clearer:
• It is June 28 2020, 23:00
• The live file is server.log
• There are previously rotated log files server.log.2020-06-27 and earlier.
• Filebeat is tracking server.log (call it inode 123)
• At midnight the application log rotation system renames server.log to server.log.2020-06-28 and creates a new file named server.log (call it inode 456).
• Just after midnight, filebeat runs my regexprocessor and passes in an event that was read from inode 123 but still carries the filename server.log. The processor forms @timestamp from host time (date 2020-06-29) and log time (23:59:59)
This is one day off and I could correct the issue if the event included an inode field and my processor kept state information that mapped inode to date 2020-06-28.
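The state I have in mind could look roughly like the sketch below. It assumes the event carries some inode key as a string, which is exactly what is missing today; inodeDates and dateFor are hypothetical names, not existing libbeat APIs.

```go
package main

import "fmt"

// inodeDates remembers the host date on which each inode was first seen,
// so that lines read from a file after it has been renamed still get the
// date on which it was the live file. (Hypothetical type for illustration.)
type inodeDates struct {
	dates map[string]string // inode key -> "YYYY-MM-DD"
}

func newInodeDates() *inodeDates {
	return &inodeDates{dates: make(map[string]string)}
}

// dateFor returns the date pinned to inode, pinning hostDate the first
// time the inode is seen.
func (s *inodeDates) dateFor(inode, hostDate string) string {
	if d, ok := s.dates[inode]; ok {
		return d
	}
	s.dates[inode] = hostDate
	return hostDate
}

func main() {
	s := newInodeDates()
	fmt.Println(s.dateFor("123", "2020-06-28")) // prints 2020-06-28 (before midnight)
	fmt.Println(s.dateFor("123", "2020-06-29")) // prints 2020-06-28 (old file, after midnight)
	fmt.Println(s.dateFor("456", "2020-06-29")) // prints 2020-06-29 (new live file)
}
```

This sketch ignores filebeat restarts, inode reuse, and cleanup of old entries, all of which a real processor would have to handle.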
I can't simply change the application's log rotation naming. This is a real-world, revenue-generating application, and such a change would sit low in the app teams' backlog.
I believe libbeat already has to track the inode (Linux) or file ID (Windows), so it could surface this information in the event.
Can we get the inode/fileid included in the event? If the answer is no, can you explain why not and offer pointers so I can make a pull request and fix it myself?
Into the bargain, I'll contribute a regexprocessor, which is sorely needed in the standard processor set.
Update: Searching through the beats source code, I found some clues.
In ./filebeat/input/log/harvester.go, func (h *Harvester) onMessage passes state in the Event field "Private":
    err := forwarder.Send(beat.Event{
        Timestamp: timestamp,
        Fields:    fields,
        Meta:      meta,
        Private:   state,
    })
and state is of type file.State:
    type State struct {
        Id          string            `json:"-"` // local unique id to make comparison more efficient
        Finished    bool              `json:"-"` // harvester state
        Fileinfo    os.FileInfo       `json:"-"` // the file info
        Source      string            `json:"source"`
        Offset      int64             `json:"offset"`
        Timestamp   time.Time         `json:"timestamp"`
        TTL         time.Duration     `json:"ttl"`
        Type        string            `json:"type"`
        Meta        map[string]string `json:"meta"`
        FileStateOS file.StateOS
    }
and FileStateOS is ultimately of type:
    type StateOS struct {
        Inode  uint64 `json:"inode,"`
        Device uint64 `json:"device,"`
    }
So I tried the following in my processor. It works, but it is fragile, because event.Private is an interface{} whose concrete type is subject to change.
    import (
        libbeatfile "github.com/elastic/beats/libbeat/common/file"
        filebeatfile "github.com/elastic/beats/filebeat/input/file"
    )

    // event.Private is an interface{}; try the two state types found in the source.
    var inode string
    fsif := event.Private
    if fs, ok := fsif.(libbeatfile.StateOS); ok {
        inode = fs.String()
        p.putValue(event, "log.inode", inode)
    } else if fs, ok := fsif.(filebeatfile.State); ok {
        inode = fs.FileStateOS.String()
        p.putValue(event, "log.inode", inode)
    } else {
        // Unknown type: log it so a change in event.Private is noticed.
        p.Logger.Debugf("processMessage() The event.Private type = %T ; fsif = %+v", fsif, fsif)
    }
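One way to contain the fragility is to keep the type assertions in a single helper, so only one place changes if event.Private changes. The sketch below uses local stand-in types so it compiles without beats: stateOS, state, and inodeKey are hypothetical, and the "inode-device" string format is an assumption, not a statement about what the real StateOS.String() returns.

```go
package main

import "fmt"

// stateOS and state are local stand-ins for the beats types quoted above
// (hypothetical simplifications for illustration only).
type stateOS struct {
	Inode  uint64
	Device uint64
}

// String formats the identity as "inode-device" (assumed format).
func (s stateOS) String() string {
	return fmt.Sprintf("%d-%d", s.Inode, s.Device)
}

type state struct {
	Source      string
	FileStateOS stateOS
}

// inodeKey extracts a file identity from an opaque private value.
// ok is false for any type the helper does not recognise.
func inodeKey(private interface{}) (key string, ok bool) {
	switch v := private.(type) {
	case stateOS:
		return v.String(), true
	case state:
		return v.FileStateOS.String(), true
	default:
		return "", false
	}
}

func main() {
	key, ok := inodeKey(state{Source: "server.log", FileStateOS: stateOS{Inode: 123, Device: 2049}})
	fmt.Println(key, ok) // prints 123-2049 true

	_, ok = inodeKey("unexpected")
	fmt.Println(ok) // prints false
}
```

This does not remove the dependency on an unexported contract; it only narrows it to one function.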