Why processors need inode/fileid in the event (why is file.StateOS under Private?)

I have a Filebeat build with a custom regexprocessor that processes events from log files whose messages contain only a time, not a date.

The regexprocessor constructs @timestamp from the time in the log message plus a date taken either from the filename (when reading previously rotated logs) or from the agent host clock (when reading the live, not-yet-rotated log, whose filename has no date).

This breaks at the midnight log rotation: the last lines of the old file are read after midnight, so the agent host date is one day ahead of the date on which they were written.

I need the regexprocessor, and all processors in general, to have access to the inode/fileid of the file in the event, but this is not exposed.

Here's an example to make it more clear.

• It is June 28 2020, 23:00.
• The live file is server.log.
• There are previously rotated log files server.log.2020-06-27 and earlier.
• Filebeat is tracking server.log (call it inode 123).
• At midnight the application's log rotation renames server.log to server.log.2020-06-28 and creates a new file named server.log (call it inode 456).
• Filebeat then runs my regexprocessor on an event that comes from inode 123 but carries the filename server.log. The processor forms @timestamp from the host date (2020-06-29) and the log time (23:59:59).

That is one day off. I could correct it if the event included an inode field and my processor kept state mapping inode 123 to the date 2020-06-28.

I can't simply change the application's log rotation format. This is a real-world money-maker, and such a change would sit low in the app teams' priority backlog.

Now, I believe libbeat already has to track the inode (Linux) and file ID (Windows), so it should be able to surface this information in the event.

Can we get the inode/fileid included in the event? If the answer is no, can you explain why not and offer clues so I can make a pull request and fix it myself?

In the bargain I'll throw in a regexprocessor that is sorely needed in the standard processor set.

Update: searching through the Beats source code, I found some clues.
In ./filebeat/input/log/harvester.go, func (h *Harvester) onMessage
passes the state in the event's Private field:

```go
err := forwarder.Send(beat.Event{
	Timestamp: timestamp,
	Fields:    fields,
	Meta:      meta,
	Private:   state,
})
```

and state is of type file.State:

```go
type State struct {
	Id          string            `json:"-"` // local unique id to make comparison more efficient
	Finished    bool              `json:"-"` // harvester state
	Fileinfo    os.FileInfo       `json:"-"` // the file info
	Source      string            `json:"source"`
	Offset      int64             `json:"offset"`
	Timestamp   time.Time         `json:"timestamp"`
	TTL         time.Duration     `json:"ttl"`
	Type        string            `json:"type"`
	Meta        map[string]string `json:"meta"`
	FileStateOS file.StateOS
}
```

and FileStateOS is ultimately of type:

```go
type StateOS struct {
	Inode  uint64 `json:"inode,"`
	Device uint64 `json:"device,"`
}
```

So I tried the following, and it works, but it is fragile because event.Private is an interface{} whose concrete type is an internal detail that may change.

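```go
import (
	libbeatfile "github.com/elastic/beats/libbeat/common/file"
	filebeatfile "github.com/elastic/beats/filebeat/input/file"
)

// Inside the processor, event is the *beat.Event being processed.
// event.Private is an interface{}, so its concrete type must be asserted.
var inode string

fsif := event.Private
switch fs := fsif.(type) {
case libbeatfile.StateOS:
	// String() formats the OS file identifiers (inode and device on Linux).
	inode = fs.String()
	p.putValue(event, "log.inode", inode)
case filebeatfile.State:
	// The log harvester sends its file.State, whose FileStateOS field holds the StateOS.
	inode = fs.FileStateOS.String()
	p.putValue(event, "log.inode", inode)
default:
	p.Logger.Debugf("processMessage() The event.Private type = %T ; fsif = %+v", fsif, fsif)
}
```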
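The same extraction pattern can be tried outside of Beats in a self-contained way. `stateOS` and `state` below are local stand-ins that only mirror the shapes of libbeat's file.StateOS and filebeat's file.State quoted above. They are not the real types. `fileID` is a hypothetical helper, and the "inode-device" string is this sketch's own format. The `*state` case guesses at a possible future change. The harvester is not known to send a pointer.

```go
package main

import (
	"fmt"
	"strconv"
)

// stateOS and state are local stand-ins mirroring the shapes of
// libbeat's file.StateOS and filebeat's file.State (not the real types).
type stateOS struct {
	Inode  uint64
	Device uint64
}

type state struct {
	Source      string
	Offset      int64
	FileStateOS stateOS
}

// fileID extracts an "inode-device" string from an event's Private field.
// It reports false when Private holds a type this code does not recognize.
func fileID(private interface{}) (string, bool) {
	var s stateOS
	switch v := private.(type) {
	case stateOS:
		s = v
	case state:
		s = v.FileStateOS
	case *state: // hypothetical: in case a pointer is ever sent instead
		if v == nil {
			return "", false
		}
		s = v.FileStateOS
	default:
		return "", false
	}
	return strconv.FormatUint(s.Inode, 10) + "-" + strconv.FormatUint(s.Device, 10), true
}

func main() {
	ev := state{Source: "/var/log/server.log", FileStateOS: stateOS{Inode: 123, Device: 2049}}
	id, ok := fileID(ev)
	fmt.Println(id, ok) // 123-2049 true
	_, ok = fileID("unexpected")
	fmt.Println(ok) // false
}
```

Keeping every assertion on event.Private inside one function does not remove the fragility, but it confines it. If the concrete type changes, only fileID needs updating. The ok result also lets the processor fall back to the host date instead of failing.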
