Since our log files are huge, I chose not to store the original log lines. Instead, I store the file path and the offset of each log line so that I can retrieve the line later.
It says here that "(The exported field offset) is the file offset the reported line starts at." However, when I try to retrieve the log line by seeking to that offset, I actually get the next log line. It seems that the offset points to the end of the log line instead of the beginning.
Do I need to read the log line backwards from the offset? Is there a better solution?
As a workaround: the message string (in the JSON event) should be UTF-8 encoded, so computing offset - byte size of the string - 1 (for the newline character) should give you the line's start offset. If you do some event processing in Logstash or an Elasticsearch ingest pipeline, you can adjust the offset there.
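The arithmetic in this workaround can be sketched in Go as follows. This is only an illustration: `lineStartOffset` and `readLineAt` are hypothetical helper names, and the sketch assumes the message is the unmodified UTF-8 line (no multiline or JSON decoding) and that lines end with a single "\n" byte. With "\r\n" line endings you would need to subtract 2 instead of 1.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// lineStartOffset converts the reported end-of-line offset into the offset of
// the first byte of the line. Assumption: message is the raw UTF-8 line and
// the line terminator is a single "\n" byte.
func lineStartOffset(endOffset int64, message string) int64 {
	return endOffset - int64(len(message)) - 1
}

// readLineAt seeks to start and returns the line found there, without its
// trailing line terminator.
func readLineAt(path string, start int64) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	if _, err := f.Seek(start, io.SeekStart); err != nil {
		return "", err
	}
	line, err := bufio.NewReader(f).ReadString('\n')
	if err != nil && err != io.EOF {
		return "", err
	}
	return strings.TrimRight(line, "\r\n"), nil
}

func main() {
	// Build a small sample log file to demonstrate the lookup.
	f, err := os.CreateTemp("", "offset-demo-*.log")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("first line\nsecond line\n"); err != nil {
		panic(err)
	}
	f.Close()

	// For "second line" the reported offset would be 23 (end of the line).
	start := lineStartOffset(23, "second line")
	line, err := readLineAt(f.Name(), start)
	if err != nil {
		panic(err)
	}
	fmt.Println(start, line) // prints "11 second line"
}
```

Note that `len(message)` counts bytes, not characters, which is what the subtraction needs for non-ASCII log lines.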
// Get copy of state to work on
// This is important in case sending is not successful so on shutdown
// the old offset is reported
state := h.getState()
state.Offset += int64(message.Bytes)
// Create state event
data := util.NewData()
if h.source.HasState() {
	data.SetState(state)
}
text := string(message.Content)
// Check if data should be added to event. Only export non empty events.
if !message.IsEmpty() && h.shouldExportLine(text) {
	data.Event = common.MapStr{
		"@timestamp": common.Time(message.Ts),
		"source":     state.Source,
		"offset":     state.Offset, // Offset here is the offset before the starting char.
	}
	data.Event.DeepUpdate(message.Fields)
	// Check if json fields exist
	var jsonFields common.MapStr
	if fields, ok := data.Event["json"]; ok {
		jsonFields = fields.(common.MapStr)
	}
	if h.config.JSON != nil && len(jsonFields) > 0 {
		reader.MergeJSONFields(data.Event, jsonFields, &text, *h.config.JSON)
	} else if &text != nil {
		if data.Event == nil {
			data.Event = common.MapStr{}
		}
		data.Event["message"] = text
	}
}
Line 243 first adds the current message length to state.Offset:
state.Offset += int64(message.Bytes)
Then line 259 sets the exported field "offset" to state.Offset, which by now already points past the end of the line:
"offset": state.Offset, // Offset here is the offset before the starting char.
If we swapped the order of these two pieces of code, would that solve the issue?
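To make the proposed reordering concrete, here is a minimal standalone sketch. It is not the actual Filebeat code or a tested patch: `harvesterState`, `lineMessage`, `exportedEvent` and `exportEvents` are simplified, hypothetical stand-ins. The idea is to capture the offset before it is advanced and export that value, while the state itself still moves to the end of the line so that reading can resume from there.

```go
package main

import "fmt"

// harvesterState and lineMessage are simplified, hypothetical stand-ins for
// Filebeat's internal state and message types.
type harvesterState struct {
	Offset int64
}

type lineMessage struct {
	Content []byte
	Bytes   int // bytes consumed from the file, including the newline
}

// exportedEvent is a hypothetical, reduced version of the published event.
type exportedEvent struct {
	Offset  int64
	Message string
}

// exportEvents records the offset of each line before advancing the state,
// so the exported offset points at the first byte of the line.
func exportEvents(msgs []lineMessage) ([]exportedEvent, harvesterState) {
	var st harvesterState
	events := make([]exportedEvent, 0, len(msgs))
	for _, m := range msgs {
		start := st.Offset          // offset where this line begins
		st.Offset += int64(m.Bytes) // state advances to the end of the line
		events = append(events, exportedEvent{
			Offset:  start,
			Message: string(m.Content),
		})
	}
	return events, st
}

func main() {
	msgs := []lineMessage{
		{Content: []byte("first line"), Bytes: 11},
		{Content: []byte("second line"), Bytes: 12},
	}
	events, st := exportEvents(msgs)
	for _, e := range events {
		fmt.Printf("offset=%d message=%q\n", e.Offset, e.Message)
	}
	fmt.Println("state offset:", st.Offset)
}
```

Keeping the two values separate matters because, in the quoted code, the state passed to data.SetState(state) appears to need the advanced offset, while only the exported field should carry the start of the line.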